Coauthored with Raka Banerjee and Talip Kilic
In case you missed it, Part I of this two-part blog post outlined the main reasons to consider incorporating Computer Assisted Personal Interviewing (CAPI) into your survey efforts. We'll now even things out by going over the many pitfalls to watch out for when switching to CAPI.
First of all, simple mistakes can be very costly. Imagine you run into a problem with your paper questionnaire midway through fieldwork. You get the word out to your supervisors, they tell your enumerators, and the enumerators fix the error in each copy of their questionnaire – a bit of a pain, but not too difficult. With CAPI, however, you will have to have your electronic questionnaire reprogrammed (meaning a programmer needs to be kept on duty for just such an occasion), and then somehow get the revised version of the questionnaire to all your teams, even those in far-flung rural areas with little or no internet access. Meanwhile, the entire fieldwork process may have to shut down while the teams wait for the revised application, whereas with a paper questionnaire the enumerators could have made the fix instantly and continued with their work.
At the end of the day, with CAPI, you will only be as good as your programming. The paper by Caeyers and coauthors that we mentioned yesterday includes an interesting unintended experiment: the programmers omitted 13 validation checks when programming the CAPI application. The finding: for those questions, there was no significant difference in data quality between CAPI and pen-and-paper.
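To make "validation checks" concrete, here is a minimal sketch of the kind of consistency rule a CAPI application enforces as the enumerator enters answers. This is our own illustration, not the code from the Caeyers et al. study; the field names and thresholds are hypothetical.

```python
# A minimal sketch of CAPI-style validation checks, run at data-entry time.
# Field names and thresholds are illustrative only.

def validate_response(answer: dict) -> list:
    """Return a list of human-readable error messages (empty if consistent)."""
    errors = []

    # Range check: respondent age must be plausible.
    if not 0 <= answer["age_years"] <= 110:
        errors.append("Age out of plausible range (0-110).")

    # Cross-question consistency: a child cannot have completed tertiary education.
    if answer["age_years"] < 15 and answer["education_level"] == "tertiary":
        errors.append("Tertiary education reported for a child under 15.")

    # Unit-item check: a maize harvest reported in litres is suspicious.
    if answer["crop"] == "maize" and answer["harvest_unit"] == "litre":
        errors.append("Unusual unit (litre) for maize harvest; please confirm.")

    return errors
```

The point of the unintended experiment is visible here: a paper questionnaire relies on the enumerator (or later data entry) to catch such inconsistencies, whereas in CAPI each omitted check is simply a check that never runs.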
Because of the dire consequences of mistakes, the single most important issue to take into account is that CAPI involves high start-up costs – and by high, we mean very, very high. Programming errors are bound to happen, but for the reasons mentioned above, it's extremely important to minimize them. First, this means high programming costs – much higher than programming the data entry application for a paper questionnaire. It also means higher piloting costs, as you're more likely to uncover bugs and problems in the piloting stage, which then need to be followed up with more programming. Finally, it means higher training costs. With a paper questionnaire, your enumerators see all the questions and understand how they are linked as they conduct the interview. With CAPI, your enumerators will need to develop a more intuitive sense of the questionnaire, as all they'll physically hold in their hands is a tablet. Electronic questionnaires can be designed to minimize potential confusion for the enumerator, but this remains a downside of the complexity that the CAPI setting allows. You may also lose some experienced enumerators because of the technology barrier, although this may not be as much of a concern as one might expect. Indeed, an interesting paper by Marcus Böhme and Tobias Stöhr suggests that older and less computer-literate enumerators take longer to train in CAPI – but makes the point that they may have other advantages in the field (e.g. when age proximity matters for interview quality).
In the previous post, we mentioned that you'll be getting data faster. That's true, but in some cases you might also end up with no data. CAPI is susceptible to the same sorts of problems as any device-based technology: if an enumerator's hard drive fails before s/he has a chance to upload data to the server, there is no paper version lying around – that data is simply gone. So you need to have solid backup systems in place. (On the other hand, while we've seen goats eat paper questionnaires, they probably don't like the taste of silicon.)
This leads us to another important issue: security. With CAPI, your enumerators will be transferring and backing up confidential data – containing sensitive information about identifiable respondents – across the internet. If you decide to use a free file-sharing system for this, it raises serious questions about whether you can keep the confidentiality that you promised the respondents at the beginning of the interview (for example, one popular service recently suffered a security breach affecting some of the data stored in its cloud). There are solutions for this – Virtual Private Networks (VPNs), secure file transfer protocols such as SFTP, etc. – but this needs to be taken into account at the outset of any CAPI survey effort.
You need to buy computers and/or tablets and/or mobile phones for each enumerator! This has all sorts of implications – most obviously, buying a device for each enumerator costs much more money than printing a bunch of surveys on paper. Depending on your survey's field staff needs, you may simply have too many enumerators to be able to afford a machine for each one. You are then also dependent on each computer/tablet/mobile phone working properly. If an enumerator suffers machine failure, that enumerator is out of commission until the device can be fixed or replaced. Paper, as an inherently less buggy technology, is far less likely to go wrong. Finally, you're dependent on the thing that your machine is dependent on – electricity. It's no secret that many of the places that most need improved data also lack a regular, reliable supply of electricity, and that has consequences for the progress of your fieldwork. This might mean that your teams have to schlep generators around. It might also mean that your survey managers have a hard time ensuring that any updates/revisions to the CAPI application are received in a timely fashion (and without implications for data quality).
Hardware is not the only machine-related concern: software is also an issue. The CAPI software package that you choose for programming your questionnaire can drastically change the parameters of your survey effort. Software packages vary significantly in their strengths and weaknesses, and you have to make the best choice for a given survey based on a number of issues, including sample size, questionnaire length and complexity, the need for visual aids or other tools during the interview process, and many other considerations. To help you with the choice, the Living Standards Measurement Study – Integrated Surveys on Agriculture (LSMS-ISA) project, in collaboration with the IRIS center at the University of Maryland, recently published a comparative assessment of the existing software programs for the development of CAPI applications, which is freely available at www.worldbank.org/lsms-isa. The World Bank’s LSMS-ISA project and the Development Research Group’s Computational Tools team are also currently in the process of developing a free CAPI software package that is expected to be released to the public within the next year or so (and we’ll blog about that, so you won’t miss it).
Lastly, there is still limited empirical evidence on the improved data quality of a field operation implemented using CAPI, as opposed to a well-supervised pen-and-paper operation with field-based data entry featuring similar consistency checks (and with comparable tools to improve the quantification of non-standard unit-item combinations). We’ve discussed two such papers here; one other paper worth looking at is by Fafchamps and coauthors, which is a bit more agnostic on the quality gains (at least for firm profits and sales).
The central take-home message regarding CAPI at this point concerns the importance of getting it right. There is no computerized substitute for a well-designed, well-supervised fieldwork effort. The key here is that the basic principles of data quality control – accuracy checks, data cleaning, etc. – are no different when integrated into the development of a CAPI application than when they are implemented in surveys that feature pen-and-paper interviewing with field-based data entry. And CAPI tools are only useful insofar as enumerators and field supervisors take advantage of the available facilities and act on flagged inconsistencies. CAPI can greatly improve the speed and accuracy of data capture, but to do so, it must be implemented correctly. This means far greater up-front investments in the survey effort prior to fieldwork in order to ensure that the CAPI application is well-designed, bug-free, and correctly incorporates all necessary data quality checks. Ultimately, the decision to use CAPI has to take into account its constraints and disadvantages as well as its potential rewards. The first and foremost goal is, as always, better data.
Congratulations on this very useful mini-series, which addresses most of the benefits and pitfalls one can encounter when using CAPI in the field. Indeed, CAPI research (and mobile research, for that matter) suffers from a lot of myths – these tools are supposed to be "much cheaper", "much quicker", etc. – which are usually debunked fairly quickly in the field. In this regard it is very useful that you show both pros and cons.
In addition, I would like to highlight the following issues:
1. Security – bringing high-tech tools like tablet computers, PDAs or smartphones into the field also changes the interviewer's own security situation, as s/he becomes more exposed to the risk of robbery or theft – after all, these are expensive and sought-after items.
2. Devices – not all devices are suited to being carried into the field, because they are prone to damage (from cold/heat, water, wind, sand, or accidents such as dropping) or have other issues like short battery life (which requires recharging) or reflective screens (which make open-air interviews impossible in many cases). In this respect, choosing the hardware is another important issue.
3. "Blinded by technology" – bringing a high-tech device into the field can also bias the respondent, because the (luxury) device itself absorbs a lot of attention. Experience from the field has shown that this sometimes opens a gap between the respondent and the interviewer, so that the overall atmosphere of the interview suffers.
One thing I observe is the absence of a controlled, robust methodological test of different data collection procedures – one that would let us truly understand the strengths and weaknesses of the different data collection tools, which are ultimately also different methods (CAPI is not simply PAPI with computers).
Such a test should go beyond checking for "improved data quality", which is certainly an important aspect, and should also consider methodological effects. For example, we have only used mobile data collection tools when we were able to show in such a test that the benefits outweigh the limitations compared to other methods (and that they do not bias the results in any way).
At the end of the day, the choice of methods (and of the data collection tool) should depend on the study's goals – not the other way around. Using technology in research for its own sake is usually bound to fail.
If I use a free file-sharing system for this, how does it raise serious questions about the security of the confidentiality?
The main issue is confidentiality and the security needed to protect it. This will be an issue for both free and paid systems, but obviously with free systems you have one less mechanism to hold the provider to account, and by extension they have slightly lower incentives to protect your data. Whatever you do, make sure it is secure.
Raka and Markus