
Data quality in research: what if we’re watering the garden while the house is on fire?


A colleague stopped me by the elevators while I was leaving the office.

“Do you know of any paper on (some complicated adjustment) of standard errors?”

I tried to remember, but nothing came to mind – “No, why do you need it?”

“A reviewer is asking for a correction.”

I mechanically took off my glasses and started to rub my eyes – “But it will make no difference. And even if it does, wouldn’t it be trivial compared to the other errors in your data?”

“Yes, I know. But I can’t control those other errors, so I’m doing the best I can, where I can.”

This happens again and again — how many times have I been in his shoes? In my previous life as an applied micro-economist, I happily delegated control of data quality to “survey professionals” — national statistical offices or international organizations involved in data collection, without much interest in the nitty-gritty details of how those data were collected. It was only after I got directly involved in survey work that I realized the extent to which data quality is affected by myriad extrinsic factors, from the technical (survey standards, protocols, methodology) to the practical (a surprise rainstorm, buggy software, broken equipment) to the contextual (the credentials and incentives of the interviewers, proper training and piloting), and a universe of other factors that are obvious to data producers but typically hidden from data users.

Many data problems are difficult to detect

Figure 1: The share of households responding positively to three categories of questions declined as the survey progressed. For example, the share of households that reported experiencing any accident over the last year dropped by half by the fifth month of fieldwork.

An analysis of a recent survey generated some interesting findings: the proportion of households reporting chronic diseases declined by more than 30 percent over the course of the fieldwork (Figure 1). With few exceptions, the characteristics of households interviewed at the beginning of a survey should not differ from those interviewed towards the end of the survey period. Were I a skeptical person, I might wonder whether the interviewers gradually came to understand that positive answers to specific questions generated a lot more work (a range of follow-up questions about history of illness, treatments, medical expenses, and more). Accordingly, these interviewers might have learned that nudging respondents to give a different answer could minimize their workload (“Come on, my blood pressure is twice as high as yours and I’m fine!”), or they might simply mis-record respondents’ replies (“no” instead of “yes”) to free up the rest of the day. This may be a caricature, but errors like these are difficult to catch even if you check data in real time and apply sophisticated validation algorithms.
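A decline like the one in Figure 1 can be flagged with a simple monitoring check: aggregate the share of positive answers by fieldwork week and look for a downward trend. Here is a minimal sketch in Python; the variable layout, the synthetic numbers, and the use of a plain least-squares slope are my own illustrative assumptions, not the survey's actual monitoring code.

```python
import numpy as np

def positive_share_trend(week, answered_yes):
    """Slope of the weekly share of 'yes' answers over the fieldwork period.

    week         : fieldwork week of each interview (1, 2, 3, ...)
    answered_yes : 1 if the household answered 'yes' to the filter
                   question, 0 otherwise

    A clearly negative slope is a red flag that interviewers may be
    steering respondents (or mis-recording answers) to avoid the
    follow-up modules triggered by a 'yes'.
    """
    week = np.asarray(week)
    answered_yes = np.asarray(answered_yes, dtype=float)
    weeks = np.unique(week)
    shares = np.array([answered_yes[week == w].mean() for w in weeks])
    slope = np.polyfit(weeks, shares, 1)[0]  # change in share per week
    return weeks, shares, slope

# Synthetic illustration: the true 'yes' rate is constant at 30%,
# but interviewers record fewer positives as the fieldwork drags on.
rng = np.random.default_rng(0)
week = rng.integers(1, 21, size=20_000)
true_yes = rng.random(20_000) < 0.30
recorded = true_yes & (rng.random(20_000) > 0.02 * week)  # growing under-recording
_, shares, slope = positive_share_trend(week, recorded)
print(f"share in week 1: {shares[0]:.2f}, week 20: {shares[-1]:.2f}, slope: {slope:.4f}")
```

In practice one would test the slope against sampling noise and run the check per interviewer and per team, but even this crude version would have caught the pattern in Figure 1 while the teams were still in the field.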

There are many similar situations when the quality of data is seriously altered by substandard interviewer training, insufficient supervision, and interviewers shirking their responsibilities. For obvious reasons, many (if not most) data producers are typically unwilling to reveal information about problems in the field to the agencies which fund surveys and to the people working with the data.

Poor data can be amplified into bad policy

This problem is well recognized by the leading statistical institutions in developed countries, which use sophisticated econometric techniques to better understand how data collection progresses through the enumeration cycle, to identify strategic opportunities, to evaluate new collection initiatives, and to improve how they conduct and manage their surveys (e.g., Groves and Heeringa, 2006).

Unfortunately, statistical offices in developing countries often lack the resources to establish such practices, and survey data quality can become suspect. Imagine being a researcher analyzing the relationship between the presence of chronic disease and poverty. The economic model is complex, with a highly non-linear econometric specification that relies on an instrumental variable approach to address reverse causation. Even small errors in the data could lead to large divergences in the estimation results. Unless researchers are directly involved in the fieldwork, they might never realize the magnitude of the problems with the data. They might write a paper based on the incorrect data, which might then generate a report with policy recommendations, which may in turn justify a large investment in that country’s health care system to implement the reform — a cascade of causation built on faulty data.
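The "small errors, large divergences" point can be illustrated even in the simplest linear case. Below is a simulation sketch with illustrative numbers of my own (a 30% true prevalence, a coefficient of 2, and a 20% false-negative rate — none of these come from a real survey): mis-recorded "yes" answers on a chronic-disease indicator both understate prevalence and attenuate the estimated association with the outcome.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True chronic-disease status (30% prevalence) and an outcome that
# depends on it (coefficient 2.0 plus noise).
x_true = (rng.random(n) < 0.30).astype(float)
y = 2.0 * x_true + rng.normal(size=n)

# Interviewers mis-record 20% of the true 'yes' answers as 'no'
# (false negatives only, as in the workload-avoidance story above).
x_obs = x_true * (rng.random(n) > 0.20)

def ols_slope(x, y):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(f"measured prevalence: {x_obs.mean():.3f} (true: {x_true.mean():.3f})")
print(f"estimated effect:    {ols_slope(x_obs, y):.3f} (true-data estimate: {ols_slope(x_true, y):.3f})")
```

With only a fifth of positive answers lost, measured prevalence falls from about 30% to about 24% and the estimated coefficient shrinks from about 2.0 to about 1.84; in a non-linear IV specification the distortion is harder to predict and can be far larger.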

It is hard to say how damaging this situation is for a particular program. The effectiveness of economic policies results from complex interactions of many factors, of which empirical justification might not be the most important. Positive-result bias in economic publications, common sense, political interests, and bureaucracy will likely dampen the negative effects of incorrect conclusions. However, the impact of systematic errors across multiple surveys could be extremely serious and could lead to the adoption of concepts that might otherwise be difficult to refute.

Data quality is unglamorous, but economists need to take it seriously

Recent efforts to improve the replicability of economic research (e.g., Maniadis and Tufano, 2017) focus on empirical methodology and algorithms, which misses the potential errors coming from the data themselves. Trying to replicate the results of economic analysis using new data can be expensive and problematic because of intertemporal consistency problems (e.g., Siminski et al., 2003). Papers based on erroneous data can still be published in good journals, and journal referees usually have little means to validate the quality of micro-data coming from developing countries.

Survey work is, unfortunately, not always considered a “brainy” activity. All too often, it is delegated to less experienced staff. For example, among the almost 400 World Bank staff registered as users of the Survey Solutions data collection platform, 87% are the institution’s most junior staff or consultants. We see a similar demographic among attendees of seminars and workshops focused on survey design and the logistics of fieldwork.

If quality data is indeed important for policymaking, the status quo (and the attitudes which inform it) must change. The economics profession must acknowledge and own the responsibility to provide informed advice to practitioners in developing countries and propose better mechanisms for data quality validation and replication of results. Nothing less than a serious appraisal of the reality of these hidden data quality issues—and clear actions to countermand them—is needed to end the potentially pervasive problem of bad data and to mitigate any resultant consequences.

The good news is that data quality can absolutely be improved through the right combination of resources and human capital: mainstreaming proper survey supervision, randomly repeated interviews, the use of advanced technologies to monitor interviews in the field, and, more broadly, efforts to strengthen statistical capacity in developing countries (Statistics Canada, 2008). Understanding the problem is the first, key step.
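One of the tools listed above, randomly repeated interviews (back-checks), boils down to a simple comparison: re-ask a subset of questions in re-visited households and compute a per-interviewer discrepancy rate. A minimal sketch follows; the record layout and the example data are hypothetical, chosen only to show the mechanics.

```python
from collections import defaultdict

def backcheck_discrepancy(records):
    """Per-interviewer share of answers that changed between the original
    interview and an independent back-check re-interview.

    records: iterable of (interviewer_id, original_answer, backcheck_answer)
    Returns {interviewer_id: discrepancy_rate}.
    """
    mismatches = defaultdict(int)
    totals = defaultdict(int)
    for interviewer, original, backcheck in records:
        totals[interviewer] += 1
        if original != backcheck:
            mismatches[interviewer] += 1
    return {i: mismatches[i] / totals[i] for i in totals}

# Hypothetical back-check data: interviewer B's answers rarely survive
# the re-interview, which warrants a closer look at their work.
records = [
    ("A", "yes", "yes"), ("A", "no", "no"), ("A", "yes", "no"),
    ("B", "no", "yes"), ("B", "no", "yes"), ("B", "no", "no"),
]
rates = backcheck_discrepancy(records)
print(rates)
```

Some disagreement is normal (respondents genuinely change answers), so the useful signal is an interviewer whose rate stands well above the team's baseline, not any single mismatch.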

Comments

Submitted by Alicia English on

I would also point out that, after a decade of doing survey work (including development of the instrument, CAPI development for Survey Solutions, etc.), analysis, and research, the amount of funding that goes to data collection is paltry.

And there have been a couple of projects where we've gone out to collect high-quality data for the NSOs and other donor-led projects, and the output has been sucked into the World Bank without acknowledgement or the potential for funding. I even once had a team (brought into the country) that presented my work to me, as they were using the best available data and hadn't done any additional data collection work or analysis. Shockingly, they came to our conclusions, while still getting World Bank daily rates.

At some point, it might also be useful to look at the perverse incentives the Bank is putting out there for the data side of the work...

Submitted by Stanley Ndhlovu on

Thanks for the blog; it's exactly what I was grappling with today while discussing with colleagues how to ensure enumerators collect quality baseline data.

Submitted by Benjamin Morley on

Data collection from qualitative sources can also be very difficult, both to gather effectively and to interpret. In the humanities and social sciences, like anthropology and sociology, there are established ways to gather data from qualitative sources, along with an awareness of difficulties like self-selection bias and respondents changing their answers or behavior when they know they're being watched. Perhaps cross-training with other disciplines would help improve overall data collection techniques.

Submitted by Ruth Njoroge on

I don't think it's ethical to use one-time data series/information to make or develop policies. All data need validation and should pass the test of reproducibility at the macro-scale level. But the teething issue is the choice of an appropriate validation approach and the cost associated with it!

Submitted by Oshkosh B'gosh on

Interesting. Other donors place a lot of emphasis on data quality, even contractually requiring regular data quality assessments, but at the same time invest less energy into data use. It would be nice to find the happy middle between data you can trust and data you find useful.

Submitted by Andreas Kutka on

As a professional data producer, I think this post is raising an important issue. I agree with Misha that a paradigm shift in the attitude towards data quality and data collection is needed. While there are some notable exceptions, I am generally baffled by how much attention data users pay to modelling and sampling design, while at the same time neglecting data production, the other big source of error and driver of results.

In many of the surveys that I have been involved in I found that a lot of thought and resources have gone into the study and sample design, less into the development and testing of instruments and protocols, and very little into monitoring field work or setting up data checking and cleaning procedures.

It is also not uncommon for survey teams to lack expertise and experience. Often instrument development, testing, and training are done by a junior staff member, surprisingly often with little or no survey or local experience. Minimal input and supervision are provided by senior roles, who may or may not themselves have experience in data collection. A survey firm without any higher-level survey capacity is selected because it fits the tight budget. Such firms are assumed to have data quality assurance under control but have varying levels of capacity and transparency. Sometimes patchy control mechanisms are implemented, but comprehensive quality assurance systems are rarely in place. A high turnover of staff in data collection lets basic mistakes and bad practices persist and makes it harder for surveys to benefit from lessons learnt.

Not allocating sufficient capacity and resources to the data collection process can produce significant levels of non-sampling error and prevent it from being detected or reported. Though typically hard to quantify, its potential effect should not be underestimated.

Two tangible examples from recent projects: Having improved the instruments and protocols of a midline school survey, we found that 8 percentage points of the key indicator, teacher absenteeism, could be explained purely by fieldwork activities. At baseline, the instruments did not capture survey activities, and the teacher absenteeism rate reported to policy makers was probably exaggerated. In another survey, while closely monitoring fieldwork, we found that experienced interviewers were secretly interviewing households in the proximity if the selected household had a dog, a practice they assured us was normal in their previous work. Dog ownership was strongly correlated with income and other characteristics. You can imagine the bias this would have introduced.

There are numerous similar examples of avoidable non-sampling error affecting the reliability and validity of survey results. Non-sampling error is often non-random, and it can quickly introduce quite large biases. Due to fieldwork realities, non-sampling error is also often correlated, multiplying the variance of variables and making it harder to get significant results (Weisberg, 2005). Depending on the fieldwork set-up, it may also not be equally spread across the sample or treatment arms, affecting e.g. evaluation designs. In the worst case, survey data do not depict the population at all: in German market research, it was recently uncovered that entire surveys were systematically fabricated by survey firms. Detecting non-sampling error ex post in the data is hard.

For all its potentially damaging effects, most non-sampling error is completely avoidable if data collection is done well and there is sufficient engagement. In any survey, realistic timelines, sufficient financial resources, and survey expertise are needed to develop and test instruments, to conduct meaningful field-worker trainings and pilots, and to build functioning quality assurance systems. Without these, the risk of undetectable non-sampling error increases. For the collection of micro data in developing countries in general, mobile phones and CAPI provide new technologies that complement the control tools we traditionally had at our disposal. They also allow data users to be closer to the data collection process and to own more of it.

Data quality deserves more attention. The pay-offs can be large.

Links:
Weisberg 2005: http://www.press.uchicago.edu/ucp/books/book/chicago/T/bo3619292.html
German market research fraud: http://www.spiegel.de/wirtschaft/unternehmen/manipulation-in-der-marktforschung-wie-umfragen-gefaelscht-werden-a-1190711.html

Submitted by Aroos Rao on

From my personal experience, being an enumerator or a surveyor is considered taboo, despite the fact that this position is directly linked with data collection, the part of research around which the whole research process revolves.

Submitted by Istvan Gyorgy Toth on

The survey industry is contracted by marketing/media agents on the one hand and by academia/policy/statistical services researchers on the other. Data quality investments are often priced out in the first sector. Data collection firms are often unable to separate their personnel and procedures between policy research and marketing. This is a problem everywhere, and it is not simply a development related issue. Thanks for the blogpost, it is very insightful and useful for all involved in empirical research.

Submitted by Salman on

Well said. What other option is left to a decision maker besides relying on unreliable data? One answer could be to corroborate the data with some other well-known data, if easily available.

Submitted by Flavius on

Thank you for this sobering perspective, dear Michael! If I may, can you point to examples of surveys or studies where, as you write in your conclusion, “data quality can absolutely be improved through the right combination of resources and human capital”? Thank you!
