So there I was, a graduate student doing my PhD fieldwork. In the rather hot office at the University of Ghana, I was going through questionnaire after questionnaire checking for consistency, missed questions and other dimensions of quality. All of a sudden I saw a pattern: in the time allocation questions, men in one village seemed to be doing the exact same things, for the same amount of time, on two very different days of the week.
To make a long story short, the male enumerator in that village was fabricating that data. Further investigation revealed that he was not asking the respondents about the second day, but just copying over their answers from the first day. This was my first introduction to the production of faked data.
Now, if you would rather not think about where our data comes from and what might be wrong with it, you should stop reading this post now. But if you do, an interesting new paper by Arden Finn and Vimal Ranchhod gives us a lot to think about.
Finn and Ranchhod tackle this issue in the context of South Africa. And they are working with a dataset where this issue was caught, and the respondents re-interviewed. This allows them to show us how the fabrication might matter for analyses. But before we get to this, there are a bunch of other things to consider.
First: What are we talking about when we say faking data? Data can go wrong for a host of reasons. Here the focus is on cheating or negligence on the part of the enumerators (fieldworkers) collecting the data. This can manifest in a number of ways: skipping whole interviews, skipping sections, changing one answer so that you get to skip a whole section (job? no no, you don't have a job), and ignoring the existence of household members. Enumerators can do this for a bunch of reasons: some questions can be hard to ask (e.g. have you been unfaithful to your partner), some sections are really long, households can be far away or in a less than pleasant area, and the survey remuneration structure may prioritize speed over accuracy (e.g. payment for surveys completed and no large penalties for cheating).
So is this a widespread problem? Finn and Ranchhod take us on a tour of a number of large, prominent surveys in South Africa. Two observations. First, I am psyched to see that a significant number of people are paying attention to this and have the courage to be open about it. Second, we should be worried. They document errors ranging from enumerators gaming the payoffs for time preference questions (in likely collusion with the respondents) to enumerators shirking on the within-household sampling (crippling the representativeness of the sample). And the enumerators are sophisticated about this. I used to think telephoning a subset of interviewees after the interview was a good way to check, but Finn and Ranchhod cite one example of an enumerator who set up her sister-in-law to answer these calls. So yes, cheating happens. And when it does, it can do everything from reducing sample size to wiping out certain variables to killing the representativeness of a survey.
Finn and Ranchhod provide a really useful set of tools to track enumerator cheating when they turn their attention to wave 2 of the National Income Dynamics Study (NIDS), with which they were both involved. They provide 9 possible methods to detect cheating. Only two of them prove useful in their context, but given technology differences as well as contextual differences, it is worthwhile going through all of them (keep in mind that the application here is to the second wave of a panel survey; a small code sketch of the rate-based checks follows the list):
- Number of deaths across waves. Fewer people = faster surveys. Enumerators with abnormally high mortality among their respondents are suspect.
- Number of refusals/not available. No faster way to get through your list than not finding people. (And don't think that paying only for completed interviews solves this -- the hard-to-get households might still not be worth it.)
- Look for folks disproportionately activating skip codes that get them out of a lot of additional questions. (South African problem with this: high unemployment means that a lot of legitimate interviews use the skip codes)
- Look at the length of the interview. (Even though NIDS wave 2 was CAPI, i.e. computer-assisted, the time stamp for finishing an interview was manually activated, and a bunch of the enumerators only activated it when they were done for the day and ready to upload.)
- Use the GPS coordinates to check where the interview happened. (This will work if GPS capture is automated in your data collection; if it's manual, the main flaw is that the enumerator needs the GPS location to find the household in wave 2.)
- Compare the signatures on consent forms across waves (way too labor intensive)
- Look for low rates of people joining the household through in-migration or birth.
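To make the first two of these concrete, here is a minimal sketch (Python/pandas) of how you might flag enumerators whose refusal or mortality rates look out of line. The column names (enumerator_id, refused, member_died) and the two-standard-deviation cutoff are made up for illustration; none of this comes from the actual NIDS files.

```python
import pandas as pd

def flag_enumerators(visits, outcome_col, z_cutoff=2.0):
    """Flag enumerators whose rate of `outcome_col` (a 0/1 indicator such as
    'refused' or 'member_died') sits more than `z_cutoff` standard deviations
    above the average enumerator-level rate."""
    rates = visits.groupby("enumerator_id")[outcome_col].mean()
    z = (rates - rates.mean()) / rates.std()
    return rates[z > z_cutoff].sort_values(ascending=False)

# Usage on a (hypothetical) file with one row per attempted interview:
# visits = pd.read_csv("wave2_visits.csv")
# flag_enumerators(visits, "refused")        # abnormally many refusals
# flag_enumerators(visits, "member_died")    # abnormally high mortality across waves
```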
So while these didn't work in the NIDS context, the two ways that actually paid off were to use Benford's law and to compare anthropometric measures across survey waves.
Benford's law is interesting. Going back to Benford's original 1938 paper, the basic argument is that in naturally occurring data there is a pattern to the first digit of numbers. And while you might think the pattern would be a uniform distribution -- it isn't. It turns out to follow a logarithmic distribution, ranging from 30.1% of the first digits being 1 to 4.6% of them being 9. Using this distribution, you can find deviations and hence enumerators to target for verification. (I am going to do my next post on Benford's law, so I will save more detailed discussion for later.)
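While we wait for that post, here is a rough sketch of what a Benford screen can look like in practice: take some reported amount, tabulate first digits by enumerator, and measure the distance from the Benford frequencies with a chi-square statistic. The column names (enumerator_id, reported_income) and the 50-observation minimum are hypothetical choices for illustration, not anything from the paper.

```python
import numpy as np
import pandas as pd
from scipy.stats import chisquare

# Benford frequencies: P(first digit = d) = log10(1 + 1/d), for d = 1..9
BENFORD = np.log10(1 + 1 / np.arange(1, 10))

def first_digit(x):
    s = str(abs(x)).lstrip("0.")
    return int(s[0]) if s and s[0].isdigit() else np.nan

def benford_screen(df, value_col="reported_income", min_obs=50):
    """Chi-square distance from Benford's law, by enumerator."""
    results = {}
    for enum_id, grp in df.groupby("enumerator_id"):
        digits = grp[value_col].dropna().map(first_digit).dropna().astype(int)
        if len(digits) < min_obs:
            continue  # too few values for the comparison to mean much
        observed = digits.value_counts().reindex(range(1, 10), fill_value=0)
        stat, pval = chisquare(observed, BENFORD * observed.sum())
        results[enum_id] = (stat, pval)
    # the smallest p-values point at the enumerators to call back first
    return pd.DataFrame(results, index=["chi2", "p_value"]).T.sort_values("p_value")
```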
The anthropometric checks Finn and Ranchhod use are: 1) looking for systematic (at the enumerator level) outliers in BMI values, 2) a lot of adults who changed height from round 1 to round 2, 3) mean BMI change from round 1 to round 2 (by enumerator), and 4) spikes in the weight distribution by enumerator (here they are looking for heaping around easy numbers).
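A crude per-enumerator version of those four checks might look something like this (again pandas, on a merged wave 1/wave 2 adult file; the column names and the BMI and height cutoffs are illustrative assumptions, not the thresholds Finn and Ranchhod actually used):

```python
import pandas as pd

def anthropometric_flags(adults):
    """Four per-enumerator screens on a merged adult file with hypothetical
    columns: enumerator_id, height_w1, height_w2 (cm), weight_w2 (kg),
    bmi_w1, bmi_w2."""
    g = adults.groupby("enumerator_id")
    return pd.DataFrame({
        # 1) share of implausible BMI values recorded by the enumerator
        "extreme_bmi": g["bmi_w2"].apply(lambda s: ((s < 14) | (s > 50)).mean()),
        # 2) share of adults whose height "changed" by more than 5 cm across waves
        "height_change": g.apply(
            lambda d: ((d["height_w2"] - d["height_w1"]).abs() > 5).mean()),
        # 3) mean BMI change from round 1 to round 2, by enumerator
        "mean_bmi_change": g.apply(lambda d: (d["bmi_w2"] - d["bmi_w1"]).mean()),
        # 4) heaping: share of weights landing exactly on a multiple of 5 kg
        "weight_heaping": g["weight_w2"].apply(lambda s: (s % 5 == 0).mean()),
    })

# Enumerators sitting in the extreme tails of any of these columns are the
# natural candidates for verification call-backs.
```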
These checks reveal some enumerators where the data looks decidedly dodgy. And, of interest for those of us who work on surveys, the dodginess was concentrated in teams (including the team's leader). So the NIDS oversight team did some intensive call-backs (no sisters-in-law included). They managed to contact 781 of the 991 households that needed verification. Of these, 234 were fine. But 547 had problems: 223 partial fabrications, 322 total fabrications and 2 unclassifiable. And that's about 7.3 percent of the respondents for the wave (or 10 percent of the enumerators).
The NIDS team went back and re-interviewed these folks, and this gives Finn and Ranchhod both a clean and a dirty version of the data to compare in the analysis. They look at employment and health variables. They find that the univariate statistics (e.g. the mean) are fairly unaffected by this level and form of falsification. For transition matrices and first difference regressions, the fabrication matters more: while not resulting in large changes in absolute values, it can lead to qualitatively different conclusions.
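To give a flavor of the kind of comparison this allows (this is not their code, just a sketch with hypothetical file and column names), you could build an employment transition matrix on the data with the fabricated interviews still in it, build it again on the verified data, and look at how the transition probabilities move:

```python
import pandas as pd

def transition_matrix(df, from_col="employed_w1", to_col="employed_w2"):
    """Row-normalized wave 1 -> wave 2 transition probabilities."""
    return pd.crosstab(df[from_col], df[to_col], normalize="index")

# dirty = pd.read_csv("wave2_original.csv")   # fabricated interviews included
# clean = pd.read_csv("wave2_verified.csv")   # fabricated cases re-interviewed
# transition_matrix(clean) - transition_matrix(dirty)   # the gap due to fabrication
```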
These results give us one set of insights into how falsification of data might matter. Clearly, the way in which the data goes wrong matters. In the examples above, there are some cases where the answer is totally wrong, and others where the sample is smaller but the variables aren't compromised (particularly when enumerators cover similar distributions of respondents).
All in all, this is further insight into how the sausage we call data is made. Unfortunately, in this case, we can't skip the meat entirely. It's about how to get a better hot dog -- and I will tackle that further in my next post. Have a good July 4th.