Published on Development Impact

Getting to better data: Talking to strangers

About 15 years ago, when I was doing my dissertation research with a professor experienced in fieldwork, we ran a 15-round survey of households in Ghana. Given the frequency of the visits, we based the enumerators in the village. But we were careful to hire enumerators from nearby big towns -- not from the villages in which we were working. This was partly for skills, but mostly to make sure that the enumerators wouldn't be asking sensitive questions of people they knew.

This is a fairly common practice in surveys -- both for those that we do for impact evaluations and also, in many cases, for national surveys (although there are some notable exceptions to this). But there is a tension here, as a paper by Mariano Sana, Guy Stecklov, and Alexander Weinreb points out. We don't want the enumerator to be a stranger in any sense that alienates people and makes them afraid to talk. But we do want the enumerator to be a stranger to whom respondents will reveal all kinds of private information on income, sexual practices, and family history.

So Sana and co. ask the question: would local folks do better? And they set up an experiment to examine this. The answer: you don't get the same answer for some important questions, but you do get statistically equivalent data for a whole lot of questions.

The setup is nice. Working in a provincial town in the Dominican Republic, they bring a set of enumerators from the capital to work on a survey and also hire a bunch of local enumerators. The characteristics of these two groups differ somewhat -- notably, the outsiders average 33.5 years old, while the locals average 24. They then randomly assign these interviewers to respondents (they also randomize the locals across respondents they know and respondents they don't know, but this paper uses only the ones they don't know). The enumerators interview women aged 20 to 50 on a range of issues including demographics, household composition, property ownership, income, remittances, family planning, and vaccinations. In addition, Sana and co. throw in some questions on tolerance and a question asking whether the respondent knew of some real famous people and two fictitious people.
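To make the random assignment step concrete, here is a minimal sketch of how respondents could be split between the two interviewer types. It is not the authors' procedure: the function name, the respondent IDs, the seed, and the round-robin balancing are all assumptions for illustration.

```python
import random

def assign_interviewers(respondent_ids, interviewer_types=("outsider", "local"), seed=0):
    """Randomly assign each respondent to an interviewer type (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced and audited
    ids = list(respondent_ids)
    rng.shuffle(ids)
    # Deal the shuffled respondents round-robin so group sizes differ by at most one.
    return {rid: interviewer_types[i % len(interviewer_types)] for i, rid in enumerate(ids)}

# Ten made-up respondent IDs, purely for illustration.
assignment = assign_interviewers(range(1, 11))
```

Shuffling first and then dealing round-robin keeps the arms balanced in size, which a coin flip per respondent would not guarantee.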

The crux of the results is a set of regressions that adjust for the clustering of respondents within interviewers. And here they find some heterogeneity in responses. But before getting to these, it's interesting to note that both groups of interviewers had pretty low (and not statistically different) refusal rates -- 1.94% for outsiders and 2.91% for locals.
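For a sense of why refusal rates like these would not be statistically different, here is a rough sketch of a pooled two-proportion z-test. The counts are hypothetical: this post does not report the number of refusals or contacts, so 6 of 309 and 9 of 309 were chosen only because they round to 1.94% and 2.91%. The sketch also ignores clustering within interviewers, which the paper's regressions adjust for.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for a difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# HYPOTHETICAL counts (not from the paper): 6/309 rounds to 1.94%, 9/309 to 2.91%.
z, p_value = two_proportion_z_test(6, 309, 9, 309)
```

Under these assumed counts the p-value is well above 0.05. With refusals this rare, a one-percentage-point gap is easy to get by chance unless the samples are much larger.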

In terms of responses to individual questions, it's important to note (again) that for most of the questions, there isn't a statistically significant difference. But where you do see a difference is interesting. First off, respondents are significantly less likely to tell local enumerators that they own their home. Similarly, they report substantially lower contraception use to local enumerators, an effect that is concentrated among married women (perhaps not surprisingly, the levels they report to outsiders line up with the DHS). Respondents also report lower levels of remittances received (significant at 10 percent for the unconditional estimates) to local enumerators, but they also report lower outflows. So, lining this up: locals are going to get some lower measures of household welfare, and lower contraception use.

Sana and co. also look for some attitudinal differences and outright lying. In terms of attitudes, locals get significantly lower reports of tolerance of homosexuals and prostitutes. And then Sana and co. ask respondents if they've heard of two fake people. Local interviewers are less likely to get a positive response to this. So in this dimension, folks are clearly more likely to lie to outsiders. What about the other responses? Clearly folks are lying to some of the enumerators. And there are good reasons, as Sana and co. lay out, why both insiders and outsiders might be getting skewed responses. Part of the difference in responses might be due to enumerator characteristics -- they can't totally rule out age (although they run some interactions), since the overlap in ages between the two groups seems too thin. But part of the difference seems to stick to the outsider-insider distinction. And my conclusion after reading their paper is that the truth is out there, but I still don't know if it's in my datasets.


Markus Goldstein

Lead Economist, Africa Gender Innovation Lab and Chief Economists Office
