Getting to better data: Talking to strangers


About 15 years ago, when I was doing my dissertation research with a professor with experience in fieldwork, we did a 15-round survey with households in Ghana. Given the frequency of the visits, we based the enumerators in the village. But we were careful to hire enumerators from nearby big towns -- not the villages in which we were working. This was partly for skills, but mostly to make sure that the enumerators wouldn't be asking sensitive questions of people they knew.

This is a fairly common practice in surveys -- both for those that we do for impact evaluations and also, in many cases, for national surveys (although there are some notable exceptions to this). But there is a tension here, as a paper by Mariano Sana, Guy Stecklov, and Alexander Weinreb points out. And that tension is that we don't want the enumerator to be a stranger in any kind of sense that alienates people and makes them afraid to talk. But we do want the enumerator to be a stranger to whom all kinds of private information on income, sexual practices, and family history can be revealed.

So Sana and co. ask the question: would local folks do better? And they set up an experiment to examine this. The answer: you don't get the same answer for some important questions, but you do get statistically equivalent data for a whole lot of questions.

The setup is nice. Working in the Dominican Republic in a provincial town, they bring a set of enumerators from the capital to work on a survey and then they also hire a bunch of local enumerators. The characteristics of these two groups differ somewhat -- notably the outsiders average 33.5 years old, while the locals average 24. They then randomly assign these interviewers to respondents (and randomize the locals across people they know and people they don't know as well -- but they only use those they don't know in this paper). The enumerators interview women aged 20 to 50 on a range of issues including demographics, household composition, property ownership, income, remittances, family planning, and vaccinations. In addition, Sana and co. throw in some questions on tolerance and a question asking whether the respondent knew of some real famous people and two fictitious people.
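For readers who want to try something similar, here is a minimal sketch of randomly assigning interviewers to respondents with balanced workloads. This is only an illustration: the post does not describe the paper's actual randomization procedure, and the function name `assign_interviewers`, the respondent IDs, and the interviewer labels are all made up.

```python
import random
from collections import Counter


def assign_interviewers(respondent_ids, interviewer_ids, seed=0):
    """Randomly assign each respondent to one interviewer.

    Respondents are shuffled with a seeded generator (so the assignment
    can be reproduced), then dealt out to interviewers in turn, which
    keeps workloads within one respondent of each other.
    """
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    return {
        respondent: interviewer_ids[i % len(interviewer_ids)]
        for i, respondent in enumerate(shuffled)
    }


# Hypothetical IDs, for illustration only.
respondents = [f"R{i:03d}" for i in range(1, 21)]
interviewers = ["outsider_1", "outsider_2", "local_1", "local_2"]

assignment = assign_interviewers(respondents, interviewers, seed=42)
workload = Counter(assignment.values())
```

Fixing the seed means the assignment list can be regenerated and audited later, which helps if anyone asks whether interviewers picked their own respondents.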

The crux of the results is a set of regressions that adjust for the clustering of respondents within interviewers. And here they find some heterogeneity in responses. But before getting to these, it's interesting to note that both groups of interviewers had pretty low (and not statistically different) refusal rates -- 1.94% for outsiders, and 2.91% for locals.
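To see what adjusting for clustering within interviewers involves, here is a rough sketch for the simplest case: the difference in the mean of one outcome between respondents interviewed by locals and by outsiders, with a cluster-robust standard error. It is not the paper's specification (which the post does not spell out). It assumes a regression of the outcome on a single local-interviewer dummy, uses the basic sandwich estimator with no small-sample correction, and runs on made-up numbers. The function name and record layout are hypothetical.

```python
import math
from collections import defaultdict


def clustered_diff_in_means(records):
    """Local-minus-outsider difference in means with a cluster-robust SE.

    records: iterable of (interviewer_id, is_local, outcome).
    For a regression of the outcome on one dummy that is constant within
    each cluster, the sandwich variance of the dummy's coefficient reduces
    to: sum over clusters of (summed residual)^2 / (group size)^2.
    """
    records = list(records)
    n = {True: 0, False: 0}
    total = {True: 0.0, False: 0.0}
    for _, is_local, y in records:
        n[is_local] += 1
        total[is_local] += y
    mean = {group: total[group] / n[group] for group in (True, False)}

    # Sum residuals within each interviewer (the cluster).
    resid_sum = defaultdict(float)
    group_of = {}
    for interviewer, is_local, y in records:
        resid_sum[interviewer] += y - mean[is_local]
        group_of[interviewer] = is_local

    variance = sum(
        s ** 2 / n[group_of[interviewer]] ** 2
        for interviewer, s in resid_sum.items()
    )
    return mean[True] - mean[False], math.sqrt(variance)


# Made-up data: 1 = respondent reports owning her home, 0 = does not.
fake_records = (
    [("outsider_1", False, y) for y in (1, 1, 1)]
    + [("outsider_2", False, y) for y in (1, 0, 0)]
    + [("local_1", True, y) for y in (0, 0, 0)]
    + [("local_2", True, y) for y in (1, 1, 0)]
)

diff, se = clustered_diff_in_means(fake_records)
```

The point of clustering is that answers collected by the same interviewer are not independent draws, so treating every respondent as a separate observation tends to overstate precision. With only a handful of interviewers per group, a small-sample correction or a bootstrap would be worth considering.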

In terms of responses to individual questions, it's important to note (again) that for most of the questions, there isn't a statistically significant difference. But where you do see a difference is interesting. First off, respondents are significantly less likely to tell local enumerators that they own their home. Similarly, they report substantially lower contraception use to local enumerators, an effect that is concentrated among married women (perhaps not surprisingly, the levels they report to outsiders line up with the DHS). Respondents also report lower levels of remittances received (significant at 10 percent for the unconditional estimates) to local enumerators, but they also report lower outflows. So, lining this up: locals are going to get some lower measures of household welfare, and lower contraception use.

Sana and co. also look for some attitudinal differences and flat-out lying. In terms of attitudes, locals get significantly lower reports of tolerance of homosexuals and prostitutes. And then Sana and co. ask respondents if they've heard of two fake people. Local interviewers are less likely to get a positive response to this. So in this dimension, folks are clearly more likely to lie to outsiders. What about the other responses? Clearly folks are lying to some of the enumerators. And there are good reasons, as Sana and co. lay out, why both insiders and outsiders might be getting skewed responses. Part of the difference in responses might be due to enumerator characteristics -- they can't totally rule out age (although they run some interactions) given that it seems like the overlap in ages isn't robust enough. But part of the difference seems to stick to the outsider-insider distinction. And my conclusion after reading their paper is that the truth is out there, but I still don't know if it's in my datasets.


Markus Goldstein

Lead Economist, Africa Gender Innovation Lab and Chief Economist's Office

Join the Conversation

Hillary Johnson
July 12, 2013

Very interesting. However, I found one of the conclusions not completely convincing. In the last paragraph, significantly lower reports of tolerance of homosexuals and prostitutes are interpreted to mean that people are lying to local enumerators. This may not be the whole story, and taken out of context this conclusion is not entirely convincing. For example, let's say that in the village where the person lives, the stigma of homosexuality is very strong. Even if the person does not really have a problem with homosexuals, perhaps they may not want to express this to a local, for fear of being judged or even accused and stigmatized as being a closet homosexual. Thus, they align themselves with the answer that they think the enumerator wants to hear -- and not in the sense of hiding their intolerance but rather feeling pressured to show intolerance. In such a scenario, it would be the local enumerator hearing the lies and the outsider hearing the truth.

Diwakar Srivastava
June 01, 2016

Very well documented.
This matches my experience of survey-based research in rural areas of India, especially the Hindi-speaking states.
Collecting data on socially taboo topics and sensitive issues is a big challenge when conducting surveys in rural areas of India. The problem starts when the enumerators themselves find it difficult to communicate with the respondent in line with the training imparted to them. Quite often researchers themselves don't go to the field during the pilot and hence are not sure how to minimise such errors in data capture. The researcher tries to look at the numbers through the glasses (s)he wears, based on individual experience.
Whether the enumerator is a local or an outsider matters only to the extent that dialect is involved, that is, when the respondent is comfortable only in the local dialect.