On October 3rd, I sent out a survey asking people what was the biggest, most embarrassing, dramatic, funny, or other oops mistake they made in an impact evaluation. Within a few hours, a former manager came into my office to warn me: “Christel, I tried this 10 years ago, and I got exactly two responses.”
I’m happy to report I got 49 responses to my survey. My initial idea was to assemble a “top 10” of mistakes, so I promised the 10 winners they would get a small prize. Turns out, assembling a top 10 was a bit tricky, but here’s my attempt at classifying the information I got.
#1 - A first batch of comments were stories of random, funny things that happened in impact evaluations – here’s one that got me cracking up in my office on a Friday afternoon:
A researcher launched a baseline survey in Liberia, and hired a local organization of ex-combatants turned enumerators. The two criteria for enumerators were that they had to be literate and had to be able to drive a motorcycle, as the enumerators would have to ride between villages through three provinces. During the initial training, the enumerators the organization identified were fantastic - smart, hard-working, and kind. On the day of the survey launch, 14 motorcycles were delivered. Everyone was packed up and ready to go for a few weeks. Then the enumerators got on the motorcycles. About two and a half miles into the drive, it became clear that while the enumerators were literate, they definitely had never driven motorcycles before and had essentially lied during the interviews. What was supposed to be a two-and-a-half-hour drive on the first day took nine hours, going around 15 kilometers per hour. While nobody got really hurt, the enumerators crashed into large mounds of dirt on the side of the road many, many times over the next few days. The researcher noted he had never gotten back from field work with such a layer of dirt encrusted into his skin.
Another story was just as random but maybe not as funny:
A team implemented a follow-up survey with 6,000 households in northern Bangladesh about three years after the implementation of an ambitious anti-poverty program. The oops was that about 1/3 of the data was gathered during the month of Ramadan. Although most of the team was Muslim, no one anticipated the effect of Ramadan on, for example, nutrition data.
While these two mistakes were fairly random and hard to predict, I found that most responses reflected much more systematic issues. The next four items in my top 5 are of the more systematic type:
#2 - Hiring the wrong person(s) on the impact evaluation team, or not defining their responsibilities well.
“The evaluation manager decided that surveys do not deliver any credible results”
“We hired local consultants (recommended by colleagues) that were not up to the task”
“My Research Assistant took big shortcuts”.
#3 - Mistakes in the design:
“The intervention was assigned at the group level, but the data were collected at the individual level. In our power calculations, we forgot to take into account this clustering of individuals within treated groups – we ended up recruiting way too few individuals into our experiment.”
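To make the clustering point concrete, here is a minimal sketch of the standard design-effect adjustment for equal-sized clusters, DEFF = 1 + (m - 1) × ICC. The function names and all the numbers below are hypothetical and chosen only for illustration; they do not come from the respondent's study, and a real power calculation would need an ICC estimated from comparable data.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def clustered_sample_size(n_individual: int, cluster_size: float, icc: float) -> int:
    """Inflate a sample size computed under individual-level randomization."""
    return math.ceil(n_individual * design_effect(cluster_size, icc))


# Hypothetical inputs (assumptions, not figures from the survey responses)
n_unclustered = 400        # sample size from a naive individual-level calculation
individuals_per_group = 17
icc = 0.0625               # assumed intra-cluster correlation

deff = design_effect(individuals_per_group, icc)
n_needed = clustered_sample_size(n_unclustered, individuals_per_group, icc)
groups_needed = math.ceil(n_needed / individuals_per_group)

print(f"design effect: {deff}")
print(f"individuals needed: {n_needed} in about {groups_needed} groups")
```

With these made-up inputs the required sample doubles, which is the kind of gap that leaves a team with "way too few individuals" if clustering is ignored.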
#4 - Issues in the design and (preparation of) data collection: This was by far the most represented category. Issues abound regarding “lack of IDs”. For example:
“The survey firm failed to include student ID numbers in the administration of the student test. Good luck merging two years of student data by long last names that all start with an R.”
“After the questionnaire was finalized and translated, the cover sheet for the “mothers” module was 1 line too long and disturbed the formatting. My colleague decided to lop off the top line to make things fit for printing. Unfortunately, it was the line that included the code linking each mother who answered the module to the household register.”
“The cover page of the questionnaire was never printed. So the contact information for the respondents was completely missing.”
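Problems like these can often be caught early with a simple check on the first batch of data that comes back from the field. The sketch below is one possible way to do it, assuming records arrive as a list of dictionaries with a hypothetical `student_id` field; the field name, helper names, and sample records are all invented for illustration.

```python
from collections import Counter


def _clean(value) -> str:
    """Normalize an ID value to a stripped string ('' if missing)."""
    return "" if value is None else str(value).strip()


def check_ids(records, id_field="student_id"):
    """Return (row indexes with a missing ID, sorted list of duplicated IDs)."""
    ids = [_clean(r.get(id_field)) for r in records]
    missing_rows = [i for i, v in enumerate(ids) if not v]
    counts = Counter(v for v in ids if v)
    duplicates = sorted(v for v, c in counts.items() if c > 1)
    return missing_rows, duplicates


def unmatched_ids(round_one, round_two, id_field="student_id"):
    """Return IDs present in only one of two survey rounds."""
    first = {_clean(r.get(id_field)) for r in round_one} - {""}
    second = {_clean(r.get(id_field)) for r in round_two} - {""}
    return sorted(first - second), sorted(second - first)


# Hypothetical sample data
year_one = [
    {"student_id": "S001", "score": 54},
    {"student_id": "S002", "score": 61},
    {"student_id": "S002", "score": 47},   # duplicated ID
    {"student_id": "", "score": 70},       # ID left blank
]
year_two = [
    {"student_id": "S001", "score": 58},
    {"student_id": "S003", "score": 66},
]

missing_rows, duplicates = check_ids(year_one)
only_in_one, only_in_two = unmatched_ids(year_one, year_two)
print("rows missing an ID:", missing_rows)
print("duplicated IDs:", duplicates)
print("only in year one:", only_in_one, "| only in year two:", only_in_two)
```

Running a check like this on pilot data, before the full survey goes to print, is much cheaper than trying to merge on names afterwards.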
Further issues were related to correct specification of the terms of reference of survey firms. For example:
“Verbal agreements with survey firms about changes to the TOR are extremely costly. Lo and behold, after field work was done and most payments had been processed, the data were never found, or were held back strategically.”
“We specified in the TOR that the data from the survey belonged to us. But we never said anything about the data log, and without it, we could not interpret the data, so we were unable to use them. Such a waste!”
#5 - Monitoring the intervention after the baseline seems to be another source of issues.
“We had a randomized design but didn't keep close enough communication with the government. They distributed the intervention non-randomly without our knowledge.”
“We had a perfectly good (and random!) encouragement design. Then, our implementing partner started bribing the participants to take up the treatment!”
So I managed to assemble a Top 5, albeit not the promised Top 10. But I have to say it was kind of difficult to pick out the “best” mistakes. The thing is, they all tell us something in their own way. One of the reasons I sent out the survey was to see whether the Impact Evaluation Toolkit that was just published actually responds to the types of issues that arise in real impact evaluations. The Toolkit is organized by stage in the impact evaluation, starting from the definition of the research questions and assembling the team, through field work, data analysis and monitoring of the intervention. I did a rough tally of how many issues were brought up in each stage, and this is what I find:
Of course, this is in no way representative of the issues that actually happen in IEs, since the issues are self-reported and my own survey sampling is off by any measure (I sent it to a bunch of people that I know work on IE…). It does tell me however that people struggle especially in the initial stages of setup of the evaluations – getting the team, the design, and the data collection right is a challenge. Do I think the Impact Evaluation Toolkit can help with this? Actually, I do.
When doing survey work, one needs good questionnaires to start with, examples of enumerator training, a data entry protocol, a data entry program, informed consent forms, an agreement on data access, solid terms of reference for the survey firm, a protocol for Institutional or Ethics Board review, field manuals, survey progress reports, …. The list goes on and on. Up to now, it wasn’t so easy to access those tools – in many cases teams learnt by trial and error, and getting hold of good terms of reference was a matter of emailing people to ask them what they had used in the past. With the Toolkit, the Bank’s RBF team is making available the collective knowledge that has been accumulated in about a dozen evaluations. Every tool can be downloaded and edited – i.e. no PDFs that need to be retyped! And while some of the tools are specific to impact evaluations of Results-based Financing in Health, most tools can easily be adapted to impact evaluation and surveys in other sectors.
Please check out the toolkit and feel free to use the tools in your own work… but don’t forget to tell us what you think and tell us how to improve the tools. That way, others can learn from your experience too.
As for the people who responded to my survey, they all deserve a prize – and they can expect it in the mail shortly!