Published on Development Impact

Lessons from some of my evaluation failures: Part 1 of ?


We’ve yet to receive much in the way of submissions to our learning from failure series, so I thought I’d share some of my trials and tribulations, and what I’ve learnt along the way. Some of this comes back to how much you need to sweat the small stuff versus delegate and preserve your time for bigger-picture thinking (which I discussed in this post on whether IE is O-ring or knowledge hierarchy production). But this presumes you have a choice over what you do yourself; often, when dealing with governments and multiple layers of bureaucracy, the problem is that your scope for micro-management is limited in the first place. Here are a few failures, and I can share more in future posts.

Case 1: testing various mechanisms to bring firms into the formal sector in Brazil, ultimately published here.
Failure 1: We were worried about the response rates of informal firms to our follow-up survey, so we wanted some attractive incentives to get firms to respond. At the time, iPads had just been released and were much cheaper in the U.S. than in Brazil. So we thought, why not buy 10 of them here, take them to Brazil, and use them as prizes? We bought them here, but the World Bank procurement process then tagged them as World Bank official property. They got to Brazil, where they sat for over a year and a half while the World Bank, the survey company, and the municipal government couldn’t get their bureaucracies to figure out how to transfer ownership. The iPads eventually got sent back to the U.S., where of course they had been superseded by a newer model.
Lesson: I had done this to try to save about $5,000 for my project. Instead it ended up taking many hours of discussions and emails, and ultimately didn’t work. It would have been far easier to include respondent incentives directly in the contract with the survey company, and have them be responsible for buying the prizes (even at higher cost). So the lesson is not to sweat the small stuff on costs without also factoring in the high costs of procurement processes.

Failure 2: The most important outcome in this study was whether or not the firm formalized. We designed a follow-up survey which asked about this, followed by many questions about the formalization process. We finalized the questionnaire and sent it to the survey firm for translation. They translated it, and then made a last-minute change in question ordering, so that a skip pattern caused the key questions on formalization to be skipped for many of the firms. By the time we discovered this, the survey was already complete. We sent the survey firm back to attempt to re-ask these questions, but they could only get 71 percent of those interviewed to answer more questions. [Luckily we could use administrative data to get the formalization outcome in the end.]
Lesson: check the skip patterns in the translated version; don’t just assume that because they are fine in the English version, they will be fine in the translation.

Case 2: attempting to use distance to the tax office as an instrument for formalization in Peru (abandoned evaluation).
Failure 3: In some of the first work I did at the World Bank, I worked with the Bolivia country office to attempt to use non-experimental methods to measure the impact of formalizing on firms. The approach we came up with was to use the distance to the tax office as an instrument for formalizing, conditional on distances to other government offices, distance to the city center, and several other controls. The idea was that the information about, and cost of, registering varied with this distance. This worked well in Bolivia, resulting in this paper. So when the Peru country office asked if we could replicate this work in Peru, it seemed likely we could. We designed the survey, but found that the closer we got to the tax office, the higher the survey refusal rate and the item non-response rate on profits and sales questions. It turned out that the tax inspectors there (unlike in Bolivia) had quotas to meet, and so would go out in their lunch hours and inspect the closest firms they could find, making firms very unwilling to talk about finances or formality. As a result, we couldn’t use distance to the tax office as an instrument there.
Lesson: pilot testing the survey around the tax offices, and talking more with the tax inspection team to assess the exclusion restriction, would have helped.

Case 3: Trying to recruit subjects for a financial literacy study in Mexico, ultimately published here with more of the failure details in the online appendix.
Failure 4: We were working with a major financial institution in Mexico City, and wanted to test the effect of financial education on a sample of their clients. The partner institution offered to send letters to clients in the same way as they send credit card offers, and said their usual response rate was 2 to 3 percent. Since we wanted a sample of around 1,000 clients, we therefore had them mail 40,000 letters to their clients, which mentioned we were partnering with the institution on financial education, and asked clients to send in a 2-page screener survey if they were interested, with half the sample also offered a 75 peso (US$5) payment if they were among the first 200 to reply. It ended up being about $1 per letter sent, by the time we paid for postage, printing, and return-postage envelopes. So we spent around $40,000 to send these letters, expecting to get 800–1,200 responses and a short baseline survey in return. Instead we received only 42 responses (0.1 percent)! We ended up having to recruit the majority of respondents in person, in bank branches and on the streets, instead.
Lesson: here again a pilot would have been useful. One complication was that all mailings had to be done by the marketing team of the financial institution, and confidentiality rules meant we could only get the personal details of clients who opted into the study by responding. And given the low expected response rate (2–3 percent), even a pilot would have needed to go out to 1,000 or so clients, but it still would have saved a lot.

Failure 5: In the same study, once we had our sample, we wanted to make it as easy as possible for those who were invited to financial literacy training to attend. So for one sub-treatment group, we offered to send taxis to their homes to pick them up and take them to training. However, given security concerns about being kidnapped or robbed by a taxi driver, this offer was seen as more of a deterrent than something people appreciated.
Lesson: even the simplest solutions may run foul of the local context.

Of course, as with many low-take-up failures, both of these were instructive about the low value people placed on the intervention in the first place.
Anyway, those are a few examples that I thought might be useful to share. Remember that you can read about some of the other failures we’ve posted about here, and we’d love to have others share their experiences. We can collate several small examples from different contributors if you have something specific that doesn’t seem like enough for a post by itself, or we’re happy to have you share a specific case in more detail, as in Dean and Jake’s post: just email them to (see the end of the book review for more details).


David McKenzie

Lead Economist, Development Research Group, World Bank
