Book Review: Failing in the Field – Karlan and Appel on what we can learn from things going wrong


Dean Karlan and Jacob Appel have a new book out called Failing in the Field: What we can learn when field research goes wrong. It is intended to highlight research failures and what we can learn from them, sharing stories that might otherwise be told only over a drink at the end of a conference, if at all. It draws on a number of Dean’s own studies, as well as those of several other researchers who have shared stories and lessons. The book is a good short read (I finished it in an hour), and definitely worth the time for anyone involved in collecting field data or running an experiment.

A typology of failures
The book divides failures into five broad categories, and highlights types of failures, examples, and lessons under each:

  1. Inappropriate research settings – this includes doing projects in the wrong place (e.g. a malaria prevention program where malaria is not such an issue), at the wrong time (e.g. when a project delay meant the Indian monsoon season started, making roads impassable and making it more difficult for clients to raise chickens), or with a technically infeasible solution (e.g. trying to deliver multimedia financial literacy in rural Peru using DVDs when loan officers weren’t able to find audio and video setups).
  2. Technical design flaws – this includes survey design errors like bloated surveys full of questions where there is no clear plan for how they will be used in analysis, and poorly designed questions; inadequate measurement protocols (they have an example where, because their survey offered a couple of dollars in incentivized games, others would pretend to be the survey respondents to try and participate, and they didn’t have good ID systems); and mistakes in randomization protocols (e.g. a marketing firm sorting a list of donors by date of last donation, and splitting it in half so that the treatment group were all more recent donors than the control).
  3. Partner organization challenges – a big part here is realizing that even if the top staff are committed, the lower-tier staff may have limited bandwidth and flexibility. A couple of examples were from programs which tried to use existing loan officers to deliver financial literacy training, only to find that many of them were not good teachers; and where bank tellers decided to ignore a script for informing customers of a new product because they felt it slowed them down from serving clients as quickly as possible.
  4. Survey and Measurement Execution Problems – these include survey programming failures (e.g. trying to program a randomized question ordering, but it ends up skipping the question for half the sample); misbehaving surveyors (who make up data); not being able to keep track of respondents (as noted in the impersonation example above); and measurement tools not working (as in my RFID example).
  5. Low Participation Rates – they separate this into low participation during intake (when fewer people apply for the program than expected); and low participation after random assignment (when fewer of those assigned to treatment actually take it up). They note how partner organizations are often overconfident on both counts. They have examples of financial education in Peru where only 1 percent of groups assigned to treatment completed the full training, and a new loan product in Ghana where delays in processing and a cumbersome application process meant that while 15 percent of business owners visited the branch, only 5 percent applied and 0.9 percent received a loan.
After discussing each of these general categories with some examples, it then goes into more depth on case studies of six failed projects to provide a lot more detail on what went wrong and why, as well as lessons learned.
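The randomization mistake in category 2 is easy to reproduce. Here is a minimal sketch, assuming a hypothetical donor list in which donor i last gave i days ago; the function names, the seed and the data are illustrative and not taken from the book. It contrasts the sort-and-split protocol with a shuffle before splitting.

```python
import random
from datetime import date, timedelta


def split_sorted(donors):
    """Flawed protocol: sort by last donation date, then cut the list in half."""
    ordered = sorted(donors, key=lambda d: d["last_donation"], reverse=True)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]  # (treatment, control)


def split_random(donors, seed=0):
    """Shuffle first, so that assignment is unrelated to donation recency."""
    shuffled = list(donors)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (treatment, control)


def mean_days_since(group, today):
    """Average number of days since the last donation in a group."""
    return sum((today - d["last_donation"]).days for d in group) / len(group)


# Hypothetical data: 200 donors, donor i last gave i days before `today`.
today = date(2016, 10, 10)
donors = [
    {"id": i, "last_donation": today - timedelta(days=i)}
    for i in range(1, 201)
]

bad_t, bad_c = split_sorted(donors)
good_t, good_c = split_random(donors, seed=42)

# Baseline imbalance: control minus treatment, in days since last donation.
bad_gap = mean_days_since(bad_c, today) - mean_days_since(bad_t, today)
good_gap = mean_days_since(good_c, today) - mean_days_since(good_t, today)

print(f"sorted split gap: {bad_gap:.1f} days")   # 100.0 by construction
print(f"random split gap: {good_gap:.1f} days")  # much smaller in magnitude
```

With the sorted split, the treatment group is the 100 most recent donors, so the two groups differ at baseline before any treatment is delivered. A balance check like this on a pre-treatment variable is one simple way such a mistake could be caught before launch.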

A few general lessons
So many of the failures seem to have come from a lack of piloting or from dealing with immature products – studies that involved launching new products without sorting out all the implementation issues beforehand. They note the challenge of doing this – the researchers and partners are often excited and eager to launch their new product, and adding a step that might take everyone back to the drawing board might seem politically untenable.
One lesson they note is that individual failures tend to snowball quickly if they are not caught, so a single study can end up facing several of these problems at once.
Another is that researchers find it hard to know when to walk away. They give an example of a project on supply-chain credit, where the researchers lined up funding, a partner bank, a consumer product distributor etc. and then kept hitting roadblocks such as software problems at the bank, or changes in design. After three baseline surveys, all of which had to be scrapped, and nearly three years, they finally abandoned the project – but “more than once they considered shutting down the project, but there always seemed to be a glimmer of hope that encouraged them to try again or hold another meeting with the partner”. Another example comes up in one of the case studies – when a research team had planned to run a project on sugarcane that fell through, they hastily put together a poultry loan product instead, which failed.

Some reactions
I was struck most of all by how mundane many of the stories of failure were – products were launched too soon, they involved additional work for people in the partner organization who didn’t do this work, not enough people applied for a program, someone messed up survey coding etc. Failure here is not coming from the survey team getting accused of witchcraft, enumerators who have never ridden a motorcycle before claiming they were experts, the research team all contracting malaria etc. Instead it comes, by and large, from a set of problems that in retrospect could often have been controlled and may seem obvious to an outside party. This is why sharing these lessons is all the more important – these are not all self-deprecating funny stories, but things researchers may otherwise not share for fear of coming out looking bad.
The second point was that all of the examples in the book came from working with NGOs, reflecting much of the work Dean has done. Working with governments brings a whole new set of ways to fail. From basic bureaucracy problems to political economy to much less ability of researchers to control what is getting done, I am sure there are many lessons that can be drawn from such failures.

Share your experiences
This book is a great first step in sharing lessons from failure, but it is only a beginning. We agree with Dean and Jake that more learning from failure should take place. We are therefore pleased to announce that we will partner with them in having a special “learning from failure” series on the blog. We’ve collated some of our previous posts on failure in one place, and would love others to share their experiences. If you’ve got a failure to share, please email it to
Points to note:
  • We are interested in examples of failures of impact evaluations (research project failures) as opposed to failures of development projects more generally
  • Please send it as a Word document in 11 point Calibri font.
  • If you have any pictures/graphs, send them as .jpg attachments
  • Keep it short, and try and draw out the lessons for others
  • Make sure to be specific and concrete in the advice from this – as Markus told me “a lesson to work harder and pay more attention” isn’t so helpful.
  • The intention is not to embarrass anyone, but help improve work going forward. So anonymize parties as needed.


David McKenzie

Lead Economist, Development Research Group, World Bank

October 10, 2016

"from a set of problems that in retrospect could often have been controlled and may seem obvious to an outside party"
This sounds just like about every software development project I have ever worked on. In 30 years, I have never worked on a project where one could not say "You know, if you pretend you are an outside 'consultant' and you reviewed this project you would have to say 'This is exactly how all the literature says you are NOT supposed to do it!'" But, of course, the immediate response is always a set of rationalizations about how this project is different....
Fortunately, in a couple of projects, I have been able to push through changes and found, to my surprise, that doing things "correctly" actually works pretty well in practice as opposed to theory. For years I was never sure whether doing things the "right way" actually worked, as I had never witnessed it first hand nor extensively debriefed anybody who had done the right thing (and was not simply a "consultant" selling something).
You might find the book To Engineer is Human (by Henry Petroski) to be interesting. It discusses how helpful it is to catalog failures, positing that it is only through failure we can expand our fields of knowledge.

Adam McCarty
October 13, 2016

I have been running a private ODA consulting firm out of Hanoi, Vietnam, since 2001. We have 24 staff in Hanoi, and another eight in our Yangon office. I have been working as an Economist in development for 30+ years. This is all leading to explain why I have seen and been involved in literally dozens of research project failures over the years (and many more if we include qualitative research, which the academic development community seems to have forgotten about nowadays). Here are some generic lessons from my experience:
1. At some stage too “many multiples” must rule out a quantitative approach.
The DFID-funded BRACED is an example. At first glance it is a uniform project across hundreds of villages to “strengthen resilience”, but dig and it is clearly not: there are three implementing INGOs, each does different sets of interventions in their specific geographic areas, with differences in activities and timing of inputs with each INGO. This dog’s breakfast of implementation happened “as there were many delays” so an evaluation-friendly model never developed. Given that, no surveys should have been done, but they could not be abandoned because “they are in the work plan” – and there is no formal step that critically reviews the relevance of pending activities.
2. With very few exceptions every INGO-led quantitative evaluation is a waste of money.
Bank and other academics do not understand this problem. Their gaze is only upon large-scale surveys, good and bad. Yet for every one of those there are dozens done by INGOs who are “ticking a box” required by those who give them money. Typically small projects, and using some illogical rule of thumb like “3% of total budget should be for M&E”, they use the M&E pittance to survey maybe 300 households. As nobody cares about the results (implementer or funder), all the usual problems are exacerbated: a baseline survey implemented long after the interventions started; poor sample selection; rushed implementation; shoddy analysis; etc.
3. Unpacking incentive structures is the path to understand why nobody cares about results.
I exaggerate, of course. Poor Economics, Gates, and other “big project” people do care and add value through gold standard rigorous work. But that is maybe 1% of all projects. Most others involve organisations and people who have no incentive to take results seriously. All organisations involved in the “value sucking chain” are constantly having to defend their budgets. Personal careers depend on what was implemented, not results. The prestige of organisations depends on how big they are. Work through the incentives story and you understand why results (and failures) are not just irrelevant – they are positively discouraged. But it also leads you to consider solutions. Here is one: An INGO implements a project for $2m and is paid in full upon completion. Two years later a post-evaluation is done and based on that the INGO will get a bonus of up to 30% of the original project value based on measured sustained results. That bonus they can spend on any projects they wish to develop without (the usual) micromanagement by the funder. Catch: the post-evaluation is a public report.
The development sector suffers dreadfully from a trivial approach to understanding incentive structures. Thus we are caught in an endless cycle of agreeing on “the best thing to do” but never doing it: evaluations; tied aid; information sharing; cooperation; sharing failures; bla bla bla. Healthy cynicism is needed – directed into incentives research, leading to innovative solutions.