What’s wrong with how we do impact evaluation?



In a recent 3ie working paper, Neil Shah, Paul Wang, Andrew Fraker and Daniel Gastfriend of IDinsight make a case for what they call decision-focused impact evaluation. What, you may ask, is a decision-focused impact evaluation? Shah and co. define it as one which “prioritises the implementer’s decision-making needs over potential contributions to global knowledge.” They contrast these with what they call knowledge-focused evaluations, which are “those primarily designed to build global knowledge about development interventions and theory.”

Shah and co. acknowledge that the distinction between these two types is not binary and, to me, it’s not really helpful. What’s interesting in their paper is that they raise a number of critiques of how many impact evaluations are done, critiques that I keep hearing. So I thought it might be time to revisit some of these issues.

First up:   impact evaluations are asking questions that aren’t directly relevant to what policymakers want to learn.   They have a nice quote from Martin Ravallion, where he says: “academic research draws its motivation from academic concerns that overlap imperfectly with the issues that matter to development practitioners.”    For sure, some of this lack of overlap comes from questions policymakers want to evaluate but academics won’t usually touch (e.g. single replication studies of things that have been well published before).    However, this gap is being somewhat (and hopefully increasingly) filled by non-academic evaluation work (and by the new spate of multi-country replications where academics play a lead role).  

In terms of the questions academics are asking, I think there is a significant degree of overlap with the questions that are relevant to policymakers. Let’s take a deeper look at the kind of questions impact evaluations are asking. First, there is uptake (e.g. evaluations on getting businesses to formalize). Second, there is the what: the impact of the intervention on outcomes (e.g. do cash transfers increase the chance that kids go to school). Third, there is the why: what mechanisms lead the intervention to cause these changes in outcomes? This is the area where an economic theory-informed approach is more likely to be applied. I spend a fair bit of time with policymakers of various stripes and I see folks interested in answers to all of these questions: testing their priors on a theory of change (the why), getting a sense of returns to investment (the what), and learning why people are or are not coming to their program (uptake). Answering these in terms of actual programs that happen somewhere helps make this evidence particularly salient. (For some further complexity on thinking about the distinction among these three and what kind of programs they are working with, it is worth looking at a nice recent post by David).

This is not to say that every academic evaluation is of immediate or medium-term use to policymakers. Some are not, especially “why” type experiments that are carefully constructed and controlled in order to isolate one aspect of economic theory (David’s post has an example of this, and this is partly behind my quote in Shah and co.’s paper). But on the whole, there is significant overlap with the interests of policymakers who are curious about what their programs are doing and how this might happen.

One thing that is critical to this point is that knowledge is portable. Which brings us to a second critique: external validity. Eliana Carranza and I blogged about this earlier, and that post is a starting point for thinking about it. The basic point here is that policymakers are generally not morons. They are acutely aware of the contexts in which they operate and they generally don’t copy a program verbatim. Instead, they usually take lessons about what worked and how it worked and adapt them to their situation. Every policymaker I’ve talked to has raised this issue when I bring some impact evaluation evidence to the table. Sometimes this leads to the conclusion that we’ve got to try something different and sometimes it leads to a conversation on how to adapt the lessons to her/his particular context. And bringing the folks who implemented the original (evaluated) program into the discussion can help facilitate this process.

Moving on to critiques 3 and 4: impact evaluations take too long and are too expensive. In support of this, Shah and co. cite some statistics on the lag between endline data collection and publication. They also (in a footnote) note that some folks do share results before they are published. Sharing results early is precisely the answer, and it is why the publication lag is not really a problem. I find in my work that sharing the results with the program implementers, usually before I even start writing, gets me deeper and/or different insights into what they mean. And from talking with others, I’m not alone in this. Of course, there will be some lag between endline and results, which will be driven by how long it takes to enter and clean the data and how complex the analysis is. Shah and co. also raise the issue of taking too long with respect to the longer data collection periods needed to capture downstream indicators. David has a nice post on this as well which explains why there are very, very good reasons to wait a bit.

On the expense (and maybe length of analysis) side, Shah and co. raise the issue of survey length. There are a couple of responses to this. First, in order to get a complete cost-benefit analysis, we need a fairly robust spectrum of outcome indicators. Missing outcomes with a high return are going to give us an underestimate of program impact (e.g. the labor supply responses to health interventions). Second, when I work with program implementers to design an evaluation, they usually come up with a fairly long list of indicators they think the program might impact. Third (and somewhat related to the first), as I’ve argued in a previous post, focusing only on the outcomes within the sector for which the program was built (e.g. only looking at school enrollment impacts from conditional cash transfers) introduces a risk that we miss a program that is quite effective at addressing an “out-of-the-box” outcome, and it perpetuates the silos of government and donor programming.

Critiques 5 and 6 relate to getting evidence used effectively and efficiently.   Here, I think there’s a fertile ferment of new ideas coming up, but there’s a way to go yet.   And that’s a topic for a later post.   


Markus Goldstein

Lead Economist, Africa Gender Innovation Lab and Chief Economists Office

Neil Buddy Shah
February 12, 2016

Markus, thanks for this review of our paper as well as your initial inputs into it.
We agree with many of your points. Namely:
1. “Knowledge-focused evaluations” are extremely valuable and can inform policy directly and indirectly. That’s why we admire the trailblazing work done by organizations like JPAL, IPA, CEGA, DIME, and 3ie – organizations that improve social outcomes using knowledge generated via impact evaluation (and many of which sometimes do evaluations with more of a decision-focused bent as well).
2. There is often overlap between researcher and policymaker interests and decision-making criteria, even if they don’t always align perfectly.
3. It is often important to collect downstream indicators, and to collect data to detect unforeseen or indirect impacts.
There is no disagreement on the above.
But the question you pose (“What’s wrong with how we do impact evaluation?”) differs from the one we raised. We don’t think anything is wrong per se. But are we missing opportunities to use rigorous evaluation methods to more directly inform specific policy and programmatic decisions? When decision-maker constraints (budget, operations, time, politics, etc.) are real, how well do we work within those constraints? Impact evaluation could inform far more policies than it currently does if the evaluation community could work within those constraints.
Oftentimes, decision-makers cannot delay a decision to wait for downstream indicators, pay for an extremely detailed survey, or enable the cleanest, randomly-assigned comparison group. In these cases, impact evaluation methodologies can still be applied to cost-effectively improve the evidence available for expected decisions. And shouldn’t we scrutinize impact evaluation as we would any other development intervention – how (cost-)effective is it at improving social outcomes?
We think we would benefit by being explicit on “why” any given evaluation is being conducted. If the primary purpose is to enhance our understanding of development or produce the most robust evidence possible for a new approach, then it makes sense to incorporate “knowledge-focused” characteristics such as collecting many indicators, long-term outcomes, and asking theoretically compelling questions. If the primary purpose is to inform a specific policy decision, then a decision-focused evaluation that maximizes rigor within decision-maker constraints is ideal. This is why the world needs more knowledge-focused evaluations AND more decision-focused evaluations. It’s not a zero-sum game.