“Just because it worked in Brazil doesn’t mean it will work in Burundi.” That’s true. And hopefully obvious. But some version of this critique continues to be leveled at researchers who carry out impact evaluations around the world. Institutions vary. Levels of education vary. Cultures vary. So no, an effective program to empower girls in Uganda might not be effective in Tanzania.
Of course, policymakers get this. As Markus Goldstein put it, “Policy makers are generally not morons. They are acutely aware of the contexts in which they operate and they generally don’t copy a program verbatim. Instead, they usually take lessons about what worked and how it worked and adapt them to their situation.”
In the latest Stanford Social Innovation Review, Mary Ann Bates and Rachel Glennerster from J-PAL propose a four-step strategy to help policy makers through that process of appropriate adaptation of results from one context to another.
External validity is a recurring concern in impact evaluation: How applicable is what I learn in Benin or in Pakistan to some other country? There are a host of important technical issues around external validity, but at some level, policy makers and technocrats in Country A examine the evidence from Country B and think about how likely it is to apply in Country A. But how likely are they to consider the evidence from Country B in the first place?
Chris Blattman posted an excellent (and surprisingly viral) post yesterday with the title “why I worry experimental social science is headed in the wrong direction”. I wanted to share my thoughts on his predictions.
“Take experiments. Every year the technical bar gets raised. Some days my field feels like an arms race to make each experiment more thorough and technically impressive, with more and more attention to formal theories, structural models, pre-analysis plans, and (most recently) multiple hypothesis testing. The list goes on. In part we push because we want to do better work. Plus, how else to get published in the best places and earn the respect of your peers?
It seems to me that all of this is pushing social scientists to produce better quality experiments and more accurate answers. But it’s also raising the size and cost and time of any one experiment.
No thoughtful technocrat would copy a program in every detail for a given context in her or his country. That's because they know (among other things) that economics is not a natural science but a social (or even dismal) science, and so replication in the fashion of chemistry isn't an option. For economics, external validity in the strict scientific sense is a mirage.
This is the first in our series of posts by students on the job market this year.
Impact evaluations are often used to justify policy, yet there is reason to suspect that the results of a particular intervention will vary across different contexts. The extent to which results vary has been a very contentious question (e.g. Deaton 2010; Bold et al. 2013; Pritchett and Sandefur 2014), and in my job market paper I address it using a large, unique data set of impact evaluation results.
I gathered these data through AidGrade, a non-profit research organization I founded in 2012 that collects data from academic studies in the process of conducting meta-analyses. Data from meta-analyses are ideal for answering the generalizability question, as they are designed to synthesize the literature on a topic through a lengthy search and screening process. The data set currently comprises 20 types of interventions, such as conditional cash transfers (CCTs) and deworming programs, all gathered in the same way, double-coded and reconciled by a third coder. There are presently about 600 papers in the database, including both randomized controlled trials and studies using quasi-experimental methods, as well as both published and working papers. Last year, I wrote a blog post for Development Impact based on these data, discussing what isn't reported in impact evaluations.