Coauthored with Doug Parkerson
A couple of years ago, an influential paper in Science by Banerjee and coauthors looked at the impact of poverty graduation programs across 6 countries. At the time (and probably since) this was the largest effort to look at the same(ish) intervention in multiple contexts at once – arguably solving the replication problem and proving external validity in one fell swoop.
But if you are a policymaker and you picked up the article, you might wonder how this would best be adapted to the context you are working in. You could just say: OK, these are the X elements of a graduation program, now I am going to implement them. But if, like most policymakers, you are not a moron, you will want to adapt the program to your context, and the article doesn’t help you figure out how. There are some allusions to programming differences across contexts in the article (and indeed, to how it failed completely in one case), but there isn’t much to go on to help you (especially given Science’s word count limit).
So how can researchers aggregate evidence in ways that serve the policy need (and that also often get them the publications they need)? Here are some thoughts on different approaches.
1. Replication of an intervention with a serious discussion of programmatic differences, and of why those differences were chosen. This would be a multicountry study of a very similar (or same core) intervention, but with a clear list of what’s different across countries and why the implementers in those countries chose to do it that way (in this case all of the variation from the “ideal” program would be dictated by local circumstances). The contextual lessons aren’t going to be well identified (since each country will have not only a somewhat different program but a different context), but the study will provide useful policy lessons: some discussion of why each modification was chosen would help policymakers who want to adopt and adapt this program in their country. The main concern here is that this additional discussion would likely be too long and (perhaps) not valued by a standard economics journal, so there is an open question around the incentives for this. Maybe the program folks could organize and write their own paper.
When would this approach make sense? When an intervention is already proven, has sizable effects, and is fairly straightforward in design, but there are big questions on external validity. In a way, the evolution of cash transfer evaluations, with the attendant discussions of programmatic differences, has achieved this (with no one ex ante design). As far as we know there are no good examples of this yet, and we would be happy if anyone has any to share.
1.a. Try variants of the same program across countries on purpose. While the variation above happens because implementers in a given country want to do something different because of their context, this approach is about deliberately trying more than one variant of a program across countries. One example of this is the ongoing World Bank–IPA effort to test variants of the graduation program discussed in Banerjee et al. Here the idea is to try the core graduation program with some additional components, but also to try it without some of the more expensive and/or harder-to-implement components, across a range of countries in sub-Saharan Africa. As in the Banerjee et al. approach, there is a common core of indicators being measured in each country.
1.b. How well does it work if we fundamentally change the implementing organization? We often see the question: would we get the same great results if your NGO-implemented intervention were implemented by government? In this approach the goal would be to keep the interventions as similar as possible (as above) but to vary the implementing agency, ideally within countries, but with multiple countries participating. The scale at which to evaluate could be another dimension here. This would help individual governments determine whether they might want to outsource implementation to NGOs (or vice versa). And adding up the lessons across countries (perchance with a process evaluation or two) would provide lessons for others who want to adopt. We don’t know of any examples of this.
2. Improve the plumbing. One extension of the approach above would be to take very, very similar interventions and experiment with the plumbing (as discussed in Alaka’s recent post). Here the idea would be to take a proven intervention (e.g., child immunization) and map out a concerted research program to figure out how to improve its efficacy. The outcome measure would probably not be (at least in the first instance) a welfare/impact measure, but rather program reach (e.g., the number of children vaccinated). As Alaka pointed out, these are likely to be cheaper, more rapid impact evaluations. We don’t know of any such concerted endeavor but would be happy to hear of examples. Note that there seems to be some of this happening in a grassroots fashion (e.g., experiments with different targeting criteria and periodicity for cash transfers), and this can be pulled together ex post in a literature review. But a concerted, intentional approach would get a bunch of key implementers together to identify what they think the main bottlenecks are.
3. Research driven by a common, narrow research question. Here the approach is to take a common question and a set of similar, but clearly not identical, interventions and look at the impacts. One example of this is the Metaketa approach undertaken by EGAP, which funds clusters of coordinated research studies on narrow questions. For example, the Metaketa on taxation is looking at the effect of providing information and subsidies on getting folks to register businesses (Brazil and Nigeria), land (DRC and Colombia), and for public services (Malawi and India). Measurement and design elements are closely coordinated across the studies to prepare for an integrated meta-analysis, and Metaketas are structured to provide incentives for researchers to contribute to this collective effort: in addition to funding research into a common question, they also fund a second research arm determined by the individual teams of researchers.
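To make concrete what that coordinated measurement buys at the analysis stage, here is a minimal sketch (not the Metaketa teams’ actual method) of how country-level estimates could be pooled once the outcome is measured on a common scale. The countries, effect sizes, and standard errors below are hypothetical, and the simple fixed-effect, inverse-variance pooling shown is only one of several ways to combine results.

```python
# Minimal, illustrative sketch: inverse-variance (fixed-effect) pooling of
# country-level treatment effects. All numbers are hypothetical.
import math

# (country, estimated treatment effect, standard error), all on a common,
# pre-agreed outcome scale -- which is what coordinated measurement makes possible.
studies = [
    ("Country A", 0.12, 0.05),
    ("Country B", 0.08, 0.04),
    ("Country C", 0.20, 0.09),
]

# Weight each estimate by the inverse of its variance.
weights = [1 / se**2 for _, _, se in studies]
pooled = sum(w * est for (_, est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
for (country, est, se), w in zip(studies, weights):
    print(f"{country}: effect {est:.2f}, SE {se:.2f}, weight {w / sum(weights):.2%}")
```

In practice, pooled analyses of this sort often use random-effects or Bayesian hierarchical models rather than a simple fixed-effect average, but the basic point stands: it is the pre-agreed, comparable measurement that lets the individual studies be added up at all.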
4. Research driven by one broad question but (fairly) agnostic about the intervention. Here the approach is to identify one key policy question and then see what can be done to address it. A common approach to the measurement of outcomes is key here, since the interventions can vary quite widely. One example of this is the What Works initiative out of South Africa, which seeks to answer the question: what works to reduce violence against women and girls? Across its impact evaluation portfolio, the initiative has standardized the measures of primary outcomes (and some secondary ones), and it is also standardizing other things that might matter for measurement, such as the timing of follow-up surveys.
So these are some thoughts on adding up research. But just to be clear, we are not saying all research should be part of these kinds of concerted efforts. After all, it’s the prospecting/proof-of-concept kind of research that gets us the innovative program to replicate in the first place.