Published on Development Impact

One evaluation, one paper? Getting more for your money


Development economists are spending hundreds of thousands of dollars and many hours of their time designing, implementing, and analyzing the impact of various interventions. If all goes well, in many cases this leads to one really nice paper. But should it just be “one experiment, one paper”, as I have heard one journal editor argue?

I think that, with all the effort conducting an ex ante impact evaluation requires, it should be the norm, rather than the exception, to try to get more than one research paper out of that effort. There are a number of possibilities for doing so, and I would urge researchers planning impact evaluations to see whether they can leverage their basic evaluation for more by doing one or more of the following:

·         Conducting methodological work to improve our knowledge of survey design, how to ask particular questions, etc. Sometimes this can be a natural and needed part of the evaluation itself. For example, in conducting an evaluation on microenterprises, it was natural for us to also look at how to measure profits; and in Berk’s work on conditional cash transfers, it was likewise important to look at the accuracy of self-reported schooling data. But one can also build in other interesting modules or methodological experiments on top of an existing survey that are not essential for the evaluation itself but piggyback on it – as we did by measuring migrants’ expectations as part of an evaluation of the impacts of migration through a lottery program. Such papers are very useful to specialized audiences, but because they are usually best suited to field journals, the incentives for academic researchers to produce them are not as high as they perhaps should be.

·         Cross-randomizations – this is a reasonably popular approach, whereby researchers cross-randomize a second (or third) intervention on top of an existing one – think of Pascaline Dupas’s work on Sugar Daddies, which was implemented in schools that were also part of an evaluation of an HIV prevention curriculum, or Dean Yang and co-authors layering an experiment offering discounts on remittances on top of one offering remitters bank accounts with different degrees of control. Duflo et al. discuss these cross-cutting designs in their randomization toolkit and note that they have been a cheap way for graduate students in particular to implement experiments as part of larger projects, and that they offer the possibility of comparing the effectiveness of several different treatments. In theory they also offer the possibility of testing for complementarities between interventions, but my sense is that few such studies are adequately powered to detect these complementarities (a back-of-envelope sketch after this list illustrates why) – so I’ll save for another day further discussion of the drawbacks of doing too much of this.

·         Should impacts in different domains require different papers? One approach is to try to cram all the results of an intervention into one paper. This works well for reasonably straightforward interventions with unsurprising results – but it can get unwieldy when the intervention has complex effects in many domains – for example, Casey et al. look at the impact of a CDD program on 318 different outcomes. The result is that such papers tend to have massive appendices, and researchers cannot spend much space in the paper exploring the reasons underlying some of the impacts. Thus while the Katz, Kling and Liebman analysis of the Moving to Opportunity program gets all the results into one paper in Econometrica, it comes with a 37-page appendix which contains much of the interesting analysis. An alternative is to write narrower papers which go into more detail on particular outcomes, allowing the why of their effects to be explored more fully in addition to the what – so, for example, we see a range of papers on Progresa/Oportunidades which look at impacts on different types of outcomes, while in my own work on Tongan migration to New Zealand through a migration lottery we have written separate papers on the impacts on family members left behind, the impacts on the incomes of the migrants, and then more specific papers on impacts on mental health, child health, and several other aspects. Given that half the battle in most empirical work is finding a decent source of identification, if we are really interested in the impact of X on outcomes Z1, Z2, and Z3, and these outcomes sit in relatively separate strands of the literature, then it may make sense to engage in more detail with each literature in separate papers, rather than bundling everything together into an omnibus paper.

·         Descriptive policy papers: the baseline data and questions surrounding an intervention may at times be the only data in the world that can help answer pressing policy questions in a particular policy context. Writing these up as a short policy paper may therefore be very useful. Indeed, perhaps the piece of my research with the most direct policy effect is a paper I wrote with John Gibson and Halahingano Rohorua for the Pacific Economic Bulletin, which used data from our evaluation of the Tongan lottery program to do some fairly simple descriptive work on the costs of sending money in the Pacific. This spurred an effort by the New Zealand Government and a World Bank operational team to reduce the cost of sending remittances, leading to a change in a New Zealand law that had limited one option for sending money cheaply and to the development of a website sharing information on the costs of sending money. Likewise, in other interventions the government or implementing partners may be just as interested in insights from the baseline data as in the results of the intervention itself. In some cases such papers can be publishable as well as serving as reports.

·         Using the data for other papers: despite more of the underlying data from experiments becoming available, to date there have not been many papers which use data from an experiment for other purposes. That is why I’m excited to see more such papers appearing – examples include Gharad Bryan’s job market paper from last year, which looks at ambiguity aversion using two experiments; work by Maitreesh Ghatak and Tim Besley, which uses my Sri Lanka experiment to calibrate their theoretical model; and a couple of other such examples. Apart from making data and questionnaires more readily available, which several initiatives are striving to do, I think the other key is to get researchers thinking about whether the data from all these experiments are a treasure trove that is underexplored relative to the big LSMS-style datasets. While the settings are often not nationally representative and the samples are smaller, the data typically contain at least a couple of rounds of a panel, relatively rich questions on decision-making, and at least one source of identification.
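
A quick footnote to the cross-randomization point above, on why complementarities are so hard to detect: in a 2x2 cross-randomized design with equal allocation, the interaction estimate is a difference-in-differences across the four cells, so its standard error is twice that of a main effect, and detecting an interaction of the same size as a main effect requires roughly four times the sample. The short sketch below works through this arithmetic; the sample size and outcome variance in it are hypothetical placeholders, not figures from any particular study.

# Rough minimum detectable effect (MDE) comparison for a 2x2 cross-randomized
# design; N and sigma below are hypothetical, for illustration only.
from scipy.stats import norm

alpha, power = 0.05, 0.80
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)   # about 2.8

N = 2000        # total sample size (hypothetical)
sigma = 1.0     # outcome standard deviation, so MDEs are in SD units

# Main effect of treatment A: compares the N/2 units with A to the N/2 without,
# so Var(estimate) = 4 * sigma^2 / N.
se_main = (4 * sigma**2 / N) ** 0.5

# Interaction of A and B: a difference-in-differences across four cells of N/4
# each, so Var(estimate) = 16 * sigma^2 / N, i.e. twice the standard error above.
se_interaction = (16 * sigma**2 / N) ** 0.5

print("MDE for a main effect:   %.3f SD" % (z * se_main))         # roughly 0.13 SD
print("MDE for the interaction: %.3f SD" % (z * se_interaction))  # roughly 0.25 SD

So an experiment powered to detect a 0.13 SD main effect can only pick up complementarities of roughly twice that size, and if the plausible interaction is smaller than the main effects, the required sample grows even faster.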

Finally, I should note that a non-trivial percentage of all impact evaluations fail – either by not being implemented, by having such low take-up that there is no power to detect any impacts, or through a myriad of other things that can go wrong – so having built in some other interesting work that can be done with the same dataset is a good back-up plan for those devoting significant chunks of their time to this type of work.

Readers, do you have any other good suggestions for people thinking of a plan B for how to better leverage all the work they are doing in putting an impact evaluation together?


Authors

David McKenzie

Lead Economist, Development Research Group, World Bank
