In March at Oxford, I had the opportunity to debate John McArthur on the Millennium Villages Project (MVP) evaluation, which is the subject of a paper I co-authored with Michael Clemens of the Center for Global Development. The just-published newsletter (pdf) of Oxford’s Centre for the Study of African Economies has a nice summary of the debate, and video from the event is here.
As the newsletter states, John and I “agreed that evaluating the MVP against counterfactuals was critical for the success of the MVP to be assessed, and agreed that before and after comparisons do not amount to rigorous evaluation.” These are the central arguments of our paper. This is also a welcome change from MVP’s initial impact evaluation plans, which Jeff Sachs and co-authors said in 2007 would be based on oxymoronic “rigorous before-and-after comparisons.”
Nonetheless, a gap in our views remains. In our paper, Michael and I criticized the Harvests of Development report (which John has called “the first major scientific report on progress after three years of MVP activity”) for repeatedly describing changes from before-and-after comparisons as “impacts.” For example, the report describes the 27% decrease in chronic malnutrition of children under 2 as one of the “biggest impacts” of the MVP at the Ghana site. However, as we show in the paper, child malnutrition was falling at exactly the same rate in other rural areas of the region where the MVP site is located. This suggests that the improvement at the site was most likely not a causal impact of the MVP but rather the product of other forces that were improving child nutrition in the region overall. This analysis illustrates the perils of using before-and-after comparisons to measure impact.
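To make the arithmetic behind this point concrete, here is a minimal sketch in Python. It is a stylized illustration, not the analysis from our paper: the index levels (baseline set to 100) and the function names are hypothetical, chosen only so that both the site and the comparison area show the same 27% decline described above.

```python
# Stylized illustration only: the levels below are hypothetical index values
# (baseline = 100), not actual survey data from the MVP site or the region.

def before_after_change(before, after):
    """Proportional change between two periods (negative means a decline)."""
    return (after - before) / before

def difference_in_differences(site_before, site_after, comp_before, comp_after):
    """Change at the site minus the change in the comparison area."""
    site = before_after_change(site_before, site_after)
    comparison = before_after_change(comp_before, comp_after)
    return site - comparison

# Assumption: malnutrition falls by 27% at the site and, as in the text,
# at the same rate in the surrounding rural areas.
site_change = before_after_change(100.0, 73.0)
comparison_change = before_after_change(100.0, 73.0)
estimated_impact = difference_in_differences(100.0, 73.0, 100.0, 73.0)

print(f"Before-and-after change at site: {site_change:.0%}")
print(f"Change in comparison area:       {comparison_change:.0%}")
print(f"Difference-in-differences:       {estimated_impact:.0%}")
```

Under these assumed numbers, the before-and-after comparison reports a 27% decline, while the comparison against the regional trend attributes none of it to the project. Treating the regional trend as the counterfactual is itself an assumption, which is why the choice of comparison group matters.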
John and his colleagues respond that they use the word “impact” in an everyday sense for a general audience, and not in its technical meaning of change relative to an underlying control.
However, the only meaningful definition of the “impact” or “effect” of a policy or program is the difference between what actually happened, with the policy or program in place, and what would have happened if the policy or program had not been implemented, i.e. in the “counterfactual.” I highlight this point because the discussion around our paper has taught me that many people believe impact evaluation advocates take an RCT-or-bust position, i.e. that a randomized controlled trial is the only valid form of evaluation. This is very far from my thinking, and I’m fairly sure even the so-called “randomistas” don’t hold this view. There are in fact many valid forms of impact evaluation. (See the excellent recent book Impact Evaluation in Practice for a detailed introduction.)
In practice, policymakers have to make decisions about development projects and policies with a varied collection of evidence. The first step is understanding what we mean by “impact,” and that’s why it’s crucial to recognize that the right reference point is the counterfactual. This understanding is more important than the particular choice of evaluation method. I hope our paper continues to spur dialogue on what it means to conduct rigorous evaluation.