News that another $72 million has been committed for a second stage of the Millennium Villages Project (MVP) has led to another round of critical discussion as to what can be learned from this entire endeavor. The Guardian’s Poverty Matters blog and Lawrence Haddad at the Development Horizons blog offer some critiques. In response to the latter, Jeffrey Sachs and Prabhjot Singh offer a rather stunning reply which seems worth discussing from the point of view of what is possible with impact evaluations. So let me dissect some of their statements:
“The simplistic idea, moreover, that one can randomize villages like one randomizes individuals, is extraordinarily misguided for countless reasons. The most obvious, but not even the most important, is cost. To do a controlled experiment (on a single intervention) with thousands of individuals is possible; to do a controlled experiment at the scale of 30,000-person communities is far beyond the project budget (or any budget for similar activities).”
Comments: 1) it is possible to randomize villages, there is just less power in doing so than if one randomizes an intervention at the individual level. But for a large transformative intervention like the MVP project, it should presumably be aiming for a large effect – in which case it may be possible to detect it with even 14 treatment villages like the MVP seems to already have. See for example Jed’s post on evaluating changes in supply chain management for medical drugs in Zambia. So to make a credible argument that the village sample is too small for evaluation, we need to see some power calculations. 2) it is unclear what is meant by cost being the most obvious reason here. The cost of including control villages is only the cost of surveying them, which has to be cheap relative to the millions being spent on the program.
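To make the point about power calculations concrete, here is a minimal sketch of the minimum detectable effect (MDE) for a village-randomized design. Everything numeric in it is an assumption for illustration, not MVP data: 14 villages per arm, 300 surveyed people per village, and a range of intra-cluster correlations (ICC). It uses the standard design-effect formula with a normal approximation; with so few clusters a t-distribution would give a somewhat larger MDE.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(clusters_per_arm, people_per_cluster, icc,
                              alpha=0.05, power=0.80):
    """MDE in standard-deviation units for a two-arm cluster-randomized design.

    Assumes equal-sized clusters, equal arms, and a normal approximation.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Design effect: how much clustering inflates the variance relative to
    # randomizing the same number of individuals.
    design_effect = 1 + (people_per_cluster - 1) * icc
    n_per_arm = clusters_per_arm * people_per_cluster
    return multiplier * sqrt(2 * design_effect / n_per_arm)


# Hypothetical inputs: 14 villages per arm, 300 surveyed people per village.
for icc in (0.0, 0.05, 0.20):
    mde = minimum_detectable_effect(14, 300, icc)
    print(f"ICC={icc:.2f}: MDE = {mde:.2f} standard deviations")
```

The higher the ICC, the larger the effect needs to be before a handful of villages can detect it, which is exactly the trade-off a power calculation would put on the table.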
“The logic is also flawed. In a single-intervention study at the individual level (e.g. for a new medicine) one can have true controls (one group gets the medicine, the other gets a placebo or some other medicine). With communities, there are no true controls. Life changes everywhere, in the MVs and outside of them.”
Comment: This is just a baffling comment. The whole reason for having controls is that life changes everywhere – if it didn’t, before-after analysis would be just fine. The purpose of having these similar control communities is precisely to control for all the other stuff going on in these countries which could be causing changes in the Millennium Villages regardless of the impacts of the MVP. The work by Clemens and Demombynes critiquing the earliest claims of the MVP’s impacts showed clearly some of the massive changes occurring in Africa in indicators such as cellphone ownership that clearly render before-after analysis misleading.
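A small sketch of why this matters, using made-up cellphone ownership shares (these are not the Clemens and Demombynes figures): a before-after comparison attributes the whole change to the project, while a difference-in-differences comparison nets out the change that comparison villages experienced anyway. It assumes the comparison villages would have followed the same trend as the treated ones in the absence of the project.

```python
def before_after(treated_before, treated_after):
    """Naive estimate: the whole change in treated villages."""
    return treated_after - treated_before


def diff_in_diff(treated_before, treated_after,
                 comparison_before, comparison_after):
    """Change in treated villages minus change in comparison villages."""
    background_change = comparison_after - comparison_before
    return before_after(treated_before, treated_after) - background_change


# Made-up shares of households owning a cellphone (illustration only).
naive = before_after(0.10, 0.30)
did = diff_in_diff(0.10, 0.30, 0.10, 0.25)
print(f"Before-after estimate:         {naive:.2f}")
print(f"Difference-in-differences:     {did:.2f}")
```

With these invented numbers most of the before-after change is background trend, precisely because life is changing in the comparison villages too.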
“A third reason is even more important. Introducing community-based capital involves extensive local participation, design, and learning by doing. This includes the methods that communities improve over time with us to measure their own progress, leading to a sustainable monitoring and evaluation strategy that is part of the data-driven local management. There is no simple MV Project blueprint, though there is an overarching strategy. The logic of simple randomized trials does not apply in a context of design with extensive learning by doing, where the main goal is to develop new tools and systems that are replicable and scalable and used by the community itself”
Comment: This is a common misconception. Dean Karlan has a nice paper in the Journal of Development Effectiveness (ungated version here) which argues that randomized trials can be used to evaluate complex and dynamic processes, not just simple and static interventions. There is nothing fundamental to randomized trials (or to non-experimental methods of impact evaluation with control groups) that prevents analysis of designs that have extensive learning by doing.
“The people in the Millennium Villages don’t toil to win intellectual points. They are creating improved systems of service delivery within their communities that will have a lasting impact. Those systems can be rigorously documented, captured in ICT tools, and rendered replicable and scalable.”
Comment: this confuses process evaluation with impact evaluation. Documentation and measurement of what is actually being done in the Millennium Villages is a crucial part of process evaluation, and no one is arguing that this shouldn’t be done. But this still tells us nothing about the impact of delivering those services – we can learn (I’m making up numbers here) that 50,000 malaria nets were given out, 20,000 of them were used, and that only 100 cases of malaria were observed in the 14 villages last year, but this doesn’t tell us what the impact is without knowing what would have happened without this intervention.
“There has been much naïve talk about paired “comparison” villages. The Millennium Villages Project actually has them, though we introduced them in year 3 rather than year 1, because in year 1 the considerable work required to create a foundation of community-driven strategies in the context of a very complex project took precedent. We knew from the start that there would be many complexities in comparison sites and we began to introduce them only when the project was functioning in all sites. For anyone who has taken the time to understand the difference in pace of initiation, organizational culture and preexisting capacity between the varied settings of the Millennium Villages will know that a “Year 1” comparison would be meaningless.”
Comment: there seem to be two parts to this argument: 1) that they were too busy setting up the project to set up control sites – but this is precisely the time to think about control sites, since as one narrows down the list of feasible locations, it should be relatively easy to choose comparison locations at the same time; and 2) that there were “many complexities” in the comparison sites – I have no clue what this means.
“They also don’t understand the deep limitations of the particular analytical tool of comparison sites. Yes, comparison sites are being monitored and used in the monitoring and evaluation, but they should not be overrated. They will add surprisingly little true insight into the project and its achievements. Here’s why. We already have a natural comparison, and that is what is happening to the MDGs in the district and country as a whole compared with the MVs. Spending a great deal of time and personnel on one particular “comparison site” is misplaced concreteness. The district and national data are basically free, collectible, and a good standard of comparison, while any other single comparison site is somewhat arbitrary and a noisy comparison. The comparison village is definitely not a “control” village in the sense of a real, unchanged control, nor could it be…No place is standing still to be a control site … If the comparison site happens to get a new road or an extension of the power grid, this gives an artificial “small sample” error to the comparison with the MV.”
Comments: this completely ignores the fact that the MV villages were deliberately selected, which means they are not directly comparable to other villages in the district or the country as a whole. It also limits assessment of impacts to the rather crude set of indicators that are collected in national data, whereas the MVP aims to do a whole lot of transformation within communities that one would surely like to see the effects of. The final points here repeat the confusion that a control village needs to stand still – it does not – and raise the issue of small samples, which power calculations can tell us about.
“Moreover, many of the key lessons derived from the MVs are already being taken on board in neighboring villages, and even at national scale. The community health worker programs, e-health, and community-based malaria control are good examples of rapid diffusion from the MVs. When the comparison villages make progress, some of that progress is a spillover from the MVP itself.”
Comments: this seems to be the most valid concern expressed in this article, and is a common issue to think about in impact evaluation – are there spillovers to other communities. In principle one could try and measure these spillovers – there is no reason to have only one control village per treatment village, and one might be able to measure spillovers on nearby villages while still using similar, but not quite so near, villages as controls. This would be a more complicated design and have further implications for power. Nevertheless, the potential for spillovers is a reason to try and design an evaluation which measures these to the extent possible, not to abandon doing an impact evaluation altogether.
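A sketch of what such a design could estimate, with made-up outcome values (say, the share of children sleeping under a net) for three hypothetical groups: treated villages, nearby villages that might receive spillovers, and similar but more distant villages used as pure controls. It takes simple differences in means and assumes the distant villages are comparable and untouched by spillovers; a real analysis would also use baseline data and report standard errors.

```python
from statistics import mean

# Made-up village-level outcomes (illustration only, not MVP data).
outcomes = {
    "treated": [0.62, 0.58, 0.65],
    "nearby": [0.45, 0.48, 0.42],   # may benefit from spillovers
    "distant": [0.35, 0.38, 0.32],  # assumed unaffected by the project
}


def spillover_estimates(data):
    """Direct, spillover, and naive effects from three village groups."""
    m = {group: mean(values) for group, values in data.items()}
    return {
        # Treated versus villages assumed free of spillovers.
        "direct_effect": m["treated"] - m["distant"],
        # Nearby versus distant villages captures the spillover itself.
        "spillover_effect": m["nearby"] - m["distant"],
        # Using contaminated nearby villages as controls understates impact.
        "naive_effect": m["treated"] - m["nearby"],
    }


for name, value in spillover_estimates(outcomes).items():
    print(f"{name}: {value:.2f}")
```

In this invented example, comparing treated villages only with their neighbors would understate the project's effect by the size of the spillover, which is why measuring spillovers strengthens rather than undermines the case for an impact evaluation.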
“…The progress towards achievement of the MDGs, within the MVs and by example beyond the Millennium Villages, is the true measure of success.”
Comments: surely the measure of success has to involve how much the MVP project contributes to progress towards achieving the Millennium Development Goals, not whether these goals are achieved or not – they could be achieved or not achieved for reasons completely beyond the control of the project.
Now there are valid reasons to debate which methodology is best for evaluating the impact of the MVP, and serious discussion of the components that should factor into this decision seems worthwhile. But fallacious statements such as those made in this post by Sachs and Singh do nothing to further the debate or to encourage others considering large-scale interventions to seriously invest in rigorous impact evaluation.
Finally, one must also question what donors like the Soros Foundation and the UN relied on in terms of evidence when deciding to fund this second phase of the MVP project. Either donors are happy to fund such a program based on factors other than empirical evidence, or arguments like those above are misleading decision-making.