In defense of single project evaluations: A response to Ravallion



Yesterday Martin Ravallion argued that evaluating specific projects one at a time, as much of current impact evaluation does, is not that helpful in assessing development impact: it doesn’t tell us about the impact of overall portfolios if there are interactions between policies, or if the subset of projects that get evaluated is not a representative sample of the overall portfolio. While there were many interesting and useful points in Martin’s post, I feel he undersells the benefits of single project evaluations.

Who cares about portfolios anyway?

Martin defines a portfolio as the various things financed by a developing country or as a set of externally financed projects potentially spanning multiple countries that may be held by a donor or the World Bank. So the entire set of projects that the World Bank carries out across the world constitutes one example of a portfolio. Is this really what we should be trying to evaluate? If your question is “Should the World Bank exist?”, then yes. But I hope that on a regular basis the decision of whether to shut down my entire workplace or not is not the main topic of debate. Similarly, I doubt the debate in most developing countries is “Should we shut down our entire Government?” (I’ll grant you this does seem to feature in the U.S. debate!) or “Does it make sense for us to spend any money on education?”

Instead, the questions facing most decision-makers, be they in Government or the World Bank, are “what elements of our portfolio are working well, and which aren’t”, and “what projects should we include in our portfolio”. Single project evaluations are well suited to answering these questions – i.e. given everything else we are doing, does it make sense to do project X?

There is still pressure at the World Bank (and I’m sure at other donor agencies) to try and put a big number on the overall impact. But what I never see are attempts to put confidence intervals around these numbers. Given the uncertainties and challenges Martin mentions, I am sure that in most cases these confidence intervals will be so wide as to make the whole exercise meaningless.

But what about interactions? Don’t they bias our results?

Martin notes that the success of one component of a portfolio may depend crucially on whether other components in the portfolio have worked, and as a result, adding up the single project evaluations will generally give a biased estimate of the overall portfolio effect. This is true, but I’ve argued we should rarely care about the overall portfolio effects anyway.

So think then about how interactions affect our estimates of the impact of a single project. Falk and Heckman have an interesting piece which appeared in Science a couple of years ago about lab experiments. Adapting their argument, consider the impact of a project (X1) on an outcome (Y). They note that in general

Y = f(X1, X2, X3, …, XN)

where X2, X3, etc. are other factors which also affect the outcome. Then in general the impact of X1 will depend on the levels of all these other variables, unless Y is separable in X1.
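
To make the separability condition concrete, here is a small worked sketch. The functions g and h, and the discrete change from x_1 to x_1', are notation introduced here for illustration and are not taken from the Falk and Heckman piece.

```latex
% Impact of moving the project from x_1 to x_1', holding the other factors fixed:
\Delta(x_2, \dots, x_N) = f(x_1', x_2, \dots, x_N) - f(x_1, x_2, \dots, x_N)

% If Y is additively separable in X_1, i.e. f = g(X_1) + h(X_2, \dots, X_N), then
\Delta = g(x_1') - g(x_1),
% which no longer depends on x_2, \dots, x_N.
```

Without separability, the h-type terms do not cancel and the measured impact is specific to the levels of the other X’s at which it was measured.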

What does this mean for our project evaluations? Well, in general, when we evaluate the causal impact of a program, it is always conditional on values of X2, X3, …, XN. The fact that the project happened to be implemented at the same time as another government policy was changing X3 is no different from the possibility that when the project was implemented business confidence, unemployment, prices, and a whole bunch of other X’s were all changing anyway. Regardless of the source of the variation in the other X’s, all we can measure is an average impact of changing our policy X1, conditional on whatever else is happening. Or if we are really interested in a particular X2 and X3, we can do multi-stage interventions where we also vary X2 and X3 and explicitly test for these interactions. But if we want to go further, we need either to (a) assume or show that X1 tends to have similar effects in a large number of relevant settings (i.e. separability from the other X’s), or (b) use theory and a model to attempt to transport the findings from one setting to another. This is the case regardless of whether or not a project is implemented by itself or as part of a portfolio.
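
As a rough illustration of explicitly testing for an interaction, the following Python sketch simulates a design in which both X1 and one other factor X2 are randomized. Everything in it is an assumption made for illustration: the outcome function, its coefficients, the noise, and the function names are hypothetical and do not come from any actual evaluation.

```python
import random
from statistics import mean


def outcome(x1, x2, rng):
    # Hypothetical outcome: the project (x1) pays off more when x2 is in place.
    # The coefficients are made up purely for illustration.
    return 1.0 + 0.5 * x1 + 0.3 * x2 + 1.0 * x1 * x2 + rng.gauss(0, 1)


def simulate(n=20000, seed=0):
    """Randomize x1 and x2 independently and compare cell means."""
    rng = random.Random(seed)
    cells = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for _ in range(n):
        x1 = rng.randint(0, 1)  # random assignment to the project
        x2 = rng.randint(0, 1)  # random assignment to the other policy
        cells[(x1, x2)].append(outcome(x1, x2, rng))
    m = {cell: mean(values) for cell, values in cells.items()}
    # Impact of x1 conditional on each level of x2.
    effect_without_x2 = m[(1, 0)] - m[(0, 0)]
    effect_with_x2 = m[(1, 1)] - m[(0, 1)]
    # The difference between the two conditional impacts is the interaction.
    interaction = effect_with_x2 - effect_without_x2
    return effect_without_x2, effect_with_x2, interaction


if __name__ == "__main__":
    e0, e1, inter = simulate()
    print(f"impact of X1 when X2=0: {e0:.2f}")
    print(f"impact of X1 when X2=1: {e1:.2f}")
    print(f"interaction estimate:   {inter:.2f}")
```

In this made-up setting the two conditional impacts differ (the simulated truth is 0.5 without X2 and 1.5 with it), so an evaluation run at only one level of X2 would recover only the impact conditional on that level.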

I would think in general that the variation in other conditions generated by the other components of a portfolio is less, or certainly no greater, than the variation naturally occurring from other things going on in the economy. Thus, consider evaluating the impact of a policy that directs credit to SMEs, which is part of a private sector portfolio that also involves business climate reforms. We might expect an interaction between business climate reforms and how useful credit is to firms. But we also would think that business confidence, the exchange rate, whether the economy is growing, etc. etc. are also determining how much impact credit will have. There seems no reason to think that just because we put a label on something and call it a part of our portfolio that it will be more important than these other factors. One can of course think of exceptions – shock therapy policies or massive reforms after the fall of communism come to mind, but I don’t think this is the rule for the average portfolio of projects.

Don’t we get a biased sample of projects being evaluated?

I think Martin certainly makes a valid point here. We see in general that there are often more evaluations of projects run by small and nimble NGOs than by Governments, and often more evaluations of simple health and education interventions than of more complex policies. I certainly believe we need to recognize these knowledge gaps, and that our standard of assessing research should be relative to what we already know in this area. Nevertheless, I think also that we are so far from knowing well what works in so many sectors, that there is still plenty of scope for single policy evaluations to be informative and useful for policy.

Finally, as to the claim that current methodological fashion is leading to substitution away from evaluation of projects that are difficult to randomize, I’m not sure this is where the substitution is happening. I may be too young to know, but I don’t recall seeing a whole host of sensible evaluations of, say, electricity or roads or industrial policies being done 10 years ago that are suddenly no longer getting produced – instead what I see is a massive surge in the number of people working on development research, and a movement away from some things (like cross-country growth regressions and descriptive work on household surveys) towards rigorous evaluations of a whole range of policies. I may well be wrong here; after all, evaluating the portfolio of research faces all the same challenges Martin raises!


David McKenzie

Lead Economist, Development Research Group, World Bank

Thomas de Hoop
May 26, 2011

I also do not think there is a substitution away from the evaluation of projects that are difficult to randomize. However, there might be a problem if policy makers start substituting away from projects that are difficult to randomize in favour of simple projects, just because randomized evaluations have shown that these projects are relatively successful. This would be no problem if complicated projects were never successful, but I do not believe that anybody would agree on this point.

However, with the increased focus on rigorous impact evaluation, development agencies might actually be incentivized to switch to simple projects, because they are at least able to show the effect of these programs. If learning were the main purpose of randomized evaluations this would be no problem, but policy makers might be more interested in accountability. This does not have to be bad, but it could result in perverse incentives. I think this is what Dutch NGOs are currently most afraid of.

Finally, I would like to congratulate you on a wonderful start to a new blog. These discussions are great to follow, especially now that I am writing my chapter about learning from impact evaluations for my PhD thesis. I just hope you will not surprise me with a completely new argument after I hand in my thesis ;)

Martin Ravallion
May 26, 2011

David asks “who cares about portfolios anyway?” Be sure of one thing: poor people in developing countries care about portfolio impacts big time, because it is the impact of the portfolio of government spending and policies that determines their prospects of getting out of poverty.

I argued in my blog yesterday that if an exclusive focus on the impacts of selected individual projects (conditional on other projects, and the environment more generally) does not in fact add up to the impact of the relevant portfolio--either because of interaction effects or sample selection bias in what gets evaluated--then we will not know “development impact.”

I understand David’s perspective, as an individual researcher doing specific project evaluations. And I am not saying that such evaluations are a waste of time. Certainly not. But I contend that we must also take the portfolio perspective if we are to really learn about development impact. That is what the Aid Ministry in a donor country will want to know; if it cannot convince the country’s taxpayers that the aid budget has had development impact then they will not support more aid. The portfolio impact is also what the World Bank wants to know. If, for example, the Bank cannot credibly assess the overall impact of its IDA portfolio of concessional loans to one or more countries then it will naturally (and rightly) have a hard time raising money from IDA donors in the next round.

I do not contend that the problem of assessing the impact of the portfolio is ALL we care about. Rather I am saying it is important, and it is neglected now. At the same time, I also contend that other analytic tools will be needed for the task, in addition to standard impact evaluation methods. And there is development research to be done in coming up with those tools.

A re-balancing of our efforts is needed if we are to credibly assess development impact.

Ranil Dissanayake
May 27, 2011

One point about portfolio evaluation that you miss is that it allows us to look at the *balance* of our portfolios. We could do twelve excellent projects in one dimension of development work, but they may have a negligible impact on the overall experience of poor people because they are so focused on X that they miss the fact that Y became the constraining factor after 6 valuable projects in Y were initiated.

Secondly, it would actually be a great thing if all development agencies, including the World Bank, did periodically ask, 'should we exist?'. I've worked in development on the recipient country side for several years and believe that a number of development agencies should not exist. There are far too many, and many of them are simply taking money away from good organisations and using it less well.