Yesterday Martin Ravallion argued that because much of the impact evaluation taking place assesses specific projects one at a time, it is of limited help in assessing development impact. Single-project evaluations don’t tell us about the impact of overall portfolios if there are interactions between policies, or if the subset of projects in a portfolio that get evaluated is not a representative sample. While Martin’s post made many interesting and useful points, I feel he undersells the benefits of single project evaluations.
Who cares about portfolios anyway?
Martin defines a portfolio as either the various things financed by a developing country or a set of externally financed projects, potentially spanning multiple countries, that may be held by a donor or the World Bank. So the entire set of projects that the World Bank carries out across the world constitutes one example of a portfolio. Is this really what we should be trying to evaluate? If your question is “Should the World Bank exist?”, then yes. But I hope that whether to shut down my entire workplace is not a regular topic of debate. Similarly, I doubt the debate in most developing countries is “Should we shut down our entire Government?” (I’ll grant you this does seem to feature in the U.S. debate!) or “Does it make sense for us to spend any money on education?”
Instead, the questions facing most decision-makers, be they in Government or at the World Bank, are “Which elements of our portfolio are working well, and which aren’t?” and “Which projects should we include in our portfolio?” Single project evaluations are well suited to answering these questions: given everything else we are doing, does it make sense to do project X?
There is still pressure at the World Bank (and, I’m sure, at other donor agencies) to try to put a big number on the overall impact. But what I never see are attempts to put confidence intervals around these numbers. Given the uncertainties and challenges Martin mentions, I am sure that in most cases these confidence intervals would be so wide as to make the whole exercise meaningless.
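To make the point about confidence intervals concrete, here is a minimal Python sketch of what even the most optimistic aggregate calculation would involve. The function name `aggregate_impact`, the weights, and the example numbers are all hypothetical. It assumes the project-level estimates are independent and approximately normal, and it ignores both interactions between projects and extrapolation to the unevaluated part of the portfolio, each of which would widen the interval further.

```python
import math


def aggregate_impact(estimates, std_errors, weights, z=1.96):
    """Weighted sum of project-level impact estimates, with a normal-approximation CI.

    Assumes (hypothetically) that the project estimates are independent, so that
    Var(sum of w_i * b_i) = sum of w_i**2 * se_i**2. Interactions between projects
    and the unevaluated part of the portfolio are ignored.
    """
    if not (len(estimates) == len(std_errors) == len(weights)):
        raise ValueError("estimates, std_errors and weights must have equal length")
    total = sum(w * b for w, b in zip(weights, estimates))
    se = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, std_errors)))
    return total, (total - z * se, total + z * se)


# Hypothetical inputs: per-beneficiary effects, their standard errors,
# and project sizes (e.g. millions of beneficiaries).
total, (low, high) = aggregate_impact(
    estimates=[0.10, 0.05, 0.20],
    std_errors=[0.08, 0.04, 0.15],
    weights=[2.0, 5.0, 1.0],
)
print(f"aggregate impact {total:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With these made-up numbers the interval runs from roughly 0.07 to 1.23, so its upper end is more than fifteen times its lower end, and that is before any of the portfolio-level concerns Martin raises are taken into account.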
But what about interactions? Don’t they bias our results?
Martin notes that the success of one component of a portfolio may depend crucially on whether other components in the portfolio have worked, and as a result, adding up the single project evaluations will generally give a biased estimate of the overall portfolio effect. This is true, but I’ve argued we should rarely care about the overall portfolio effects anyway.
So consider instead how interactions affect our estimates of the impact of a single project. Falk and Heckman have an interesting piece about lab experiments, which appeared in Science a couple of years ago. Adapting their argument, consider the impact of a project (X1) on an outcome (Y). They note that in general
Y = f(X1, X2, X3, …, XN)
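where X2, X3, etc. are other factors that also affect the outcome. In general, the impact of X1 will then depend on the levels of all these other variables, unless Y is separable in X1 (that is, X1 enters additively, as in Y = g(X1) + h(X2, …, XN), so that its effect does not depend on the other X’s).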
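A minimal Python sketch of what separability means here. The two outcome functions are hypothetical, chosen only for illustration: in `separable`, X1 enters additively, so the effect of moving X1 from 0 to 1 is the same whatever the value of X2; in `non_separable`, X1 interacts with X2, so the same change in X1 has an effect that scales with X2.

```python
def effect_of_x1(f, x2, x3, x1_low=0.0, x1_high=1.0):
    """Impact of moving X1 from x1_low to x1_high, holding X2 and X3 fixed."""
    return f(x1_high, x2, x3) - f(x1_low, x2, x3)


def separable(x1, x2, x3):
    # Hypothetical Y = g(X1) + h(X2, X3): X1 enters additively.
    return 2.0 * x1 + 0.5 * x2 - x3


def non_separable(x1, x2, x3):
    # Hypothetical Y with an X1 * X2 interaction term.
    return 2.0 * x1 * x2 + 0.5 * x2 - x3


for x2 in (0.5, 1.0, 2.0):
    print(
        f"X2={x2}: separable effect={effect_of_x1(separable, x2, 1.0):.1f}, "
        f"non-separable effect={effect_of_x1(non_separable, x2, 1.0):.1f}"
    )
```

In the separable case the evaluated effect of X1 is 2.0 at every value of X2; in the non-separable case it is 2 × X2, so an evaluation run where X2 = 0.5 and one run where X2 = 2.0 would report very different impacts for the same project.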
What does this mean for our project evaluations? In general, when we evaluate the causal impact of a program, the estimate is always conditional on the values of X2, X3, …, XN. The fact that the project happened to be implemented at the same time as another government policy was changing X3 is no different from the possibility that, when the project was implemented, business confidence, unemployment, prices, and a whole host of other X’s were all changing anyway. Regardless of the source of the variation in the other X’s, all we can measure is the average impact of changing our policy X1, conditional on whatever else is happening. If we are really interested in a particular X2 and X3, we can run multi-stage interventions in which we also vary X2 and X3 and explicitly test for these interactions. But to go further, we need either to (a) assume or show that X1 has similar effects across a large number of relevant settings (i.e. that it is separable from the other X’s), or (b) use theory and a model to try to transport the findings from one setting to another. This holds regardless of whether a project is implemented by itself or as part of a portfolio.
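The option of also varying X2 and explicitly testing for interactions can be sketched as a cross-randomized 2×2 design, in which X1 and X2 are each assigned at random and independently. The Python simulation below is hypothetical throughout (the function names, effect sizes, and noise level are made up). It assumes a linear outcome with an X1 × X2 interaction term and recovers the interaction as a difference in differences of the four cell means. It also compares the joint effect of doing both with the sum of the two single-intervention effects; the two differ by exactly the interaction term.

```python
import random
from statistics import mean


def simulate_cells(n_per_cell=5000, beta1=1.0, beta2=0.5, beta12=0.8, noise_sd=1.0, seed=0):
    """Simulate a hypothetical 2x2 cross-randomized design; return the mean outcome per cell.

    Assumed outcome model: Y = beta1*D1 + beta2*D2 + beta12*D1*D2 + noise.
    """
    rng = random.Random(seed)
    cells = {}
    for d1 in (0, 1):
        for d2 in (0, 1):
            outcomes = [
                beta1 * d1 + beta2 * d2 + beta12 * d1 * d2 + rng.gauss(0.0, noise_sd)
                for _ in range(n_per_cell)
            ]
            cells[(d1, d2)] = mean(outcomes)
    return cells


def factorial_estimates(cells):
    """Effect of X1 with and without X2, their difference, and joint vs summed effects."""
    x1_alone = cells[(1, 0)] - cells[(0, 0)]
    x2_alone = cells[(0, 1)] - cells[(0, 0)]
    x1_given_x2 = cells[(1, 1)] - cells[(0, 1)]
    return {
        "x1_alone": x1_alone,
        "x1_given_x2": x1_given_x2,
        "interaction": x1_given_x2 - x1_alone,  # difference in differences of cell means
        "joint": cells[(1, 1)] - cells[(0, 0)],
        "sum_of_singles": x1_alone + x2_alone,
    }


est = factorial_estimates(simulate_cells())
for name, value in est.items():
    print(f"{name}: {value:.2f}")
```

With the made-up coefficients above, the sum of the single-intervention effects (about 1.5) understates the joint effect (about 2.3) by roughly the interaction (about 0.8). This is the sense in which adding up single evaluations can misstate a combined effect, and the cross-randomized design is one way to measure that gap directly rather than assume it away.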
I would think that, in general, the variation in other conditions generated by the other components of a portfolio is smaller than, or certainly no greater than, the variation naturally occurring from everything else going on in the economy. For example, consider evaluating the impact of a policy that directs credit to SMEs as part of a private sector portfolio that also involves business climate reforms. We might expect an interaction between the business climate reforms and how useful credit is to firms. But we would also expect business confidence, the exchange rate, whether the economy is growing, and so on to determine how much impact credit will have. There seems no reason to think that simply labeling something part of our portfolio makes it more important than these other factors. One can of course think of exceptions (shock therapy policies or the massive reforms after the fall of communism come to mind), but I don’t think these are the rule for the average portfolio of projects.
Don’t we get a biased sample of projects being evaluated?
I think Martin certainly makes a valid point here. In general, there are often more evaluations of projects run by small, nimble NGOs than of those run by Governments, and more evaluations of simple health and education interventions than of more complex policies. I certainly believe we need to recognize these knowledge gaps, and that our standard for assessing research should be relative to what we already know in a given area. Nevertheless, I also think we are still so far from knowing what works in so many sectors that there is plenty of scope for single policy evaluations to be informative and useful for policy.
Finally, as to the claim that current methodological fashion is leading to substitution away from evaluating projects that are difficult to randomize, I’m not sure this is where the substitution is happening. I may be too young to know, but I don’t recall a whole host of sensible evaluations of, say, electricity, roads, or industrial policies being done 10 years ago that are suddenly no longer being produced. Instead, what I see is a massive surge in the number of people working on development research, and a movement away from some things (like cross-country growth regressions and descriptive work on household surveys) toward rigorous evaluations of a whole range of policies. I may well be wrong here; after all, evaluating the portfolio of research faces all the same challenges Martin raises!