In the last two days, we have had posts calling for more evaluation of portfolios and in defense of evaluation of single projects by such heavyweights as Ravallion and McKenzie, respectively. While this is an important debate, it is in the nature of debating that each side ends up painting a picture of the other that is somewhat skewed or exaggerated. This, in turn, may leave readers with the wrong impression. I want to make the simple point that the dichotomy in this debate is not as black and white as it looks, and that David and Martin aren’t actually far apart from each other: rather, it looks to me like they’re just excited about and interested in different things at the respective stages of their professional careers.
I think most people would agree with Martin’s basic point that we need more than just RCTs, and that we need more than just economists and medical practitioners acting like mechanics to solve complex development problems, such as poverty traps. Duflo and Banerjee may well be right that solving the big puzzle one small piece at a time may move us significantly in the right direction, but it is not obvious that it can get us close to the end-line. That the questions and the settings for RCTs are selected non-randomly does not help. We need people using other methods, we need people from other disciplines, and we need to be patient for change to occur.
On the other hand, there is little doubt that having more RCTs in economics today is a good thing. First, before we had reliable exogenous variation in treatment in economics (or in political science, etc.), we either had bad conditional correlation studies or had good ones, on which we spent a disproportionate amount of time worrying about identification. But economists are no different from anyone else: we have a limited amount of time to worry about each subject. So, if I have to worry about, say, (a) identification (i.e. whether the answer I am providing as causal is correct), (b) measurement, (c) external validity, and (d) spillover effects, the more I have to spend on (a), the less I have to spend on (b) through (d). This, I think, is true for researchers, reviewers, editors, and readers. I think I spent way too many hours in a windowless seminar room listening to crazy stories about why an instrument was valid or not. If I am going to be in that room, I’d rather know that the answer is correct (i.e. the pill does cure the illness under lab conditions) and then spend the next 90 minutes on whether the outcomes are measured properly (the next big issue in development economics IMHO), whether the findings are generalizable, and what the likely general equilibrium effects are. That is a more fruitful (and pleasant) way to spend our time.
Furthermore, it is not true that elements of a ‘portfolio’, as described by Martin, cannot be subjected to an RCT. What he described in the box is a standard 2x2 cross-cutting RCT design. You can be more ambitious and have, say, a 4x2x2 design if you can manage the large sample size required. Ranil Dissanayake of Aid Thoughts made the point in his comments that 12 interventions on X may not work unless there are some interventions on Y. Another way to think about this is background heterogeneity, say on the supply side for demand-side interventions. Not only can I experiment with multiple demand-side interventions, but I can block the whole thing by supply-side variation, so that I can assess whether CCTs work when schools are good or not, or when the education system is decentralized or not. I can have simultaneous interventions on the supply side that are coordinated. As Jed Friedman showed us in an earlier post, you can show impacts of systems interventions even with a limited number of treatment units (say districts, etc.) if the impact is large enough and you prepare yourself well during the design phase.
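For readers who like to see the mechanics, here is a minimal sketch of what a blocked cross-cutting assignment could look like. Everything in it is an assumption made for illustration: the function name `assign_factorial`, the 40 hypothetical villages, the two unnamed binary demand-side interventions, and the "good"/"weak" school-quality blocks. It is not the design of any study mentioned here, just one simple way to randomize within blocks.

```python
import random
from collections import Counter, defaultdict
from itertools import product


def assign_factorial(units, block_of, factors=(2, 2), seed=0):
    """Randomly assign units to the cells of a cross-cutting design, block by block.

    units    : iterable of unit ids (hypothetical villages here)
    block_of : dict mapping each unit id to its block label (e.g. school quality)
    factors  : number of arms per intervention; (2, 2) gives the 2x2 design
    """
    rng = random.Random(seed)
    # Every combination of arms, e.g. (0, 0), (0, 1), (1, 0), (1, 1) for a 2x2.
    cells = list(product(*(range(k) for k in factors)))

    # Group units by block so that randomization happens within each block.
    blocks = defaultdict(list)
    for u in units:
        blocks[block_of[u]].append(u)

    assignment = {}
    for label in sorted(blocks):
        members = blocks[label][:]
        rng.shuffle(members)
        order = cells[:]
        rng.shuffle(order)  # leftover units should not always land in the same cells
        for i, u in enumerate(members):
            assignment[u] = order[i % len(order)]
    return assignment


# Hypothetical example: 40 villages, half with "good" schools and half with "weak" ones.
villages = list(range(40))
school_quality = {v: ("good" if v % 2 == 0 else "weak") for v in villages}
design = assign_factorial(villages, school_quality, factors=(2, 2), seed=1)

# Number of villages in each (block, cell) combination.
cell_counts = Counter((school_quality[v], design[v]) for v in villages)
```

Because each block is dealt out evenly across the four cells, every treatment combination is observed under both kinds of supply-side conditions, which is what allows the "do CCTs work when schools are good or not" comparison. Passing `factors=(4, 2, 2)` would give the more ambitious 4x2x2 version, at the cost of far fewer units per cell.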
Martin may retort that these are easier in the education and health sectors than in some other sectors, and he would be right. But (a) human capital accumulation is very important, and (b) even in some of these other sectors we can devise randomized evaluations. For almost a decade no one could pull off an RCT on microfinance, until Banerjee et al. did. Are roads, water, or electricity projects that much harder? Political scientists are already well on their way with experiments. We can try to get at poverty traps experimentally. Of course, not getting carried away with randomization and complementing these with theory and other methods of evaluation are important and are likely to get us much further, but still the distinction is not as stark as it is made out to be.
One of the biggest advantages of RCTs is the ease of dissemination to policymakers. The joke about the economist used to be that “On the one hand, I think X, but on the other hand I suggest Y.” With RCTs, that is no longer true: however narrow a question I have answered, I can at least say: “This is what happened and the probability that it happened by chance is small. Now, let’s talk about concerns of measurement, generalizability, and scale-up.”
And, if I am a conscientious researcher who cares equally about internal and external validity, I have actually already designed my study for that exact moment with the policymaker, who is charged with scaling up: I have designed the study to answer pertinent policy questions, I have looked at a representative sample of the target population, I have included a spillover design that reduces the worries about SUTVA violations and tells me something about going from partial to general equilibrium effects, etc. No, I won’t be able to answer all the questions that the technocrat would like to know about, but I’ll be prepared. And, if I am working with willing partners, maybe we can answer some of the remaining questions with another study during the staggered scale-up.
The debate between Martin and David also missed a point about development: what we’re trying to do is not about having projects for poor people brought to them by their government, the World Bank, or an NGO. It is to enable many to just leave poverty and to have the comfort of a basic safety net: we want asset accumulation and behavior change. And, for the latter, not just for the poor but for everyone. So, RCTs that can show anyone the benefits of stimulating their child or taking a nutritional supplement can be effective for behavior change. Figuring out why people don’t treat themselves for malaria even though it would be privately optimal is important for exactly those people to change their behavior in the future. I think RCTs make this easier than other methods.
In the end, I think Martin’s worry is that there are currently too many “David”s and not enough “Martin”s. I don’t have the quantitative evidence to state with confidence that this is true. But, let’s suppose it is. The question becomes what to do about it and Martin’s suggestions at the end of his post were my favorite part of the whole thing. We’ve had huge growth in the “David” sector over the past decade and, by all means, let’s try to figure out how to get growth in the “Martin” sector as well.
But, in the end, this is not a debate about which method is better: let’s agree that both are needed and see whether there is a failure that causes a sub-optimal level of one or the other. My proposed solution in the meantime is the obvious one that I have personally witnessed to work:
My colleague and co-author on a bunch of recent papers, Craig McIntosh, is working with me and Martin on an experiment in Tanzania on poverty traps. Craig, and he would fully agree with this revelation, is very concerned about internal validity in all of his work. Martin is publicly very concerned about external validity. (Me, I enjoy working with both of them immensely.) You should have seen the two hours spent around Martin’s round table while we were designing this particular study: the tug of war between two smart researchers led to a very nice compromise and produced, I will claim, a really nice study design. Either of them, left to their own devices, would have come up with a different study design: but what we have now is better than either of those latent designs.
So, go ahead and partner up with a World Banker, a biomedical expert, a political scientist, an anthropologist, or a qualitative methods specialist: someone not like you. Go big. Your study will be better for it.