What will be the next "victim" of randomized control trials?

Randomized control trials (RCTs) have been grabbing a lot of headlines lately. Esther Duflo, the principal champion of RCTs and recipient of the John Bates Clark medal, recently had a profile in The New Yorker (gated) and also gave an entertaining TED Talk. As usually seems to be the case with innovations in economics, the discourse has quickly gotten pugilistic.

One of the first punches thrown was by Angus Deaton, a professor of development economics at Princeton. Back in 2008 he gave a presentation with a slide showing Esther Duflo and one of her colleagues jumping out of an airplane, Duflo with a parachute and her colleague without. Deaton was making the point that we don't really need an RCT to know that a parachute is necessary. Duflo has proven to be equally good at landing a punch or two. In her TED Talk, she compared development economics before the advent of RCTs to medieval doctors using leeches. And in her New Yorker profile, she quickly dismisses an argument made in favor of microfinance -- that the poor's eagerness to take out microcredit is itself a positive sign. This is, according to Duflo, "the moronic revealed-preference argument."

Leaving aside for a second the arguments for and against RCTs, I want to ask why it is at this particular moment that RCTs have managed to attract so much attention. It is not as though they are entirely new -- RCTs have been used for around a decade to assess a number of development interventions, e.g. deworming, bednets, female electoral quotas, etc. My guess is that it has a lot to do with the recent and widely publicized RCTs of microfinance, in particular an evaluation of an Indian microfinance outfit last year that found little positive impact on the recipients of microcredit.

The difference between the microfinance RCT and all the others? The RCTs of deworming, etc. all found that these interventions were successful and deserving of continued government and/or aid support. It was only in the case of microfinance that the RCT found little evidence of development impact -- and this prompted a minor public relations scandal, with a number of prominent MFIs issuing a less-than-satisfactory rebuttal.

Microfinance will surely not be the last "victim" of RCTs. Duflo and friends get a lot more publicity from debunking interventions than from showing what works (though this is of course a function of the press and not Duflo et al.). It is only a matter of time before a new RCT finds another development intervention that doesn't match the hype. I expect the next "victim" will have the following characteristics:

  • Highly optimistic -- and verifiable -- claims will have been made on behalf of the intervention (see the claims Duflo et al. target in their paper on the microfinance RCT).
  • The intervention will have a strong theoretical appeal (e.g. the appeal of microfinance in the way that it is supposed to handle problems of adverse selection through group lending).
  • It will have institutionalized supporters who see the results of the RCT as threatening to their work.

If I had to hazard a guess, I would bet on the One Laptop Per Child program as the next "victim". (I eagerly await the results of an ongoing evaluation at the IADB.) But there must be many evaluations going on out there that I am not aware of. Any suggestions from readers?

Update: In addition to the limitations pointed out by Eric and Helen in the comments and the post I point to by Bill Easterly, there is another methodological problem with RCTs that just occurred to me. At least in the cases I've seen, the selection of the partner organization (Spandana, in the case of the Indian microfinance RCT) is itself not random, and this introduces selection bias into the process.

The New Yorker article discusses how Duflo et al. had to work to find willing partner organizations ("...In 2005, after a lengthy search in an industry wary of subjecting itself to this kind of scrutiny..."). It seems highly unlikely that the MFIs that have agreed to undergo an RCT are representative of all MFIs. This would seem to severely limit our ability to generalize from the case of Spandana to all MFIs. I suspect someone else has probably already made this point, but I haven't seen it anywhere before.
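
To make the worry concrete, here is a minimal simulation sketch in Python. All the numbers are hypothetical, and the assumption that willingness to be evaluated correlates with true impact is mine, made purely for illustration (the direction of any real correlation is unknown):

    import random

    random.seed(0)

    # Hypothetical population of MFIs, each with a "true" impact on its
    # clients. Purely illustrative numbers, not estimates from any study.
    mfis = [random.gauss(0.0, 1.0) for _ in range(10_000)]

    # Assume (arbitrarily) that an MFI volunteers for an RCT only when it
    # expects a good result, so willingness correlates with true impact.
    volunteers = [m for m in mfis if m + random.gauss(0.0, 0.5) > 0.5]

    print(f"Average impact, all MFIs:         {sum(mfis) / len(mfis):+.2f}")
    print(f"Average impact, willing partners: {sum(volunteers) / len(volunteers):+.2f}")
    # A perfectly executed RCT at a volunteer MFI estimates the second
    # number, not the first: the randomization happens inside the partner,
    # but the choice of partner was never randomized.

Randomization within the partner guarantees a clean estimate for that partner; it is the population-wide number that debates about "microfinance" as such implicitly invoke.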

Eric Swanson
June 18, 2010

A flaw in naive RCTs of social phenomena is that neither the subject nor the investigator is blind to the treatment assignment. Thus RCTs fail to conform to the standards of a proper, double-blind experiment. Especially when the evaluation process involves some degree of subjective assessment, randomization may merely be a cover for deeper design flaws. Which is not to say that there is nothing to be learned from RCTs or pseudo-RCT procedures (statistical matching and so forth). But they hardly constitute a "gold standard" for social science research.

Alberto Cottica
June 18, 2010

Ryan, it seems to me that you are being less direct than usual. Are you suggesting that RCT researchers have a vested interest in intervention busting? And if so, do you think RCT research is trustworthy nevertheless?

Helen Abadzi
June 19, 2010

There has similarly been much ado about randomized control trials in education. For example, an earlier experiment found that textbooks in Kenya had no effect on learning. That should help save money on textbooks, but the research had no measures of literacy, knowledge of the official language, or instructional time. If the average students were illiterate and knew no English (because they got very little teaching), should they also be deprived of textbooks? It turned out that those who knew English and could read did learn.

More important than the randomization is the chain of intervening variables. Drug manufacturers do randomized control trials late in the research, only after they have clarified the chains of causality. These matter regardless of outcomes. If the RCT shows an effect in country X, how do you transplant it to country Y? There may be higher-order interactions that mean it simply will not work. But if we don't identify them, we risk wasted money and disappointment.
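
A minimal sketch of that transplant problem, in Python with entirely hypothetical numbers, where the textbook effect from the Kenya example runs through literacy as the intervening variable:

    import random

    random.seed(1)

    def learning_gain(has_textbook, literate):
        """Toy outcome model: textbooks help only students who can read."""
        return 10.0 + (5.0 if has_textbook and literate else 0.0)

    def rct_effect(literacy_rate, n=10_000):
        """Average treatment effect of textbooks when treatment is randomly
        assigned to half of a population with the given literacy rate."""
        treated, control = [], []
        for _ in range(n):
            literate = random.random() < literacy_rate
            group = treated if random.random() < 0.5 else control
            group.append(learning_gain(group is treated, literate))
        return sum(treated) / len(treated) - sum(control) / len(control)

    print(f"Country X (90% literate): effect = {rct_effect(0.9):+.2f}")  # about +4.5
    print(f"Country Y (10% literate): effect = {rct_effect(0.1):+.2f}")  # about +0.5
    # Same intervention, very different measured "impact". The trial in X
    # says little about Y unless the intervening variable is modeled.

Both runs are internally valid experiments; only the chain of causality tells us whether the effect will travel.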

The Bank unfortunately is very weak in establishing chains of causality, since most staff are generalists. Hopefully, the new strategy on technical knowledge will bring some people on board with specialist knowledge who will know which chains of causality to explore in various sectors.

GS RADJOU
June 20, 2010

Myself, I found these RCTs useful as long as they help to remove biases. My metaphor is with the flight tests used during the Iceland volcano eruption, when ash in the atmosphere congested air traffic or closed flight zones entirely (because of precautionary measures). There had been two previous incidents, at another time and in another sky... So flight tests help to lift uncertainty and cut costs. In the end, what I expect from any financing evaluated with an RCT is that the money borrowed helps to leverage (boost development and get good results for) the organisation in search of that money (whether small or big). The advantage of financing a business over bartering for goods is that, in principle, the organisation would not have to wait to meet its needs (for products and services) if the RCT is good.

Henk J.Th. van Stokkom
June 20, 2010

Evaluation of the Millennium Villages Project?

Nick gogerty
June 21, 2010

I for one encourage RCTs as much as possible. Feedback and critique are good. Even if one debates the nature of the process or structure of RCTs, there is at least a formalized conversation about process and outcomes.

Ideally, RCTs would be the norm and not the exception, and there would be a maturity model of RCTs, which would mean a lively debate about outcomes and real impact.

The more structured enquiry and debate, the better. Applying various economic "cures" to patients (groups) around the world will be more effective for it.

I agree that RCTs will increase public uncertainty as some projects are shown to be less than effective, but in the long run, hopefully, those in need will benefit and those giving will have higher impact.

Bring on the debate, the thought, and the passion about outcome-based and measured aid.

Ryan Hahn
June 21, 2010

@Alberto

As far as the good intentions of RCT researchers go, I don't doubt them at all. Everything I have read and seen makes me believe that they are scrupulous in the design of their research and have a genuine desire to make sure public money goes to its best use. So, from this perspective, I think the work is trustworthy.

However, there is a separate question, which has to do with the inherent methodological limitations of the approach. Eric and Helen have both brought up good points in the comments. Bill Easterly also brings up good points in the post I point to ("the arguments for and against RCTs"). I would summarize all of the limitations of RCTs under the general heading of "naive empiricism". Duflo et al. seem to believe that we simply need really good data to decide what to do with limited aid/gov't budgets. Unfortunately, good data are never enough -- we need some theory to be able to understand what these good data are telling us. (E.g., if microfinance doesn't work in the urban slums of India, does that mean it also won't work in India's countryside? Should we no longer fund textbooks, or are there other steps that need to be taken to make the textbooks useful?)

One final point. I have noticed that we tend to pay more attention to the debunking of interventions than to "what works". This is of course anecdotal, but it seems like an ingrained trait of humans (or perhaps just a function of the media?). I am afraid aid funders may ignore the "what works" part of the argument and just hear the "debunked" side. But both sides are of equal importance.

Steve
June 21, 2010

Abstinence-only and other AIDS prevention methods were already victims (Thornton and Duflo studied this). Giving out sanitary napkins to girls in an effort to promote education was a victim (Oster and Thornton wrote about it in the NYTimes).

Social marketing of anti-malarial bed-nets was a major victim, because the studies found that (due to positive externalities of owning a net) simply handing out the nets is a cheaper way to prevent malaria! (Cohen and Dupas did the studies.)

Ryan Hahn
June 21, 2010

@Steve

Thanks for pointing me to these additional studies - I was not aware of the RCTs on AIDS prevention or sanitary napkins.

Scott Guggenheim
July 23, 2010

I've worked closely with several of the JPAL people to carry out randomized evaluations of our big community development programs in Indonesia and Afghanistan over the years. These evaluations have been extremely useful for improving our programs, and they've done it through findings that were both counterintuitive and almost certainly could not have been obtained any other way. To cite one example, we did a big RCT study on what reduces corruption in community programs. Whereas my entire team thought that increasing participation and transparency would be most effective, in fact increasing the frequency of locally publicized audits had far greater effects. That finding has now translated into a revised audit policy and procedure for $1.7 billion/yr in CDD investments. Surely this sort of work is both constructive and useful for development programs. Other RCTs are looking at what can improve community-state interactions in Afghanistan, MDG performance in Indonesia's poorest villages, and what incentives will lower police extortion. Again, surely these are positive and relevant questions -- ones where it's important to be sure that we've nailed the right explanatory factors before we urge governments to translate the findings into national policies.

The critics are undoubtedly right that there's a certain amount of grandstanding in debunking development shibboleths. But that just means that academics respond to academic incentives. The question for us is whether their work is useful and, for this debate, whether it's useful enough to justify the time and cost. The commentator who said that a lot of the problem is that the Bank all too often lacks a causal theory to test hit the nail right on the head -- there's no point doing this (or any other) experimental evaluation if you don't have a causal model behind it. I think that is true for development projects overall, though, and one reason so many Bank programs produce so few measurable impacts is that they depend on development's standard assumptions being right. Having RCTs challenge those is surely a good thing.

Finally, just to throw a bit more fat on the fire, one reason I've found the JPAL crowd so much better to work with than many of the critics, including Bank evaluation teams, is that they combine all of that high-powered theory and method with a very strong field orientation. It's not just a matter of hiring consultants. By the time JPAL finished the corruption evaluation, they'd spent more time in the field and understood how the program was designed and operated better than virtually anyone in the Bank did. That field engagement is actually part of an overall epistemology, and not just a useful add-on, is a point that is increasingly alien to both the Bank and the economics discipline. But the cost of poor field knowledge is discrediting modern economics in fields that range from the current subprime mortgage fiasco to understanding poverty and inequality. That Duflo, Olken, Glennerster, Beath and the others reject this route already makes them more useful to development than a lot of what is being proposed by way of alternative. In short, while some of the triumphalism and rigidity of the RCT crowd can be annoying, surely we have few enough ways to rigorously test and assess development knowledge that we should be thrilled to have this new group of partners challenging a lot of our received wisdom.

Vijayendra Rao
July 26, 2010

To supplement Scott's admirable defence of RCTs: I think it is a little silly to argue that RCTs are never useful, or that they have not added value to development. They are unquestionably a valuable tool. But I wish that the development community would pay as much attention to studies that show no impact as they do to those that do.

Note, however, that RCTs have other serious limitations. Perhaps their biggest limitation is that they tell us little about why a result was obtained. RCTs focus entirely on "how much": on the size of the impact (or the lack thereof). Consequently, the lessons for policy are limited. The whys require serious attention to process, which takes a little more than just going to the field. It needs serious qualitative analysis -- participant observation, unstructured interviewing, historical tracking (something that, I should note, Scott has also encouraged in KDP). This is not a priority for RCT advocates, and it should be. It is also not a priority for the Bank, and in some cases is seen by project staff as threatening, because good qualitative work can uncover some pretty unsavory realities.