
Sanity in the Great Methodology Debate

By David McKenzie

The increased use of randomized experiments in development economics has its enthusiastic champions and its vociferous critics. However, much of the argument seems to be battling against straw men, with at times an unwillingness to concede that the other side has a point. During our surveys of assistant professors and Ph.D. students in development (see here for full details of the samples), we asked for views on some of the key methodological debates. The answers reveal, for the most part, a large amount of sanity and moderation:

Are experiments special?

Angus Deaton argues that “experiments have no special ability to produce more credible knowledge than other methods”, a statement that seems pure hyperbole to us – as Guido Imbens notes in his reply to Deaton, it is hard to think of a situation where a researcher has the opportunity to randomize, but decides it will be more credible not to. Most respondents also disagree with Deaton:

Note: the category not shown is "neither agree nor disagree"

But people do worry about methodology driving the questions being answered

Another critique of the increasing use of randomized experiments is a concern that "far too often it is not the question that is driving the research agenda but a preference for certain types of data or certain methods" (p 12 in this paper by Martin Ravallion). As the respondents show, it is possible to believe that randomization does have a special ability to provide credible evidence while also worrying, as Martin does, that methods are driving the questions researchers answer.

What about other methods?

Another concern sometimes expressed by researchers is that non-experimental methods receive less appreciation. Propensity-score matching is one common non-experimental approach to impact evaluation that relies on selection on observables. Is it automatically a non-starter? Most respondents aren't going to rule a paper out for the method alone.
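For readers less familiar with the method, here is a minimal sketch of what propensity-score matching involves in practice. This is a generic illustration with simulated data, not the procedure from any particular paper; the variable names and the use of numpy and scikit-learn are my own assumptions.

```python
# Minimal propensity-score matching sketch (illustrative only).
# Uses simulated data; in practice X would be pre-treatment covariates.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                           # observed covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # selection on observables
y = 2.0 * treat + X[:, 0] + rng.normal(size=n)        # outcome, true effect = 2

# 1. Estimate the propensity score P(treat = 1 | X)
pscore = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

# 2. Match each treated unit to the nearest control on the estimated score
treated = np.where(treat == 1)[0]
controls = np.where(treat == 0)[0]
matches = controls[np.abs(pscore[controls][None, :] -
                          pscore[treated][:, None]).argmin(axis=1)]

# 3. Average treatment effect on the treated (ATT)
att = (y[treated] - y[matches]).mean()
print(f"Estimated ATT: {att:.2f}")
```

The credibility of the estimate rests entirely on the assumption that, conditional on the observed covariates, treatment is as good as random; that is the "selection on observables" caveat that makes some referees wary.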

Structural models haven't made as many inroads into development economics as they have in some other fields. Assistant professors in particular seem dubious about whether estimates from such methods are reliable.

Finally, even cross-country regressions get some love – few people agree that we’ve learnt nothing from them:

Most people aren’t ideologues

Our reading of this evidence suggests that most people aren't ideologues: they appreciate the benefits of randomized experiments, but also worry about methodology driving questions. They appreciate that we have learned, and are continuing to learn, something from other methods. So while arguing against straw men makes for nice rhetoric, it doesn't reflect the reality of most people's beliefs.


Comments

At the risk of being snarky, subjects were being untruthful when responding to the question, "I am likely to reject any paper using propensity score matching." Since the vast majority of papers are rejected, the correct answer should be "yes." I doubt the respondents answering the question took this fact into account; they probably interpreted it as "more likely to reject than a paper using another method." (An alternative explanation is that propensity score papers are rejected far _less_ often than papers using other methodologies, but I doubt it.) One of the special advantages of experiments as compared to survey research is that we can get at what people do rather than what they say they would do. I could imagine an experiment in which the same basic article is sent out, one version with an experimental method and one with a propensity score method (with simulated data so as to make the conclusions identical). I suspect the rejection rate would be equally high for both, but there may be interesting interaction effects with other sources of variation (using causal language versus not, etc.)

Submitted by anonymous on
The debate makes little reference to external validity. Often treatments are randomized, but their underlying rationale is unclear or specific combinations cannot be taken apart. Thus they cannot be realistically applied to another country or area. We often just don't know what makes them work. What matters is to isolate specific, operationally defined variables that have a chain of causality in the discipline from which they originated. But economics researchers may not know enough of that discipline to make sense of causality chains. That knowledge is what is needed most. P.S. There are of course many robust quasi-experimental designs, including regression-discontinuity analysis. It's unfortunate that they are not mentioned above.

I've given my thoughts on external validity here: http://blogs.worldbank.org/impactevaluations/a-rant-on-the-external-validity-double-double-standard We didn't ask this of the assistant professors, but we asked the PhD students whether they agreed or disagreed with the statement "External validity is no greater in most non-experimental development research than it is in most papers using experiments". Only 9% disagreed or strongly disagreed with this, while 40% strongly agreed or agreed, and the rest neither agreed nor disagreed. So it seems the view I express in my post is far from unique. And of course there are many other methods like RD, IV, etc., each of which has its own pros and cons. We didn't ask about views on these.

Submitted by Berk Ozler on
I'll be posting on RD design today or soon afterwards. Berk.

Submitted by Berk Ozler on
Dear Anonymous, If the underlying rationale is unclear for any evaluation, then it should not be funded - regardless of IE design. Specific combinations can be nicely taken apart by overlaid (2x2 or other as appropriate) experiments. Replication studies can be done in other settings. Finally, many economists work with people in other disciplines for months or even years while designing an intervention/evaluation to get at exactly the causal paths of impact. We've had a number of posts on these issues over the past few months. What is unfortunate is that economists and experiments get repeatedly blamed for the issues you raise. There are good and not so good researchers, as well as appropriate and inappropriate methods of evaluation for a particular policy or research question. Sincerely, Berk Ozler