Response to Brown and Wood's "How Scientific Are Scientific Replications? A Response"

I thank Annette Brown and Benjamin Wood (B&W from hereon) for their response to my previous post about the 3ie replication window. It not only clarified some of the thinking behind their approach, but arrived at an opportune moment – just as I was preparing a new post on part 2 of the replication (or reanalysis as they call it) of Miguel and Kremer’s 2004 Econometrica paper titled “Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities,” by Davey et al. (2014b) and the response (Hicks, Kremer, and Miguel 2014b, HKM from hereon).  While I appreciate B&W’s clarifications, I respectfully disagree on two key points, which also happen to illustrate why I think the reanalysis of the original data by Davey et al. (2014b) ends up being flawed.

First, B&W state, by citing a slide by Nosek from the recent ASSA session on “Promoting New Norms for Transparency and Integrity in Economic Research” (slides not available online), that his example “…provides a rather sobering justification for internal replication but it also, we think, demonstrates that an exercise in determining which one of those analyses is “right” would not be productive.” Having not been able to review Nosek’s slides, I concede the possibility that there exist 29 equally valid combinations of causal inference methods and choices of data analysis that produce qualitatively different results in his data set. However, this does not imply that we should think of any alternative methods used to address a research question or any alterations to the handling of the data as being equally valid. As I discuss in detail in the next post, the key choices made by Davey et al. (2014b) in their reanalysis of the original deworming data, which lead them to their conclusion that “…the evidence that the intervention improved school attendance differed according to how we analysed the data” (page iv) and to which HKM vigorously object and respond, are highly suspect – for lack of a more generous adjective.

This leads me directly to my second disagreement with the 3ie replication window process. B&W state: “One important clarification here is that our paper series is a working paper series, not a journal.” My question is: “Why not?” The papers being replicated are journal articles, more often than not in journals with very high barriers to entry. Usually, two to four peer reviewers, who are experts in the subject matter and the relevant econometric/causal inference methods, in addition to a journal editor (and sometimes an associate editor), have reviewed these articles carefully. Why should we hold their reanalyses to a different standard? B&W mention 3ie’s review process, but I feel fairly confident in speculating that Davey et al. (2014b) would not be accepted for publication in its current form in any top-tier economics journal (general or field) – not because these journals are loath to publish replications (which they are) but simply on merit. Nor would the earlier reanalysis of Jensen and Oster’s “Cable TV and Women’s Status” by Iversen and Palmer-Jones.

Researchers are free to question the assumptions, methods, and interpretations used in published papers and to post their critiques online, but that does not mean we should be giving them valuable research funding to do it, because when we do, the incentives are skewed on all sides. 3ie, having commissioned the reanalysis (as opposed to having received the final paper as a submission for publication), has no graceful way of ‘rejecting’ the final product if it is deemed subpar. The reanalysis authors are trying to find something they can report that escaped the attention of the original reviewers and editors and that refutes or weakens the original findings. And the original study authors have to spend countless hours (probably more like weeks or months) responding to the reanalysis authors’ numerous claims (in contrast, in the journal publication process, a good editor would tell the authors which points to address and which to ignore because they make no sense or are unimportant).

Lastly, B&W show an admirable amount of trust in their audience: “What we would hope is that folks who are interested in understanding whether there is discrimination in soccer would look at the assumptions made, how concepts are measured, what specifications are estimated, and what theories are implied or proposed in order to decide which of the analyses provide credible and relevant information for their purposes.” I agree that this is true for part of the intended audience: for example, the small readership of our niche blog – be it in private communications or public comments – usually displays a great understanding of the technical issues we discuss. But that is only part of the audience: there are others who either don’t feel completely comfortable evaluating the relative merits of the original study, the reanalysis, and the response to the reanalysis, or who simply don’t have the time to do so (the reanalysis by Davey et al. 2014b runs to 64 pages and the response by HKM to 37). If the audience is larger and demands a neutral refereeing of these products, then the process 3ie adopted for these replications (in particular, the treatment of the reanalysis studies as working papers) is inherently flawed, and B&W’s defense of it sadly amounts to a cop-out. Effectively, and B&W’s explanation of their approach notwithstanding, 3ie is lending these replications credibility by putting its implicit stamp of approval on them. From what I have seen so far, I am convinced neither that this is doing more public good than harm nor that limited research funds should be spent in this way.

Please see my review of Davey et al. (2014b) and HKM’s response in the next post.

Update (2/2/2015): A reader sent links to Nosek's AEA presentation and the soccer discrimination paper in question.

Comments

Submitted by Garret Christensen on

If you are curious about the slides in Brian Nosek's talk, they were from a paper which was just submitted to PNAS. You can read it here: https://osf.io/j5v8f/
Figure 1 is the relevant one, but the paper is all about having multiple people try to answer the same question with the same data.

Submitted by Berk on

Thanks Garret, looks interesting. I'll update the post to hyperlink the paper...

Berk.
