
What can we learn from medicine: Three mistakes to avoid when designing a trial registry – Guest Post by Ole Dahl Rasmussen

If you are like most people working with quantitative data in development, getting too many statistically significant results is probably not your most pressing problem. On the contrary, if you are lucky enough to find a star, whether it's of the 1%, 5% or 10% type, there are plenty of star-killers to choose from. In what is perhaps the only contribution to the rare genre of 'econometrics haiku', Keisuke Hirano reflects on one of them: T-stat looks too good // Try clustered standard errors - // Significance gone (in Angrist and Pischke's MHE).
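For readers who want to see the haiku in action, here is a minimal sketch (mine, not from the post) that simulates a village-level treatment with cluster-correlated outcomes and compares conventional and cluster-robust standard errors using statsmodels; all variable names and parameter values are illustrative.

```python
# Minimal sketch of Hirano's haiku: a village-level treatment with
# cluster-correlated outcomes, comparing naive and clustered standard errors.
# All names and magnitudes are illustrative, not taken from any real study.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_villages, households = 40, 25
village = np.repeat(np.arange(n_villages), households)
treated = np.repeat(rng.integers(0, 2, n_villages), households)   # village-level assignment
village_shock = np.repeat(rng.normal(0, 1.0, n_villages), households)
y = 0.1 * treated + village_shock + rng.normal(0, 1.0, village.size)
df = pd.DataFrame({"y": y, "treated": treated, "village": village})

naive = smf.ols("y ~ treated", data=df).fit()
clustered = smf.ols("y ~ treated", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["village"]}
)
print("naive t-stat:    ", round(naive.tvalues["treated"], 2))
print("clustered t-stat:", round(clustered.tvalues["treated"], 2))
```

Because treatment varies only at the village level, the naive standard errors overstate the effective sample size, and the t-statistic typically shrinks once clustering is accounted for.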

Nevertheless, here is another star-killer: a trial registry. A trial registry is a publicly available database where researchers and evaluators can register a research design prior to initiating a study. Outcome measures and subgroups are crucial aspects to register, and importantly, any future changes made to the registration are visible to everyone. Medical researchers have done this for a long time, and trial registration has been required in the US since the Food and Drug Administration Modernization Act of 1997.

In a recent paper, my co-authors and I pick up on previous suggestions that hypotheses should be registered in advance (e.g. Duflo et al 2007). We argue that a trial registry for non-medical development interventions can create two levels of credibility of results. If the outcome, subgroups and analysis strategy have been specified and registered in advance, results belong to level one. If, on the other hand, there is no registration, e.g. for secondary analysis of the data, then the results belong to level two. As such, secondary analysis is useful and should continue, but the opportunity of achieving level-one credibility should not be forgone.

One reason for establishing a trial registry is that there is likely to be significant bias in the available evidence. DeLong and Lang (1992) look at the distribution of significance levels and find that published research is biased toward statistically significant results. Two possible reasons are data mining and journal acceptance bias. A trial registry can serve as a check against data mining and can give an overview of all studies on a topic, thereby reducing publication bias, or at least helping us assess its magnitude. Edward Glaeser at Harvard once argued that randomized control trials make data mining "essentially disappear" (Glaeser 2006, page 20). We disagree, for the simple reason that RCTs are often based on large LSMS-style surveys which allow considerable flexibility in the choice of outcome measure. Moreover, analyses of subgroups are common. On top of that, a trial registry will not only be relevant to researchers involved in randomized control trials, but to all studies involving targeted data collection.
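To illustrate why we disagree, here is a small simulation sketch (mine, not from the paper): even when the true treatment effect is exactly zero, scanning many survey outcomes and subgroups will typically turn up a few "stars" by chance alone. The numbers of outcomes and subgroups are arbitrary.

```python
# Illustrative sketch: with a true effect of zero, testing many outcome
# measures within many subgroups still produces "significant" results by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_outcomes, n_subgroups = 2000, 10, 5
treat = rng.integers(0, 2, n)
subgroup = rng.integers(0, n_subgroups, n)

false_positives = 0
for _ in range(n_outcomes):
    y = rng.normal(size=n)                     # outcome unrelated to treatment
    for g in range(n_subgroups):
        m = subgroup == g
        _, p = stats.ttest_ind(y[m & (treat == 1)], y[m & (treat == 0)])
        false_positives += p < 0.05

print(f"{false_positives} 'significant' results out of {n_outcomes * n_subgroups} tests")
```

With 50 tests at the 5% level, one should expect roughly two to three spurious stars even though nothing is going on, which is exactly the flexibility a registry is meant to discipline.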

Fortunately, we have heard that a trial registry is currently under way. In our paper, we go through the medical literature on trial registries. Building on the paper, I present three mistakes that I think should be avoided, drawing on fifty years of trial registry history in medicine.

Mistake number one: Include only a sub-discipline, like development economics

If a trial registry does not include all trials within a category of interest, it cannot act as a bulwark against publication bias. Because of this, builders of a trial registry should strongly consider opening the registry to social science in general. Even though this is a larger and certainly slower task, it may well be worth the effort, also for the narrower discipline of development economics. Developments in medicine were surprisingly slow. The first trial registry in medicine was established in the mid 1970s and included only clinical trials related to cancer. Even back then, it was clear that a comprehensive trial registry for medicine would be more beneficial than separate sub-registries, but it was not until 2000 that a comprehensive trial registry, ClinicalTrials.gov, became available (Dickersin and Rennie 2003). Furthermore, the use of this registry only took off in 2005. One reason for this delay was that several subfields maintained their own registries, and with these in place the incentive for common action simply wasn't there.

For development interventions, it is highly likely that a study could fit in two registries. Today, a trial on the returns to education in Malawi might fit both the current trial registry for education and a future trial registry for development interventions. So even when thinking about development economics in isolation, there are benefits to being inclusive.

Mistake number two: Fail to align researcher incentives

A trial registry alone is unlikely to make a difference. A necessary second step is that researchers must register trials. Trial registry history from medicine suggests that they will do so only if it matters for the likelihood of publication. The graph below shows the cumulative registrations in ClinicalTrials.gov from 2000 onwards. During the first five years the number rises steadily, at 390 new registrations per month. From September 13th 2005, this rate jumps to 1,300 new registrations per month. On this date, the International Committee of Medical Journal Editors enacted a policy whereby no articles building on original data would be accepted if the trial was unregistered. Even though the ICMJE is a closed organization with few members, its Uniform Requirements for Manuscripts, which include the trial registration requirement, were quickly adopted by many journals. Today the list counts 850 medical journals. Even without any legal or financial power, this was a true game changer.

In social science, the same may very well happen if the organizers of the registry are successful in gathering the top journals from various fields, agreeing on common standards for input, and prioritizing participation of important stakeholders over speed of implementation. Getting donors on board and mobilizing support among governments in developing countries should also help.

Mistake number three: Collect low quality data

The quality of the registry will depend on the quality of the data in it. ClinicalTrials.gov has adopted a wide range of procedures to ensure data quality and credibility:

  • All entries are checked manually for consistency
  • All entries must be approved by the submitting institution's organizational account holder, which guards against spurious registrations.
  • Several procedures guard against double registration: sponsors must confirm entries, automatic searches check for similarities (see the sketch after this list), and a system of unique IDs enables checking against other registries.
  • Spell checkers check both entries and search strings against a dictionary of medical language. AEA keywords could serve as a starting point for such a dictionary.
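As a rough illustration of what such an automated similarity check might look like, here is a hypothetical sketch using only the Python standard library; the registration IDs, titles and threshold are invented, and the real ClinicalTrials.gov procedures are certainly more elaborate.

```python
# Hypothetical sketch of a similarity check for possible double registrations.
# IDs, titles and the 0.7 threshold are made up for illustration.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude title similarity score in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

existing = {
    "REG-001": "Returns to primary education in rural Malawi",
    "REG-002": "Microcredit and household investment in Uganda",
}
new_title = "The returns to primary education in rural Malawi: an RCT"

for reg_id, title in existing.items():
    score = similarity(new_title, title)
    if score > 0.7:
        print(f"Possible duplicate of {reg_id} (similarity {score:.2f})")
```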

Necessary but not sufficient

We should not have too high expectations, and we should be careful that a trial registry does not instil a false sense of security. A trial registry will not solve all the credibility issues we have. Further, its effective implementation is likely to require time and money. But being clear about the arguments in favour of a registry and making use of the experience from medicine should provide a good starting point.

Ole Dahl Rasmussen is a PhD student at the University of Southern Denmark and a microfinance and evaluation advisor to DanChurchAid.

Missing your Friday links? – follow David on twitter @dmckenzie001

Comments

Submitted by Gabriel on
I'm skeptical a trial registry would make any difference at all to the quality of research. Is there any evidence that the use of trial registries in medicine has improved the quality of medical research? My skepticism comes from my experience with the Millennium Villages Project. The MVP has a research protocol registered with two trial registries. In published work, Michael Clemens critiques the protocol, which has substantial weaknesses. What the MVP has actually published is quite different from what's described in that protocol. They've published results for indicators not listed in the protocol, included project sites not listed in the protocol, excluded sites listed in the protocol, and as a whole not done what the protocol specifies. As far as I can tell, this hasn't caused the MVP any difficulty. They've still published a few papers, including one recently in the peer-reviewed American Journal of Clinical Nutrition. We point out some of the flaws with that paper here: http://blogs.worldbank.org/africacan/node/2051 If the protocol in the trial registry isn't binding in a high-profile case like the MVP evaluation, what hope is there that it would be binding for other research? I think both cooperation and monitoring on this point are too difficult. There are so many journals out there that even if you could get the top journals to sign on to a trial registry policy, there will always be other journals willing to publish work that doesn't stick to its pre-announced protocol.

Submitted by Matt on
Hi Ole, We've already chatted about this before, but a couple more thoughts: I think you're a little too optimistic about how a trial registry might impact the real elephant in the room: massive incentives to reject a null of zero impact. Most good researchers approach RCT design with a hypothesis in mind, but after spending thousands of dollars on costly surveys and possibly the intervention itself, when their hypothesis turns out to be bunk, subgroups suddenly look a lot more attractive. While establishing a trial registry might have some success in lowering the value of data mining, it does nothing to address the incentive system that pushes people towards data mining in the first place if those `zero' results never get published. Sure, researchers can say "you can check out our non-result in the registry", but that isn't going to fly in a viva or with a tenure committee. Secondly, we can devise a two-tier system, where research is flagged when it goes outside of the original design, but having a de jure system of different levels of credibility doesn't guarantee a de facto system of different levels of credibility. Take significance levels, for example - ten years ago, p values of 5% seemed to be the golden threshold. Over time, it has crept towards 10% - the reasons for this are probably myriad, but some of it must be due to de facto changes in acceptance levels. Even if journals all agree to `star' only trial registry results, if the rest of the academic community doesn't effectively discern the two, I wonder if we would really see much of a change. Finally, I'd be worried about what we mean by having a two-tier system anyway. The lower tier is obviously flagged with "this could possibly be data mined", but aside from that concern, if, for example, I only included `investment' as an outcome variable in my registered research plan but later decided to look at 'income' as well (without turning to theory), there's no reason why one would be a more valid statistical result than the other (assuming that I'm not reverting to subgroups, etc), so I think the comparison is a little bit misleading.

Submitted by Ole on
Dear Gabriel, Just a few comments: In the case of the Millennium Villages Project you are lucky that they have an ex-ante research protocol and that you can get access to it. In most of the development economics I know of, that is not the case. Some don't have a protocol, and almost none of those who have make them available. In fact, I cannot think of other examples. Please post if you can. But that is one reason we need a trial registry: to get more access to the ex-ante thoughts of researchers. As you have thoroughly pointed out, there are many reasons why the current impact assessments of the Millennium Villages Project have low internal validity and low credibility in general. I think that is well established, not least by your work. The fact that they do not adhere to their protocol is one item on a long list of issues, and we should not be too optimistic: there are many other issues that can make results not credible. As such, we should be careful that a trial registry does not give us a false sense of security. Nevertheless, would you really say that the fact that the MVP deviated from the protocol did not affect your assessment of the credibility of their results? I wouldn't. As long as it does, we still need a trial registry in my opinion. Regarding the evidence on trial registries in medicine, identification of the causal effect is difficult for the simple reason that the primary effect is likely to be preventive: with a trial registry, researchers are simply more careful and put more work into ex-ante research design instead of into ex-post data work. It would be interesting to get comments on this from the medical profession. Finally, you write that there will always be journals that will publish results on non-registered outcomes. I agree – and I think they should. There will always be room for analysis of unregistered trials and for secondary analysis of registered trials. But the fact that we can get useful information from secondary analysis should not stop us from adding credibility to analyses where we did indeed register results.

We at the International Initiative for Impact Evaluation (3ie) find this blog post and the accompanying journal article engaging and timely, not least because we will soon launch a registry for impact evaluations of development interventions. Just this week, we released a Request for Proposals (http://www.3ieimpact.org/userfiles/doc/3ie%20Registry%20RFP.pdf) for support to the development and maintenance of the 3ie registry. We generally agree with the points Ole makes and are planning to address the possible “mistakes”.

First, the 3ie registry will certainly be open to all branches of social science. We will welcome and encourage researchers in all disciplines to register impact evaluations of development interventions in the 3ie registry as long as the studies are built around a credible counterfactual.

Second, during the design stage of the registry, we will consult with a wide variety of stakeholders such as professional associations, journal editors, donor agencies, and national governments. While journal publication is one important incentive, we are also interested in incorporating incentives for commissioned impact evaluations to be entered into the registry. It is certainly our hope that as the culture of evidence-based decision making grows, governments and donors will increasingly commission RCT and quasi-experimental evaluations of their programs. Often, the primary objective of these commissioned studies is not journal publication. However, it is just as important to have these studies registered, not just to inform the interpretation of their results but also to make these results (and non-results) readily available for systematic reviews.

Third, we will draw on the best practices and lessons learned from existing registries in order to design both a form and a submission process that will ensure high data quality. Additionally, we will explore using an open API for our registry so that the data can be more easily accessed and combined with data from other registries.

In addition to the registry, yet another tool for increasing the transparency and rigor of impact evaluation studies is replication. We define replication as the process by which an independent researcher reanalyses the data set and code to check for errors and measures the sensitivity of results to plausible alternative specifications and theoretical assumptions. In addition, replication may also involve checking whether the results can be duplicated with similar but different data. 3ie is launching a replication program in which selected impact evaluations of development interventions will be replicated. The registry and the replication program will complement each other by allowing the replicator to “retrace the steps” of the original researchers.

Submitted by Ole on
Dear Matt, A quick response on your first point. A typical comment when I present the paper is “Oh, but we only found the real connections after we had looked into the data”. Or even better: “We only figured out what to look for when we went to the site, and then it was too late to change the questionnaire.” In the first case, one could hope that higher stakes in pre-registered results would make people more willing to repeat other studies. Not just the data work, which is difficult enough, but the study itself. On the second example, if a trial registry can push researchers to spend more money and time in the location and with the people they study, and if we believe this will make them better at theorizing and eventually lead to more stars, then I think that has value. Many of the usual writers on this blog have written on how they use qualitative techniques to increase their understanding of issues, and I also remember getting good points from e.g. Kanbur's work on Q-squared (even though a senior development economist told me that in his experience it should be square-root Q instead :-) ).

Submitted by Ole on
I am of course happy to learn that 3ie is in the process of setting up a trial registry. Just a few ideas for the design: It is great to hear that 3ie will consult with a large number of stakeholders. The evidence from medicine clearly shows that commitment from these is central to usage, not least from journal editors. In medicine, a number of editors have simply made trial registration a requirement for publication. When I presented the paper last week, some other suggestions came up that might be more suitable to the context of development interventions. Journals might allocate a special section to papers reporting on pre-defined outcomes. Or they could require authors to state explicitly whether they registered in the trial registry. For this they would need the unique identifier, which I hope will be part of the registry designed by 3ie. Another aspect is the fields required at submission. Here, ClinicalTrials.gov is remarkable for having only very few required fields. At the same time, some care is taken to check that these fields are filled in in a meaningful way. Having too many required fields might be a disincentive to researchers who want to guard their own ideas. A similar suggestion would be to allow a delay between registration and public disclosure of some fields, so that e.g. the analysis strategy would not be disclosed until after two years. In our paper, we give specific suggestions as to which fields we think should be required. Finally, ClinicalTrials.gov allows very easy download of entries. After searching, it is possible to download entire entries or only specific fields as comma-separated files. This makes use of registry data easy. Looking forward to registering trials in the future.
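To give a sense of how easy such downloads make re-use of registry data, here is a hypothetical sketch of working with an exported CSV in Python; the file name, the column names and the assumption that multiple outcomes are separated by "|" are all illustrative, not a description of the actual ClinicalTrials.gov export format.

```python
# Hypothetical sketch of analysing a registry CSV export.
# File name, column names and the "|" separator are assumptions for illustration.
import pandas as pd

entries = pd.read_csv("registry_search_results.csv")

# Keep only the fields relevant for a review of pre-specified outcomes.
fields = ["NCT Number", "Title", "Primary Outcome Measures", "Start Date"]
subset = entries[fields]

# For example, flag registered trials that pre-specify more than one primary outcome,
# assuming multiple outcomes are listed in one cell separated by "|".
multi_outcome = subset["Primary Outcome Measures"].str.contains("|", regex=False, na=False)
print(subset[multi_outcome].head())
```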