Registering studies when all you want is a little more credibility


Some time ago, colleagues from World Bank (WB) operations contacted me with a request to evaluate an upcoming cash transfer program for refugees in Turkey. We started discussing the possibilities: perhaps a randomization if there is likely to be over-subscription? No, the eligibility rules are clear, and there is no rationing among the eligible. OK, so maybe a regression discontinuity design? No, the eligibility criteria create a lumpy score, not a nice and continuous one. OK, some other quasi-experimental identification strategy then…

The policy question of whether the cash transfers are affecting the lives of refugees in important ways, rather than simply serving as the most basic of safety nets, is an important one and we want to answer it well. It is likely that any findings from an assessment might affect revisions to the program.

With observational studies, the default mode of operation is that the researcher gets access to all the data, picks an identification strategy, does the analysis, and writes up the results. The trouble is, I do not trust myself (or my collaborator) not to be affected in our choices by the impact findings, their statistical significance, etc. Not that I have a stake in the success of the program: it is more that I am worried about subconscious choices that can take the analysis in one direction rather than the other – exactly because I can see the consequences of these choices pretty easily if I have all the data…

So, this is what we did: we asked the WB task team leader (TTL) to only give us the baseline data. My collaborator would then look at various options to create proper treatment and counterfactual groups using only the baseline data, after which we would register this choice – by writing up the main identification strategy and describing the analysis that would follow once we gained access to the follow-up data. The idea is that once we defined who belongs in the treatment group and who in control, then we could fall back on standard operating procedures for analyzing experimental data when we received the follow-ups.

Now, to the strictly skeptical, this is not enough: how would someone know whether the TTL was really diligent in not giving us the follow-up data, whether we’re being honest, etc.? This is true: we don’t have an ironclad case (registration before even the first study participant was recruited) that we’re not fishing or p-hacking in some way. It would have been ideal if we had managed to do all the analysis with the baseline data and register it before the first round of follow-up data collection. But, life happens sometimes and this is where we are. We just want to be able to state publicly somewhere – with a time stamp – that this is what we did, this is our plan, and let the referees, reviewers, and consumers decide on their chosen amount of trust in our process.

What about the registration itself? I was surprised to see that, despite its URL being https://www.socialscienceregistry.org, the AEA’s registry is exclusively an RCT registry. I tried anyway and asked the administrators if they would house a non-RCT in their system, and the answer was no. I was politely referred to EGAP and RIDIE. I know a lot of people who are involved with EGAP, so I decided to try their registry.

I like that EGAP, on its registration page, acknowledges that the AEA trial registry is only for RCTs and that a general registry for social science research is likely to be a good idea: “The EGAP registry is providing an unsupervised stopgap function to store [study] designs until the creation of a general registry for social science research.”

I also really liked the following question on whether the registration is prospective or retrospective, which was exactly what we needed – given the circumstances of our study:

B5 Is this Registration Prospective or Retrospective? – multiple choice (SELECT ONE)

  • N/A
  • Registration prior to any research activities
  • Registration prior to assignment of treatment
  • Registration prior to realization of outcomes
  • Registration prior to researcher access to outcome data
  • Registration prior to researcher analysis of outcome data
  • Registration after researcher analysis of outcome data
  • Other (if selected, short text field appears)

The registration form also asks you if you presented the design at an EGAP meeting, which, I think, is one of the more useful functions of pre-registration: to receive critical feedback before it’s too late. Journal of Development Economics’ “Registered Reports” is a similar idea taken further… Perhaps the only thing I thought could be improved was the question on methodology:

Methodology – select all that apply

  • Experimental Design 
  • Field Experiments 
  • Lab Experiments 
  • Mixed Method 
  • Statistics 
  • Survey Methodology

I would have liked to put quasi-experimental as our answer, but had to go for mixed method, as that seemed the least wrong answer.

How did our story end? We did end up being able to create a counterfactual group out of non-beneficiary study participants using propensity score matching. Once we trimmed the tails, there was sufficient overlapping support and balance on lagged values of important outcomes and on baseline characteristics likely to be prognostic of future outcomes. In other words, balance on observables. We fixed that logit model (meaning that T and C are now defined at baseline), put it into our pre-analysis plan (PAP), and proposed two approaches for checking the stability of our findings with respect to this choice: nearest-neighbor matching and coarsened exact matching. We submitted the documents to EGAP last week.
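For readers less familiar with the mechanics, the baseline-only procedure described above can be sketched in a few lines. This is a minimal, self-contained illustration on simulated data – the variable names (`x1` for a baseline characteristic, `y0` for a lagged outcome) are placeholders, not the actual survey variables, and the logit here is fit by hand rather than with a statistics package:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulated baseline data (hypothetical stand-in for the survey) ---
n = 2000
x1 = rng.normal(size=n)   # a baseline characteristic
y0 = rng.normal(size=n)   # a lagged value of the outcome
# Treatment status depends on observables only (selection on observables)
p_true = 1 / (1 + np.exp(-(0.8 * x1 + 0.5 * y0)))
t = rng.binomial(1, p_true)

X = np.column_stack([np.ones(n), x1, y0])

# --- Fit the propensity-score logit via Newton-Raphson ---
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (t - p)                      # score
    hess = X.T @ (X * (p * (1 - p))[:, None]) # information matrix
    beta += np.linalg.solve(hess, grad)

pscore = 1 / (1 + np.exp(-X @ beta))

# --- Trim the tails to enforce overlapping support ---
lo = max(pscore[t == 1].min(), pscore[t == 0].min())
hi = min(pscore[t == 1].max(), pscore[t == 0].max())
keep = (pscore >= lo) & (pscore <= hi)

# --- 1:1 nearest-neighbor matching on the score (with replacement) ---
treat_idx = np.where(keep & (t == 1))[0]
ctrl_idx = np.where(keep & (t == 0))[0]
dist = np.abs(pscore[ctrl_idx][None, :] - pscore[treat_idx][:, None])
matches = ctrl_idx[dist.argmin(axis=1)]

# --- Balance check on the lagged outcome, before vs. after matching ---
raw_gap = y0[t == 1].mean() - y0[t == 0].mean()
matched_gap = y0[treat_idx].mean() - y0[matches].mean()
print(f"lagged-outcome gap: raw {raw_gap:.3f}, matched {matched_gap:.3f}")
```

The point of fixing the fitted logit at baseline is that `treat_idx` and `matches` are locked in before any follow-up outcomes exist; the follow-up analysis then compares outcomes across these pre-registered groups as if they were experimental arms.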

We now have the data from the first follow-up and are analyzing attrition – in the way we specified in the PAP. Wish us luck…

Authors

Berk Ozler

Lead Economist, Development Research Group, World Bank

Join the Conversation

Stuart Buck
May 20, 2019

Could also register at OSF (fairly open-ended) or at SREE's new registry: https://sreereg.icpsr.umich.edu/

Akib
May 21, 2019

This is extremely useful - thanks for sharing! We are doing a similar prospective quasi-experimental evaluation of a tertiary education program in South Africa and registered the PAP on RIDIE.
Have a query regarding the choice of identification strategy: There has been some discussion on the use of matching in such cases where the tails are trimmed to ensure balance (on lagged outcomes). In particular, if the underlying (T and C) populations are different (in first moment), matching by pre-T outcomes could lead to biased treatment-effect estimators due to regression-to-the-mean post-T (but a DiD would be OK if parallel counterfactual trends hold). This twitter thread details the different considerations citing relevant papers: https://twitter.com/laura_tastic/status/1022890688525029376
Was wondering - if it's OK to ask - what was the case for your study and the considerations that led to the choice of PSM instead of, say, DiD (which, of course, relies on a different set of assumptions). Would be super-helpful.
Thanks in advance for your kind response! Would really appreciate it!