Published on Development Impact

Dealing with attrition in field experiments


Here is a familiar scenario for those running field experiments: You’re conducting a study with a treatment and a comparison arm and measuring your main outcomes with surveys and/or biomarker data collection, meaning that you need to contact the subjects (unlike, say, using administrative data tied to their national identity numbers) – preferably in person. You know that you will, inevitably, lose some subjects from both groups to follow-up: they will have moved, be temporarily away, refuse to answer, died, etc. In some of these cases there is nothing more you can do, but in others you can try harder: you can wait for them to come back and revisit; you can try to track them to their new location, etc. You can do this at different intensities (try really hard or not so much), different boundaries (for everyone in the study district, region, or country, but not for those farther away), and different samples (for everyone or for a random sub-sample).

Question: suppose that you decide that you have the budget to do everything you can to find those not interviewed during the first pass through the study areas (doesn’t matter if you have enough budget for a randomly chosen sub-sample or everyone), i.e. an intense tracking exercise to reduce the rate of attrition. In addition to everything else you can do to track subjects from both groups, you have a tool that you can use for those only in the treatment arm (say, your treatment was group-based therapy for teen mums and you think that the mentors for these groups may have key contact information for subjects who moved in the treatment group. There were no placebo groups in control, i.e. no counterpart mentors). Do you use this source to track subjects – even if it is only available for the treatment group?

Now, I know that you’re expecting an answer, as most of the time our blogs are about a new paper that is addressing such a question. However, in this case, while I have my ideas, I can’t claim to have the answer. In fact, a short search that I conducted has not produced an answer to this question. But, it did lead me to read the “attrition” chapter of the “Field Experiments” book by Gerber and Green, because I thought that Don Green might have mentioned something about this in his 2012 book. The reason is that this question was actually put to me by one of our field workers in a current experiment, where we are intensively tracking all respondents who can be tracked. When I answered that we are tracking as intensively as we can for everyone, meaning that we use all sources of information available on the possible whereabouts of our missing subjects, the said field worker told me that Don Green, in the class that she took from him at Columbia, told the class that it was better to leave such information on the table, lest it cause differential attrition. So, off I went to revisit chapter 7 in his book…

For those who want a primer on the topic, this chapter is a good read as it goes into things in a bit more detail than the randomization toolkit by Duflo, Glennerster, and Kremer (2006) (I don’t have the more recent Glennerster and Takavarasha book at my fingertips). Of course, you should also check out our “tools of the trade” posts on attrition. While the chapter generally goes through the familiar topics that surround the handling of attrition, I did learn a couple of things that I had not thought about as carefully previously. I also liked the use of the potential outcomes framework, which makes things easier to understand (and more contemporary). Here is a quick summary of the chapter, in bullet point form:

  • Attrition is a scourge in all kinds of studies, but the sting is felt most acutely by those who set up RCTs, because of the threat of bias in an otherwise clean design. In field experiments with survey or biomarker data collection, there will always be loss to follow-up and without some assumptions about the form of that attrition, it may be impossible to make any causal inferences about intention to treat or average treatment effects.
  • If attrition (missingness in Gerber and Green, 2012) is independent of potential outcomes (MIPO), then we have unbiased estimates but reduced power. Of course, MIPO is an assumption that you cannot really confirm or deny, but one that you can investigate with good statistical detective work in the usual ways that we increasingly all do in economics and political science. Sometimes, MIPO may be satisfied but only conditional on certain baseline covariates X, say, age, sex, location, etc., which the book calls MIPO | X.
    • Again, MIPO | X is simply a conditional on observables assumption, but one of the things that I took away from reading the chapter was that you may examine this assumption by looking at baseline variables that are prognostic of the follow-up outcome within these cells. Suppose that you have differential missingness by age and sex, but you think that things are orthogonal once we condition on these covariates: if your outcome is a test score and you have a baseline score, you could simply look at the baseline scores within each of these cells and confirm that they don’t predict attrition. Of course, if you could do that, why not further your MIPO | X assumption, where X also includes that (those) prognostic variable(s)? Just as in the examination of baseline balance, you don’t have to control for things that are unbalanced: you should control for things that are prognostic of the outcome of interest.
    • MIPO | X also gives you the commonly known and sometimes used reweighted ITT using cell proportions, otherwise known as inverse probability weighting (IPW).
  • Generally, however, it may simply be impossible to convince yourselves, your referees, or your editor, that missingness is not a problem. And, given that missingness is a fact of life, you’ll have to deal with it. As mentioned above, IPW is one way of dealing with it, but increasingly one that does not cut it with referees, especially in RCTs because you’re invoking a “conditional on observables” assumption – something from which you were desperately trying to get away when setting up your study. I personally still like it reported (it can simply be in the form of showing an F-test of the regression of missingness on a bunch of prognostic baseline covariates and their interactions with treatment status): it gives me a data point that I can evaluate.
  • What Gerber and Green (2012) define as “extreme value bounds” and “trimming bounds” correspond to what economists usually call “Manski” and “Lee” bounds:
    • Manski bounds are assumption-free and bracket the true ITT or ATE. However, in practice, they can be so wide as to be meaningless. This can happen either because attrition is large (so there are too many missing values to fill with extreme values) or because the outcome variable is such that the range of plausible values is large. So, if you have a binary or discrete outcome and low attrition, Manski bounds can work for you. Otherwise, you are out of luck here: this approach will suggest that the effect of your program could be anything…
    • What about Lee bounds? Well, despite its popularity, I never liked this approach over the alternatives… It introduces a monotonicity assumption that excludes the existence of “if-untreated-reporters” (if attrition is higher in control) or that of “if-treated-reporters” (if attrition is higher in treatment). Once you do this, you can no longer recover an estimate for the ITT for your original sample, but only an estimate of program effects for “always-reporters”. Rank preservation restrictions, which are discussed in the Duflo et al. (2006) toolkit (which refers to the Angrist, Bettinger, and Kremer, 2006 paper that is also used as an example in Gerber and Green, 2012) and the Behaghel et al. (2015) paper, discussed by David here, are versions of the same assumption. The method is popular because it gives tighter bounds than the Manski approach (even tighter with the improvements proposed by Behaghel et al., 2015), but it comes at the cost of making another assumption and giving up the ITT/ATE for the original random sample and settling for them among the always-reporters. I don’t understand why many reviewers frown upon IPW but are much more accepting of Lee bounds: in RCTs, often times the ITT on the original random sample is of key importance – it’s the population of interest. Sure, the ITT for the always-reporters may be informative in cases where we’re looking for a test of a theoretical prediction, but such subtlety is hardly present in papers or referee reports.
  • OK, so none of the ways to deal with attrition are sufficiently attractive and a mild case can torpedo your study. So, the best offense is a good defense: prevent large amounts of attrition to begin with. One can limit attrition by devoting more funds to finding subjects at follow-up, but, of course, those funds come at the cost of something else: a larger sample size, better measurement of outcomes, etc. The book is convincing in guiding researchers towards selecting a random sample of those lost to follow-up and intensively going after them. It does so by showing the differences in bias in ATE and the extreme value bounds theoretically and through simulation exercises. In cases where the attrition problem is non-negligible and “regular tracking” is unlikely to be highly successful in lowering it enough to make “Manski bounds” meaningfully tight, a plan to select a sub-sample (perhaps using block randomization) and trying really hard to find them in ways that are more expensive than regular tracking may end up being more cost-effective. Simulations (presented in Table 7.6 of the book) are useful because selecting a sub-sample is risky business – it will produce noisier estimates that will get assigned higher weights in the final tally. Intuitively, when the intensive second round subsample is large enough and successful in finding most people, and therefore more likely to be MIPO, the benefits from this approach are shown to be the highest, especially if the first round attrition was not MIPO. And, hoping for success in such an intensive tracking exercise does not have to be a pipe dream: in this paper, we randomly sampled one in three children who did not have assessments after the first pass through their schools and villages for further tracking, and were able to find 37 of the 42 children (88%) randomly assigned to second-round tracking. 
As Gerber and Green point out, you can then compute Manski bounds after imputing a much smaller share of the sample: suppose you were missing 25% of your sample in the first round and found 90% of the random sub-sample in the second round: you now need to fill in only 2.5% of the sample (0.25 x (10/100) = 0.025) to calculate extreme value lower and upper bounds…
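To make the extreme value logic and the arithmetic above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the estimator from the book: the function names `extreme_value_bounds` and `share_to_impute` are hypothetical, the outcome is assumed to have known logical limits `y_min` and `y_max`, missing outcomes are coded as NaN, and the weighting of second-round observations that a full double-sampling analysis requires is left out.

```python
import numpy as np


def extreme_value_bounds(y, z, y_min, y_max):
    """Extreme value (Manski-style) bounds on the ITT with missing outcomes.

    y     : outcomes, with np.nan where the subject was lost to follow-up
    z     : assignment indicator (1 = treatment, 0 = comparison)
    y_min : smallest logically possible outcome (assumed known)
    y_max : largest logically possible outcome (assumed known)
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    missing = np.isnan(y)

    # Lower bound: worst case for treatment, best case for comparison.
    y_low = np.where(missing, np.where(z == 1, y_min, y_max), y)
    # Upper bound: best case for treatment, worst case for comparison.
    y_high = np.where(missing, np.where(z == 1, y_max, y_min), y)

    lower = y_low[z == 1].mean() - y_low[z == 0].mean()
    upper = y_high[z == 1].mean() - y_high[z == 0].mean()
    return lower, upper


def share_to_impute(first_round_attrition, second_round_success):
    """Share of the sample still needing extreme values after intensive
    tracking of a random sub-sample of first-round attriters."""
    return first_round_attrition * (1.0 - second_round_success)


if __name__ == "__main__":
    # Toy binary outcome: one treated and one comparison subject missing.
    y = [1, np.nan, np.nan, 0]
    z = [1, 1, 0, 0]
    print(extreme_value_bounds(y, z, y_min=0, y_max=1))

    # The example from the text: 25% missing, 90% of the sub-sample found.
    print(round(share_to_impute(0.25, 0.90), 4))
```

With half of each arm missing, the toy bounds span the whole range from no effect to the largest possible effect, which is exactly the "could be anything" problem; shrinking the share that needs imputing is what makes the bounds informative.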
Now, finally, back to the question we started with. Yes, we’re convinced that we should have a good plan to minimize attrition as much as we possibly can within our means. But, is extra information about the treatment group the forbidden fruit of knowledge? I don’t think so, for the following reasons...

First, it seems to me that it is paramount to minimize the number of subjects lost to follow-up: this makes bounds estimated later tighter – I am willing to have possibly a bit more bias in my ITT/ATE for tighter bounds. Second, it is not clear to me that using an additional source to find someone is all that different from other things our experienced enumerators might be doing to locate everyone they can: perhaps there was also a more prevalent source in control clusters that is not observable to me as the PI. Third, it is also possible that, ex-post, the treatment group is much less likely to be found in their original villages because, perhaps, the treatment caused more of them to seek new opportunities in other (urban) areas (see, for example, this paper by Markus and colleagues). In such cases, an advantage in finding treatment subjects in the second, intensive tracking round may actually restore balance in missingness by closing the gap in attrition from the first round. Finally, if you define “trying really hard to obtain outcome measures for missing subjects” as doing everything that you can possibly do given your budget (and not intentionally pursuing different methods by study arm, but simply using all sources of information to locate missing people), then I am not sure that this clearly constitutes a new source of bias. To me, that cat is already out of the bag as soon as we were unable to find some people from either group – as we simply don’t know what factors are causing whom to be lost to follow-up in either group. Once we’re agnostic about the bias and trying to provide bounds for the ITT as tightly as we can, I’d rather minimize the number of subjects missing than leave sample on the table…

But, like I said, this is more of a bleg and I am happy to hear counterarguments – they might, after all, save my next field experiment. Please comment below, especially if you’re Don Green or you work in his lab…

Update (9/25/2017; 8:00 AM): Within an hour Alex Coppock (@aecoppock) responded with a link for "Double Sampling for Attrition." Worth your while checking it out...


Berk Özler

Lead Economist, Development Research Group, World Bank
