# If your follow-up survey has attrition, what should you do for your descriptive analysis?

There are multiple ongoing efforts to use phone panel surveys to track the dynamics of how households and firms have been affected by the COVID-19 pandemic. One of the teams working on these surveys wrote to me to say that there is sizeable attrition, and to ask what I would recommend for dealing with this when conducting their analysis. I thought I’d share my thoughts in case they are useful for others, or in case there are other suggestions of what should be done. Readers, please also share whether you know of a textbook that covers dealing with attrition particularly well; many of the textbooks don’t discuss attrition at all, or provide only a couple of pages discussing it in general terms. Glennerster and Takavarasha’s book has the best coverage of those on my bookshelf at home, but it has been a long time since I was in my office to check those books.

We’ve blogged before about ways to reduce attrition and approaches for handling attrition in field experiments (see the reducing attrition section in our curated links on surveys). Since those posts are largely focused on impact evaluation, they don’t address as directly what to do for dynamic descriptive analysis. For example, teams might be interested in answering questions like “how much did household consumption fall over the pandemic and has it recovered to pre-pandemic levels?” or “did female-run firms experience more of a drop in sales than male-run firms during the pandemic?”. With these questions we are less concerned with treatment vs control imbalances in the attrition rate, and more concerned with overall rates of attrition and whether this attrition is selective.

**Approach 1: Inverse-Probability Weighting**

The most standard approach is probably to use inverse probability weighting (IPW). This assumes that selection into survey response occurs based on baseline observables (e.g. perhaps manufacturing firms are more likely than retail firms not to answer their phones, or households in more urban areas are less likely to answer their phones than households in smaller towns). Here you model the probability of responding to the survey, and then re-weight those who respond so that, for example, the few manufacturing firms that do respond get more weight than the retail ones, making the reweighted data look like the baseline data in terms of observables. In practice this means running a logit for the probability of responding to the survey as a function of baseline variables, getting the fitted probabilities, and then using the inverse of these fitted probabilities as weights in the regression analysis.
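A minimal sketch of the mechanics, on made-up data with a single binary baseline observable (sector). With one categorical predictor, a logit’s fitted probabilities are just the cell-level response rates, so the sketch computes those directly rather than fitting a logit; the sectors, response rates, and sample size are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up baseline sample: sector 0 = retail, sector 1 = manufacturing
sector = rng.integers(0, 2, size=2000)

# Hypothetical response rates: manufacturing firms answer less often
p_respond = np.where(sector == 1, 0.4, 0.8)
responded = rng.random(2000) < p_respond

# Step 1: model Pr(respond | baseline observables). With one categorical
# predictor this is just the response rate within each sector.
p_hat = np.array([responded[sector == s].mean() for s in (0, 1)])[sector]

# Step 2: weight each respondent by the inverse of its fitted probability
w = 1.0 / p_hat[responded]

# Check: the weighted manufacturing share among respondents matches the
# baseline share, i.e. the reweighted data looks like baseline on observables
baseline_share = sector.mean()
weighted_share = np.average(sector[responded], weights=w)
```

The same weights `w` would then be passed as regression weights in the descriptive analysis.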

It is hard for me to think of an occasion where I have found IPW very convincing or reassuring. The problem is that it doesn’t deal with any selection on dynamics. In the COVID-19 setting, for example, we might be concerned that it is the firms that the pandemic caused to close down who have telephone numbers that no longer work, or that it is households that the pandemic makes poorer who no longer can afford to pay their phone bills. If the baseline variables are not strong predictors of which firms close, or which individuals will migrate or turn off their phones, then re-weighting based on observables won’t help solve the selection bias that much.

**Approach 2: Bounding and Sensitivity Approaches**

A second alternative is to examine how much your results would change if the units you do not observe are differentially selected in particular ways. For example, if the question of interest is “how many households are now below the poverty line?”, then you can obtain your survey estimate from the sample answering the survey, and then obtain an upper bound by assuming that all of those households you did not re-interview are below the poverty line, and a lower bound by assuming that all of those not interviewed were above the poverty line. Concretely, if your follow-up survey manages to re-interview 70% of households, and you find 20% of respondents are below the poverty line, then the upper bound is 0.7*0.2 + 0.3 = 44% below the poverty line, and the lower bound is 0.7*0.2 = 14% below the poverty line. Likewise, in the same spirit as Horowitz-Manski bounds for treatment vs control comparisons, if you want to compare the female vs male business owner gap in firm closures, you can obtain one bound by assuming all the male attriting firms are open and female attriting firms are closed, and the other bound by assuming the opposite.
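The worst-case bounds above are simple arithmetic; here is the calculation as a small sketch, using the 70% response rate and 20% poverty rate from the example (the function name is mine):

```python
def poverty_rate_bounds(response_rate, poor_share_respondents):
    """Worst-case bounds on the population poverty rate.

    Lower bound: every attritor is above the poverty line.
    Upper bound: every attritor is below the poverty line.
    """
    lower = response_rate * poor_share_respondents
    upper = lower + (1 - response_rate)
    return lower, upper

# The example from the text: 70% re-interviewed, 20% of respondents poor
lo, hi = poverty_rate_bounds(0.70, 0.20)
# lo = 0.14 (14%), hi = 0.44 (44%)
```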

There are two problems with this approach. First, as the above example of the poverty line illustrates, when attrition rates are high, the bounds will be very wide, and so not that informative. Second, the bounds can be even wider if the outcome of interest is a continuous variable with a large maximum, like income.

A more useful approach can be to apply the Kling and Liebman (2004) sensitivity bounds approach. Rather than asking “what’s the worst that can happen?”, the idea is instead to examine how the results change as one varies the amount of selection. For example, in a firm survey, you can see how the results change if you assume that the firms that are not re-interviewed are two times or three times or four times as likely to have closed as those that you did re-interview. A couple of examples of this approach in practice can be seen in Blattman et al.’s 2020 long-term follow-up of cash transfers to youth in Uganda, where they provide bounds in the case that unfound youth differ by 0.25 s.d. from those who are found, and in Berk’s recent work on cash transfers for refugees in Turkey, which shows bounds under the assumptions that attritors are 0.1 or 0.25 s.d. above or below the means of non-attritors.
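A sketch of the sensitivity exercise for the firm-closure example, on hypothetical numbers (70% response rate, 10% of respondents closed); the function name and the multipliers are mine:

```python
def closure_rate_under_selection(response_rate, closed_respondents, k):
    """Overall closure rate if attritors are k times as likely to be
    closed as respondents (capped at 100%)."""
    closed_attritors = min(1.0, k * closed_respondents)
    return (response_rate * closed_respondents
            + (1 - response_rate) * closed_attritors)

# Vary the assumed selection: attritors 1x, 2x, 3x, 4x as likely to close
for k in (1, 2, 3, 4):
    rate = closure_rate_under_selection(0.70, 0.10, k)
    print(f"attritors {k}x as likely to be closed -> overall rate {rate:.2f}")
```

Reporting the outcome under each multiplier shows readers how sensitive the headline number is to plausible degrees of selective attrition, rather than only the uninformative worst case.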

**Approach 3: Learn about selection through those it takes more effort to find**

Perhaps the most helpful approach is not to assume that selective attrition is only related to observables (as in approach 1), nor to assume that you don’t know anything about how selective it is (as in approach 2), but instead to use the data you have and survey in order to learn a bit about this selection. The key assumption here is that *units that take a lot more work to re-interview are more like the attritors than units that are easy to re-interview.*

The first thing needed here is a way of characterizing what is meant by “take more work to re-interview”. One way is to record the number of call attempts it takes to reach the household or firm, as is done in Behaghel et al. (2012) bounds (which I blogged about here). You can then see not just whether baseline observables are similar between those found on the first attempt and those it takes five or six attempts (or more extensive tracking) to find, but more importantly, whether the current outcomes differ between these groups. A second way is the subsampling approach, where you choose a random sample of the attritors and exert more effort to re-interview them. For example, you might carry out your phone survey of firms and, after five attempts, have achieved a 60 percent response rate. You could then take a random subsample of the 40 percent who have not responded and use more extensive tracking methods to try to get them to answer (perhaps using social networks to find other contact numbers; searching the webpages, Facebook pages and Twitter feeds of firms to see if they are still active; offering them financial incentives to respond; in-person surveying may even be possible for a small sample when it is not for the full sample; etc.). Glennerster and Takavarasha’s book discusses this subsampling approach, which was used, for example, in the long-run follow-up surveys of Miguel and Kremer’s deworming experiment.
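The call-attempts diagnostic is easy to implement once the attempt count is recorded. A sketch on entirely hypothetical tracking data (the attempt counts, closure indicators, and the 3-attempt cutoff are all made up for illustration):

```python
import numpy as np

# Hypothetical tracking data: number of call attempts needed to reach each
# firm, and whether that firm turned out to be closed (1) or open (0)
attempts = np.array([1, 1, 2, 2, 3, 1, 4, 5, 6, 4, 6, 5])
closed = np.array([0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1])

easy = closed[attempts <= 3].mean()   # firms found quickly
hard = closed[attempts >= 4].mean()   # firms that took more effort to find

# If `hard` is well above `easy`, this gradient suggests the attritors
# (never reached at all) likely have even higher closure rates than the
# respondents you do observe.
```

The same comparison can be run on baseline observables to check whether hard-to-reach units also look different at baseline.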

As well as helping you see whether those households or firms that take a lot more work to re-interview have systematically different outcomes from the majority of your respondents, you can then use this information to sharpen either approach 1 or approach 2. Millan and Macours argue that inverse-probability reweighting may work better when applied to the individuals surveyed in an intensive tracking phase, since they may be more similar to the attritors on unobservables. Taking a random sample of the attritors for intensive tracking and using this random sampling to reweight the data can increase the effective response rate, sharpening any bounds that are then applied. E.g. if the first round of surveying has a 60% response rate, and then you select a 25% subsample of the attritors to intensively track, and are able to get 70% of them to respond, then you give each unit that responds in this intensive phase a weight of 4 (1/0.25) and the effective response rate becomes 0.6 + 0.4*0.7 = 88%, resulting in much narrower bounds than with your original 60% response rate. You could also use the comparison of those units it takes more work to call and those it takes less work to call to help inform the plausibility of your Kling-Liebman bounds. For example, if you find that 10% of firms that it takes 1-3 attempts to re-interview are closed, and 12% of firms that it takes 4-6 attempts to re-interview are closed, then maybe you think it is reasonable to consider a bound where you assume at most 15% of the attritors are closed, rather than the worst case scenario of them all being closed.
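The effective-response-rate calculation, and the narrower bounds it buys, can be sketched with the numbers from the example (the function name is mine, and the 20% poverty rate reuses the figure from the bounding example above):

```python
def effective_response_rate(first_round, subsample_frac, intensive_success):
    """Effective response rate when a random fraction `subsample_frac` of
    attritors is intensively tracked with success rate `intensive_success`.
    Intensive-phase respondents get weight 1/subsample_frac, so they
    stand in for all attritors."""
    weight = 1.0 / subsample_frac
    effective = first_round + (1 - first_round) * intensive_success
    return effective, weight

# 60% first-round response, 25% of attritors tracked, 70% of those reached
eff, w = effective_response_rate(0.60, 0.25, 0.70)
# eff = 0.88, w = 4.0 -- the numbers from the text

# Worst-case bounds then tighten: with 20% of respondents poor,
lower = eff * 0.20            # 0.176
upper = lower + (1 - eff)     # 0.296, versus 0.14-0.44 at a 60% response rate
```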

This is definitely an issue that a lot of researchers and policy teams will be dealing with as they analyze all of the rapid response surveys they have been collecting, so any other reader suggestions of approaches or good practices for dealing with attrition in analysis are most welcome.
