I received a question recently from a friend that reflects a common issue facing many impact evaluation researchers as they collect follow-up data:
“After all our initial survey efforts, the response rate is 87% for the treatment group and 77% for the control group, with this difference statistically significant. Shall we tell the survey company to now focus all of their attention on just trying to survey remaining control group individuals to close this gap? Or should we worry this will somehow bias the results?”
I first noted that there are many similarities to a post Berk wrote a while back on whether you should use extra contact information when it is only available for the treatment group, which also had some excellent discussion in the comments. But since there are a few differences, and this is an issue that many people doing impact evaluations face, I thought I’d discuss it a bit more here.
First, make sure there is outcome stability
A first concern is what Coppock et al. term the outcome stability assumption, in the context of their work on double sampling. This assumes that outcomes are invariant to whether they are collected in the first phase of survey efforts or in whatever special extra last-effort stage is being done. There are several ways this could be violated:
· Differences in timing: perhaps you have a question that refers to income or business profits in the past month, or you are asking about something like overall life satisfaction that could vary with the season and with news events. If these extra observations are collected at a different point in time from the rest of the sample, this can induce bias.
· Differences in survey modality: perhaps the original survey efforts all took place in person, while the extra efforts involve phone or online surveys, and people answer some questions differently depending on survey mode.
· Extra survey effort changing responses: perhaps the way you encourage reluctant control group members to participate is to offer them financial incentives, and receiving these incentives changes how they respond to a question (e.g. because of experimenter demand, gratitude, or because you are asking them about current financial stress).
So think carefully about what you are measuring and whether the above could apply. The advantage of using the extra effort for both treatment and control is that you could then include dummy variables in the treatment regression for survey timing and survey mode, which can help control for any such differences.
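If you do use the extra-effort observations for both arms, the regression adjustment can be as simple as adding those dummies. Here is a minimal sketch, assuming hypothetical column names (treatment, profits, survey_mode, survey_month) and using statsmodels; it is one possible specification, not a prescription.

```python
# Minimal sketch, assuming hypothetical column names: 'treatment' (0/1 assignment),
# 'profits' (the outcome), 'survey_mode' (e.g. "in_person", "phone", "online"),
# and 'survey_month' (when the interview happened).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("followup_survey.csv")  # hypothetical follow-up data, one row per respondent

# Treatment regression with dummies for survey mode and timing, so that
# mode/timing differences between early and extra-effort interviews are absorbed.
model = smf.ols("profits ~ treatment + C(survey_mode) + C(survey_month)", data=df)
result = model.fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
print(result.summary())
```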
Second, ascertain whether the binding constraint is budget or just how many will respond
If budget is not the binding constraint, then you can have your cake and eat it too: you can expend these additional efforts on both the treatment and control group, and you always retain the option of not using the extra treatment observations in some of the analysis.
This relates to the Behaghel et al. bounding approach, where the idea is to sharpen Lee bounds by using how much effort it takes to reach respondents: you trim away some respondents from the group with the higher response rate, to get a group with a response rate similar to that of the group with the lower response rate. For example, in the opening example, if there was a 72% response rate from the treatment group after 5 or fewer interview attempts, then we might trim away all treated individuals who took more than 5 attempts, giving a response rate more similar to the control group’s. The key assumption here is a monotonicity or response-rank assumption: you need to assume that treatment might affect the overall willingness of people to answer the survey, but not their relative rank within group. That is, the first 77% of treatment responders are the people who would have responded had they been in the control group, and the additional 10% who took more effort to respond in the treatment group would not have responded had they been in control. This is not necessarily an innocuous assumption, as my post summarizing their paper discusses.
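To make the trimming idea concrete, here is a minimal sketch, not the full Behaghel et al. estimator (which constructs bounds), assuming a hypothetical dataset with columns treatment, responded, attempts, and outcome: it finds the attempt cutoff that brings the treated response rate down to (at most) the control rate and drops later treated responders.

```python
# Minimal sketch of effort-based trimming (hypothetical column names), not the
# full Behaghel et al. bounds: keep only treated respondents reached within the
# largest attempt cutoff whose implied treated response rate does not exceed the
# control response rate, then compare mean outcomes among the remaining respondents.
import pandas as pd

df = pd.read_csv("followup_with_attempts.csv")
# assumed columns: 'treatment' (0/1), 'responded' (0/1),
#                  'attempts' (number of contact attempts), 'outcome'

control_rate = df.loc[df.treatment == 0, "responded"].mean()
treat = df[df.treatment == 1]

cutoff = 0
for k in sorted(treat.loc[treat.responded == 1, "attempts"].unique()):
    rate_k = ((treat.responded == 1) & (treat.attempts <= k)).mean()
    if rate_k <= control_rate:
        cutoff = k  # attempts are discrete, so rates will rarely match exactly

trimmed_treat = treat[(treat.responded == 1) & (treat.attempts <= cutoff)]
control_resp = df[(df.treatment == 0) & (df.responded == 1)]

print("Control response rate:", round(control_rate, 3))
print("Treated respondents kept:", len(trimmed_treat), "of", int(treat.responded.sum()))
print("Trimmed difference in means:",
      trimmed_treat.outcome.mean() - control_resp.outcome.mean())
```

Because attempts are discrete, the trimmed treated response rate will usually fall somewhat below the control rate, which is one reason the full approach works with bounds rather than a single trimmed estimate.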
While it is nice to equalize treatment and control response rates, sometimes this may be less informative, and risk more bias, than going after harder-to-reach respondents in both groups
It is common to read papers that just say something like “attrition was 20% and balanced across treatment and control groups”, and then ignore attrition after that. This, together with the Lee bounds approach, whose intervals get wider the more differential the response rates are, makes it seem obvious that we should strive to minimize differences between treatment and control response rates. People might go a little further and look at balance on baseline observables, following the tests suggested in this forthcoming JHR paper by Ghanem et al.
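For intuition on why the bound width is tied to the response-rate gap, here is a minimal Lee bounds sketch under the same hypothetical column names (treatment, responded, outcome), assuming the treatment group has the higher response rate: the share of treated respondents trimmed from the top or bottom of the outcome distribution is the relative response gap, so a larger gap means wider bounds.

```python
# Minimal Lee bounds sketch (hypothetical column names), assuming the treatment
# arm responds at a higher rate: trim the excess share of treated respondents from
# the top (lower bound) or bottom (upper bound) of the outcome distribution.
import pandas as pd

df = pd.read_csv("followup_survey.csv")  # 'treatment', 'responded', 'outcome'

p_t = df.loc[df.treatment == 1, "responded"].mean()
p_c = df.loc[df.treatment == 0, "responded"].mean()
trim_share = (p_t - p_c) / p_t  # share of treated respondents to trim

treat_y = df.loc[(df.treatment == 1) & (df.responded == 1), "outcome"].sort_values()
n_trim = int(round(trim_share * len(treat_y)))
control_mean = df.loc[(df.treatment == 0) & (df.responded == 1), "outcome"].mean()

lower = treat_y.iloc[: len(treat_y) - n_trim].mean() - control_mean  # drop top tail
upper = treat_y.iloc[n_trim:].mean() - control_mean                  # drop bottom tail
print(f"Lee bounds for always-responders: [{lower:.2f}, {upper:.2f}]")
```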
However, at best equalizing response rates and passing these balance tests might give you an unbiased treatment effect for the subsample of individuals who would respond regardless of treatment status. This may still be a biased estimate of the overall effect of the program on the experimental sample if the non-responders differ in outcomes from the responders. So what we really want to do is devote resources to reaching units whose outcomes are likely to be quite different or underrepresented, which may come from reducing the treatment versus control response gap, or may actually come from increasing it. Let me give two examples.
Example 1: Attrition of the bottom tail in the control group
In an ongoing study, I am looking at the long-term impact of business training after 7-8 years. We did one round of extensive surveying by phone, and had a 69% response rate, with the response rate 10% higher for treatment than control. We then went back and did a more intensive in-person survey round, which enabled us to boost the response rates for both groups. The additional treated firms surveyed look pretty similar to those we had already surveyed in the first attempt, but the additional control firms surveyed included a lot of entrepreneurs who were no longer running firms or who had very low profits. Not including these extra control firms made it seem like the control group was doing better than it actually was, leading to an underestimate of the treatment effect. So here, putting extra resources into tracking down control firms that had failed was key for measuring the impact of treatment, and we could have ignored the extra treated firms.
Example 2: Attrition of the most successful treated individuals
Consider another type of business intervention, where perhaps the most successful firms move out of the area or are too busy to answer surveys under regular survey conditions. Let’s suppose that the only firms to reach this status are an upper tail of the treated firms. Then even if we had an 80% response rate for treatment and 70% for control, we would do better to devote extra resources to tracking down more of the treatment group, to have a chance of capturing some of these success stories, even if this took our response rates to 90% for treatment and 70% for control.
How do we know which situation we are in?
This is where some of the ideas from the Behaghel et al. type approach can be useful. If you track how much effort it took to get each firm to respond, you can see how the marginal responders in each group compare on baseline variables to the easy responders and to those who haven’t responded, and then also look at outcomes to see whether the marginal responders look similar to or different from the easy responders. If, for example, we find that among the treatment group baseline characteristics and outcomes look similar across these categories, but among the control group the marginal responders look less profitable and less likely to still be running businesses, then I would want to put extra effort into tracking controls. In contrast, if I have a treatment where I know (or hope) that all the effects may be concentrated in an upper tail of the treated, then perhaps I devote more effort to getting some of these firms to respond. Sometimes you may have take-up data or administrative data that can provide some sense of this and of who is missing, or you may have data from earlier follow-up rounds.
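A minimal diagnostic along these lines, again with hypothetical column names (treatment, responded, attempts, baseline covariates, outcome) and an arbitrary cutoff for what counts as an “easy” responder, could look like this:

```python
# Minimal diagnostic sketch (hypothetical column names): classify each sampled unit
# as an easy responder, a marginal (extra-effort) responder, or a non-responder,
# then compare baseline characteristics and follow-up outcomes within each arm.
import pandas as pd

df = pd.read_csv("followup_with_attempts.csv")
# assumed columns: 'treatment', 'responded', 'attempts',
#                  'baseline_profits', 'baseline_employees', 'outcome'

EASY_MAX_ATTEMPTS = 3  # assumption: "easy" means interviewed within 3 attempts


def responder_type(row):
    if row.responded == 0:
        return "non-responder"
    return "easy" if row.attempts <= EASY_MAX_ATTEMPTS else "marginal"


df["responder_type"] = df.apply(responder_type, axis=1)

# Baseline means are available for everyone; follow-up outcomes are missing (NaN)
# for non-responders, which pandas simply reports as NaN in the summary table.
summary = (df.groupby(["treatment", "responder_type"])
             [["baseline_profits", "baseline_employees", "outcome"]]
             .mean())
print(summary)
```

If the marginal control responders stand out on baseline profits or business survival, that is a signal to keep pushing on the control group; if instead the treated upper tail looks underrepresented, the extra effort may be better spent there.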
In sum, exert lots of effort for everyone if you can, but otherwise think less about the attrition rate itself and more about who the marginal attritors are likely to be
I wrote in this old post about how more survey attempts and persistence can reduce attrition rates overall, and our survey methods curated links page has several other posts on reducing attrition. So most of the time I am trying to exert lots of effort to get more of both treatment and control to respond. But once you end up in a situation like the one in the question at the start of this post, then hopefully the above shows that it is not always obvious that you should just focus on surveying the group with the lower response rate. Sure, we love to report balanced attrition, but it is worth thinking about the other issues above.