From my mailbox: should I work with only a subsample of my control group if I have big takeup problems?
Over the past month I’ve received several versions of the same question, so I thought it might be useful to post about it.
Here’s one version:
I have a question about an experiment in which we had a very big problem getting the individuals in the treatment group to take up the treatment. Therefore we now have a treatment group much smaller than the control. For efficiency reasons does it still make sense to survey all the control group, or should we take a random draw in order to have an equal number of treated and control?
And another version:
When the control group is much larger than the treatment group, is there a way to get a less noisy control group out of the bigger sample and improve things?
Short answer
The short answer is that, although for a fixed total sample size power is greatest when the treatment and control groups are the same size, power is still going to be greater with 100 treatment and 500 control than with 100 treatment and 100 control. So if the binding constraint is the sample available rather than the budget, you are going to get the most precise estimate possible of the control mean by including all of the control sample.
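To see the arithmetic behind this, here is a minimal Python sketch. It assumes a continuous outcome with the same standard deviation in both groups (normalized to 1), and the function name se_diff_in_means is just made up for illustration. The standard error of a difference in means is then proportional to sqrt(1/n_treatment + 1/n_control), so a smaller standard error means more power.

```python
import math


def se_diff_in_means(n_treat, n_control, sd=1.0):
    """Standard error of a difference in means, assuming a common outcome SD."""
    return sd * math.sqrt(1.0 / n_treat + 1.0 / n_control)


# Hypothetical designs: the two from the text, plus an equal split of 600 units.
designs = {
    "100 treatment, 100 control": (100, 100),
    "100 treatment, 500 control": (100, 500),
    "300 treatment, 300 control": (300, 300),
}

for label, (n_t, n_c) in designs.items():
    print(f"{label}: SE = {se_diff_in_means(n_t, n_c):.4f}")
```

Under these assumptions the standard error falls from about 0.141 to about 0.110 when the extra 400 control units are kept, and an equal split of the same 600 units would do better still (about 0.082). Dropping control observations only moves you toward the worst of the three designs.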
But throwing information away can sometimes improve power, if we change the estimand:
Suppose that we are attempting to estimate the impact of a business training program, but that in our experiment we found that the takeup rate was only 10% for illiterate people, and 90% for everyone else. Our sample consists of 1000 treated and 1000 control, and half of each sample is illiterate. Suppose further that 20% of the control group start a business, and for those who do attend training, treatment raises the likelihood of starting a business by 10 percentage points.
Then consider what the power would be for detecting the impact of training on our full sample. The intention-to-treat effect in our full sample is 0.5*10*0.1 + 0.5*10*0.9 = 5 percentage points, so we are comparing 25% of the treatment group starting a business to 20% of the control group, and the power is therefore 74% (in Stata we would write: sampsi 0.2 0.25, n1(1000) n2(1000)).
Suppose instead we throw away the data on illiterate individuals, and focus on estimating the impact on everyone else. Then our treatment effect for this subsample is 0.9*10 = 9 percentage points, and power is now 90% (in Stata, sampsi 0.2 0.29, n1(500) n2(500)).
So here we have only half the sample, but more power.
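For readers without Stata, the comparison can be roughly reproduced in Python. This is only a sketch: it uses the plain normal approximation for a two-sided test of two proportions with no continuity correction, and the function and variable names are my own for illustration, not part of any package.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treat, n_control, n_treat, alpha=0.05):
    """Approximate power of a two-sided test that two proportions are equal.

    Normal approximation without continuity correction; the negligible
    probability of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error under the null uses the pooled proportion.
    p_bar = (n_control * p_control + n_treat * p_treat) / (n_control + n_treat)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_control + 1 / n_treat))
    # Standard error under the alternative uses each group's own proportion.
    se_alt = sqrt(p_control * (1 - p_control) / n_control
                  + p_treat * (1 - p_treat) / n_treat)
    return z.cdf((abs(p_treat - p_control) - z_crit * se_null) / se_alt)


control_rate = 0.20          # share of the control group starting a business
effect_if_trained = 0.10     # effect for those who actually attend training
takeup = {"illiterate": 0.10, "everyone else": 0.90}
share = {"illiterate": 0.5, "everyone else": 0.5}

# Effect of being assigned to treatment = take-up rate x effect on attendees.
itt_full = sum(share[g] * takeup[g] * effect_if_trained for g in takeup)
itt_sub = takeup["everyone else"] * effect_if_trained

power_full = power_two_proportions(control_rate, control_rate + itt_full, 1000, 1000)
power_sub = power_two_proportions(control_rate, control_rate + itt_sub, 500, 500)

print(f"Full sample: effect = {itt_full:.2f}, power = {power_full:.2f}")
print(f"High take-up subsample: effect = {itt_sub:.2f}, power = {power_sub:.2f}")
```

This should give power of roughly 0.76 for the full sample and 0.91 for the subsample. Both are a little above the 74% and 90% from Stata because, as far as I know, sampsi applies a continuity correction by default, but the ranking is the same: half the sample, more power.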
The basic insight here is that when takeup is very low for a particular group, the experiment is not going to be very informative about the treatment effect for that group. By changing what we estimate to focus on the effect on the subsample for which we have good takeup, we can get a more precise estimate of the effect for the group the experiment is actually informative about.
“This sounds good, let me at it” or “this sounds dodgy, I’m not convinced”?
I haven’t seen a formal discussion of this, but there surely must be somewhere (readers?). My recommendation is to consider this as follows:
 It is most justified for variables you have stratified the randomization on, for which you had a natural interest in estimating treatment effect heterogeneity anyway. Examples might include gender, education, or other such variables.
 What you want to do is know exactly who you are estimating the treatment effect for. So it is of interest to say “here is the treatment effect for males, we couldn’t estimate for females” or “here is the treatment effect for educated individuals, we couldn’t estimate for the illiterate”. But it becomes a less interesting estimand if, say, you stratified on a bunch of characteristics, then find you get low takeup for a few cells, and so end up dropping, say, 20-24 year old educated males and also 35-39 year old illiterate females, and then reporting the treatment effect for whoever is left.

The same logic suggests being careful about using low levels of geographical units. If you did your experiment in 3 main cities, and find no takeup in 1, then it is fine to say “here is the treatment effect for these 2 cities”. But if you stratified at the census tract level and then get low takeup in, say, 20% of census tracts, it is not clear what population you have left if you drop those with low takeup.
Join the Conversation
Very interesting post. I have an intuition about this, but it might be wrong. Let's say you have a chance imbalance in some key characteristic between your treatment and control groups, e.g. age. If your control group is huge, mean age will be very precisely estimated with very tight confidence intervals, so even if the difference with the treatment group is due to chance (i.e., a few outliers in the small treatment group) it will be more likely that you will reject equality of means between the two groups. Does this make sense? Should we use confidence intervals adjusted for these differences in sample size?
It is true that a small difference in means is more likely to be significant in a large sample than a small sample. But this is true both for the treatment effect you are trying to estimate, and for the comparison of baseline characteristics. As always we should consider magnitudes in addition to statistical significance: in a small sample you might have a big difference in baseline characteristics that you worry about even if not statistically significant, whereas in a large sample you might have something like you point out, where there is a statistically significant difference in a baseline variable, but the magnitude is so small you don't worry much.
The other point to note is that as the sample size grows, the distribution of the possible differences in means between treatment and control baseline characteristics shrinks around zero.