Power Calculations 101: Dealing with Incomplete Take-up


A key issue in any impact evaluation is take-up (i.e. the proportion of people offered a program who actually use it). This is a particular issue in many finance and private sector development (FPD) programs. In many health and education programs, such as vaccination campaigns or efforts to get children into school, the goal is for all eligible individuals to participate. In contrast, universal take-up is not the goal of most FPD programs, and even when it is a goal, it is seldom the reality. Not all households or firms will want or need a loan, wish to purchase insurance, or desire to participate in a training program. Thus some microfinance experiments have found take-up rates of only 5-10%, and experiments with rainfall insurance have likewise struggled to achieve high take-up.

It is well known that low take-up reduces statistical power (i.e. our ability to reject the null hypothesis that a program has no effect when this null is false, or, in layman’s terms, our ability to detect whether the program has had an effect). However, I frequently get asked how to do power calculations in this case, and my experience is that many people don’t have a sense of just how much damage low take-up can do to their power. So I thought I’d explain it here.

Let p be the difference between the proportion of the treatment group and the proportion of the control group that takes up a given intervention. Then, looking at the formula for the minimum detectable effect in a single cross-sectional follow-up survey (see, e.g., page 33 of the randomization toolkit), one can quickly see that the needed sample size is inversely proportional to the square of p.
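In symbols (a stylized version in my own notation, not copied verbatim from the toolkit): if a fraction P of a total sample of N is assigned to treatment, the outcome has standard deviation σ, the significance level is α, and the desired power is 1−β, then the minimum detectable effect on those actually treated is approximately

MDE ≈ (1/p) × (z_{1−α/2} + z_{1−β}) × σ × sqrt(1 / (P(1−P)N))

Holding the MDE, α, β, P, and σ fixed, the required N therefore grows with 1/p².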

What does this mean in practice?

· If p = 0.75 (e.g. 0% of the control group takes the intervention and 75% of the treatment group does), the sample size needed is 1/(0.75²) ≈ 1.78 times as large as it would be with 100% compliance.

· If p = 0.50, the sample size needed is 4 times as large as with 100% compliance.

· If p = 0.25, the sample size needed is 16 times as large as with 100% compliance.

· If p = 0.10, the sample size needed is 100 times as large as with 100% compliance!

· If p = 0.05 (as in some microinsurance/microfinance experiments), you need 400 times the sample you would with 100% compliance!

You see very quickly that low take-up massively increases the sample you need to detect a desired effect.

To make this crystal clear, consider an example of a business training intervention, where the goal is to increase business profits from a baseline value of $1000, assuming the standard deviation of profits is also $1000.

To detect a 10% increase in profits with p = 1.0 (i.e. 100% of those in the treatment group get training, and 0% in the control group do), in Stata use the command:

sampsi 1000 1100, sd1(1000) power(0.9)

You will get output telling you that you need a treatment group size of 2102.
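(If you are using a more recent version of Stata, where sampsi has been folded into the power command, the equivalent call should be something along the lines of the following; I would expect it to return essentially the same sample size, but treat it as a sketch rather than verified output:)

power twomeans 1000 1100, sd(1000) power(0.9)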

Now suppose instead that you expect only half of those you invite to training to show up, and that the training still leads to a 10% increase in profits for those you actually train. Then the mean profits for the treatment group will be 1000 + 0.5*100 = 1050, and so:

sampsi 1000 1050, sd1(1000) power(0.9)

gives a needed treatment group size of 8406, 4 times the size you would need with 100% compliance.
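To reproduce all of the inflation factors listed above in one place, here is a small loop one could run (a sketch of my own, assuming sampsi stores the first group size in r(N_1); check the stored-result name in your version of Stata). It simply re-runs the same calculation for several take-up rates p, keeping control take-up at 0%, so the treatment-arm mean is 1000 + p*100:

* loop over assumed take-up rates and report the required treatment group size
foreach p in 1 0.75 0.5 0.25 0.1 {
    local m_treat = 1000 + `p'*100
    quietly sampsi 1000 `m_treat', sd1(1000) power(0.9)
    display "take-up p = `p': treatment group size = " r(N_1)
}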

So what can you do when you face this problem? I am sure this will be a recurring theme on this blog, with one of my earlier posts already touching on this. We will discuss some of these in more detail in future posts, but basically your options are to do one or more of the following:

- Increase your baseline sample (and thus expense) to deal with this, and proceed as normal.

- Collect data on each observation over many more time periods (see, e.g., this paper).

- Try to understand why take-up is low, and undertake actions (e.g. advertising, encouragement) to try to increase take-up rates.

- Restrict the study to a group of units for whom take-up would be much higher – e.g. a business training program could be advertised to all eligible firms, and then the number of slots available in the program could be randomly allocated amongst only those firms who apply (see the illustration below).
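As an illustration of that last option (the 90% figure here is an assumption for illustration, not from any actual study): if advertising first and then randomizing only among applicants pushes take-up to, say, 90%, the treatment-arm mean in the training example becomes 1000 + 0.9*100 = 1090, and the power calculation is:

sampsi 1000 1090, sd1(1000) power(0.9)

which requires on the order of 2600 firms per arm (roughly 1/0.9² ≈ 1.23 times the full-compliance sample), rather than the 8406 needed with 50% take-up.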

Authors

David McKenzie

Lead Economist, Development Research Group, World Bank

Ryan
May 23, 2011

One area where I've seen a disconnect is how we deal with attrition in power calcs versus in analysis. I've heard of power calcs done where attrition was taken into account through this same Wald estimator, as if it were non-compliance -- basically, assuming that if we couldn't find someone in the treatment group, a conservative guess would be that they have the control group mean. That is, you assume N stays the same, but the average treatment effect goes down.

But then when actually doing analysis, you don't fill in treatment group people we couldn't find with the control group mean, you just leave them missing, with a check on whether attrition is correlated with treatment status. Filling in missing values with a number, control group mean or otherwise, feels wrong because it changes the distribution of responses and potentially narrows confidence intervals.

Which side do you fall on? Account for potential attrition during power calcs via a change in N, or via the assumed average treatment effect? If ATE, then do your analysis in the same manner as the power calcs, or leaving missing respondents as missing?
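(For concreteness, in the training example from the post, that approach would look something like the following sketch, with 15% treatment-group attrition as an assumed, illustrative figure: the detectable treatment-group mean becomes 1000 + 0.85*100 = 1085, and

sampsi 1000 1085, sd1(1000) power(0.9)

should ask for roughly 2900 per arm, compared with 2102 under no attrition.)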

Berk Özler
May 23, 2011

David may have a different answer, but I'd say that the first goal should be for sample attrition not to be so high as to significantly change power (calculations). We are a lot better in development economics now than we used to be, and regularly get high tracking rates, even in long-term panel studies. However, if this were a worry, I'd treat it as a change in N (hopefully balanced across treatment and control, because otherwise things get even more dicey).

In analysis, I'd never fill in values in an ad hoc manner. This is only acceptable if you are doing a Lee bounds type of analysis, where you're assuming extreme values for the treatment and control arms with missing values to see if such extreme assumptions can change the main findings. In the main analysis, I'd leave the missings as missing, but would report in great detail about the nature of the attrition (i.e. its size, baseline characteristics, balance between treatment and control, etc.).

David McKenzie
May 23, 2011

I always deal with attrition in power calculations by adjusting N. So, e.g., if the power calculations say I need a sample of 1000, but I expect 15% attrition, I would make sure I start with at least 1000/0.85 ≈ 1177 in the sample. Of course this is only good if we can assume attrition is missing at random. If one is going to have to rely on bounds analysis, attrition kills your power more, and you need a bigger sample. You can then do power calcs for the bounds, assuming, e.g., that the attritors all have a really low value or all have a really high value of the outcome of interest.
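In Stata, that adjustment is just a one-liner (a sketch, under the same missing-at-random assumption, with the 1000 and 15% figures as placeholders):

* sample required by the power calculation, and expected attrition share
local n_needed  = 1000
local attrition = 0.15
* baseline sample to enrol so that roughly n_needed remain at follow-up
display ceil(`n_needed' / (1 - `attrition'))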
I haven't come across people doing it in the way Ryan suggests, but would be interested to hear what others do.

Rebekka Grun
May 24, 2011

My favorite option is keeping attrition as low as possible to begin with. In a recent experience in Tunisia with a very mobile target population, we managed to get the non-response rate down from 25% to 10% by closely supervising the survey firm (i.e. having one of us sit in on the surveying with them and review their files). This is of course not very popular (especially if the survey firm is 'someone' in their country and our 'spy' is youngish), but it is always worth the friction. Still, 10% is not negligible. Grateful for further ideas along this line.