Published on Development Impact

Power calculations: what software should I use?

In my experimental work, I almost always do cluster-randomized field experiments (CRTs – T for trials), and therefore I have always used the Optimal Design software (OD for short), which is freely available and fairly easy to use, with menu-based dialogue boxes, graphs, etc. However, while preparing some materials for a course with a couple of colleagues, I came to realize that it has some strange basic limitations. That led me to invest some time in finding out about my alternatives in Stata. I thought I’d share a couple of things I learned here.

One of the strange things about OD seems to be that you cannot calculate power for an individually-randomized trial (IRT – OD calls these person randomized trials) if your outcome variable is binary. This is particularly strange because you get this option for CRTs (continuous vs. binary outcomes) and you even get to enter the lowest and highest expected proportion across your clusters to aid with the power calculations for a binary variable. But, when you try to do this for an IRT, OD forces you to calculate power as if your proportions were means of a continuous variable. Weird…
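To make the binary-versus-continuous point concrete, here is a minimal Python sketch. It is not what OD does internally. It assumes the simple unpooled normal approximation for two proportions, so Stata's results will differ slightly. The function names are made up for illustration. It shows the standardized effect size you would have to work out by hand if a tool only accepts continuous outcomes.

```python
import math
from statistics import NormalDist


def z_sum(alpha: float = 0.05, power: float = 0.80) -> float:
    """Two-sided critical value plus the z-value for the desired power."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def n_per_arm_binary(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for two proportions (unpooled normal approximation)."""
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z_sum(alpha, power) ** 2 * var_sum / (p2 - p1) ** 2)


def implied_effect_size(p1: float, p2: float) -> float:
    """Standardized effect size if the proportions are treated as continuous means."""
    avg_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return abs(p2 - p1) / avg_sd


def n_per_arm_continuous(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for a standardized effect size d, equal variances."""
    return math.ceil(2 * z_sum(alpha, power) ** 2 / d ** 2)


# Hypothetical example: control proportion 10%, treatment proportion 15%.
d = implied_effect_size(0.10, 0.15)
print(n_per_arm_binary(0.10, 0.15))   # 683
print(n_per_arm_continuous(d))        # 683, same answer once d is derived by hand
```

The two routes agree here only because the effect size was built from the binomial variances. With a binary outcome the variance is pinned down by the proportions themselves, which is why it is handier when the software lets you enter them directly.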

Other small annoyances with OD:

  • No Mac version (even more annoying is their answer to this query here)
  • You can only enter effect sizes and not separate means or proportions (perhaps not the biggest deal but it forces the user to standardize, assume equal variances, etc.)
  • It only gives you the total number of clusters (CRT) or the total sample size (IRT), assuming an equal split, whereas you might want to fix the size of your treatment group (say, because of budget constraints) and calculate the control group size.

OD is good for:
  • Multi-level experiments (students within classrooms within schools), multi-site studies (all of the previous in multiple sites): you can enter multiple variance components, variance explained by the blocking variable, other covariates, etc.

So, what else is there? For IRT, a super simple website is SWOG (choose “Two Arm Normal” or “Two Arm Binomial”), which might be good for managers thinking about a simple experiment randomized at the individual level, and wondering how many observations are needed to start thinking about a potential study.

Most economists use Stata and my impression is that many people used the “sampsi” command. It’s an older command, but it still works. It’s not for CRTs, but typing “help sampsi” and clicking on the dialog tab will give you a menu not unlike OD. For a simple command, it does a fair amount of useful things: it allows you to specify the number of pre- and post-measurements, variance explained by baseline covariates, power by method (ANCOVA, DID, or post levels), continuity corrections for proportions, etc. All in all pretty good if you’re not doing a CRT. Be careful, though: it defaults to 90% power rather than the more traditional 80, unless you specify otherwise. That’ll give you more power and make your study more expensive…
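To get a feel for what the 90% default costs you, here is a rough Python sketch of the textbook normal-approximation formula for comparing two means with equal variances. It is an illustration, not a reimplementation of “sampsi”. The `r2` argument (share of outcome variance explained by baseline covariates) and the function name are my own assumptions for the example.

```python
import math
from statistics import NormalDist


def n_per_arm(d: float, alpha: float = 0.05, power: float = 0.80, r2: float = 0.0) -> int:
    """Per-arm sample size for a standardized effect size d.

    d     : difference in means divided by the common standard deviation
    r2    : assumed share of outcome variance explained by baseline covariates
    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 * (1 - r2) / d^2.
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * z_total ** 2 * (1 - r2) / d ** 2)


# Hypothetical effect size of 0.2 standard deviations.
print(n_per_arm(0.2, power=0.80))           # 393
print(n_per_arm(0.2, power=0.90))           # 526
print(n_per_arm(0.2, power=0.80, r2=0.5))   # 197
```

In this example, moving from 80% to 90% power raises the required sample by about a third, while a baseline covariate that explains half the variance cuts it in half.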

Stata now seems to have replaced the “sampsi” command with “power.” It is definitely fancier (graphs, more options, etc.), but with that comes some more work. I couldn’t really figure out a way to do the repeated measurements as simply as “sampsi” so I gave up on it for the course we’re teaching – variance-covariance matrices might be a bit much for those who want the basics of power calculations. But, my sense is that I might well use it for my own work after I spend a bit of time to figure out all the correct options to utilize.

To do power calculations for CRTs in Stata, you can use the “clustersampsi” command. The help file is useful and you can install a dialog box for it by following the instructions at the end of the help file. Playing around with it for a couple of days, I found that I could do most things I want to do for a CRT with this command. It nicely tells you what the sample size for an IRT would be, the design effect, and your sample size with your CRT. It even allows for entering the coefficient of variation for cluster size – in case your study cluster samples won’t all be the same size – to account for the power loss that comes with moving away from equal-sized clusters. There is also the command “clsampsi,” which does roughly the same thing but seems a bit simpler. If I were doing a multi-level or multi-site CRT with individual (or, especially, cluster-level) outcomes, I’d still probably go to OD to calculate power, but for most other things “clustersampsi” produced the same results for power calculations.
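The design effect logic can be sketched in a few lines of Python. This uses the standard formula 1 + (m - 1) × ICC for equal-sized clusters and a commonly cited approximation for unequal ones, 1 + ((cv² + 1) × m - 1) × ICC. Whether “clustersampsi” uses exactly this internally, or adds small-sample corrections on top, is an assumption I have not checked. The function names are made up.

```python
import math


def design_effect(m: float, icc: float, cv: float = 0.0) -> float:
    """Design effect for a cluster-randomized trial.

    m   : mean number of individuals sampled per cluster
    icc : intracluster correlation coefficient
    cv  : coefficient of variation of cluster size (0 means equal-sized clusters)
    """
    return 1 + ((cv ** 2 + 1) * m - 1) * icc


def clusters_per_arm(n_irt_per_arm: int, m: float, icc: float, cv: float = 0.0) -> int:
    """Inflate the individually-randomized sample size and convert it to clusters."""
    return math.ceil(n_irt_per_arm * design_effect(m, icc, cv) / m)


# Hypothetical inputs: 393 per arm under individual randomization,
# 20 individuals per cluster, ICC of 0.05.
print(f"{design_effect(20, 0.05):.2f}")           # 1.95
print(f"{design_effect(20, 0.05, cv=0.5):.2f}")   # 2.20
print(clusters_per_arm(393, 20, 0.05))            # 39
print(clusters_per_arm(393, 20, 0.05, cv=0.5))    # 44
```

In this made-up example, unequal cluster sizes with a coefficient of variation of 0.5 cost five extra clusters per arm.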

I must have obviously missed things, so feel free to post other software, options, improvements, etc. in the comments section.

P.S. If someone at a seminar asks you what the minimum detectable effect is for your study, it is 2.8 times the standard error of your effect estimator (for 80% power and alpha=0.05). I have seen people ask this in seminars (not needed unless you are curious about ex ante power calculations done by the authors), and I have also seen it answered, "just multiply by 1.96" (for 95% confidence). It is true that that answer gives you the smallest estimate that would allow you to reject the null hypothesis of zero effect for alpha=0.05, but it is not the minimum detectable effect. If you got an effect size that rejects the null, but its implied power is very low, the chances that it is a false positive are much higher. Because the minimum detectable effect size has to allow you to reject the null, with the desired probability, under the alternative hypothesis of there being an effect, you need to add the t-value for 1-power, which is 0.84 when power=0.8. Hence the 2.8 (1.96+0.84) times the standard error of your estimate. See Cyrus Samii's nice slides on this here.
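The 2.8 is easy to verify with a short Python sketch. It uses the normal approximation (z-values in place of t-values, which is an assumption that matters little in large samples) and hypothetical function names.

```python
from statistics import NormalDist


def mde_multiplier(alpha: float = 0.05, power: float = 0.80) -> float:
    """Multiple of the standard error that gives the minimum detectable effect
    for a two-sided test at level alpha with the given power."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def approx_power(effect_in_se: float, alpha: float = 0.05) -> float:
    """Approximate power when the true effect equals effect_in_se standard errors
    (ignores the negligible chance of rejecting in the wrong direction)."""
    z = NormalDist()
    return z.cdf(effect_in_se - z.inv_cdf(1 - alpha / 2))


print(f"{mde_multiplier():.2f}")             # 2.80
print(f"{mde_multiplier(power=0.90):.2f}")   # 3.24
print(f"{approx_power(1.96):.2f}")           # 0.50
```

The last line is the point of the P.S.: if the true effect were only 1.96 standard errors, you would detect it about half the time, not 80% of the time.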


Berk Özler

Lead Economist, Development Research Group, World Bank
