Power calculations: what software should I use?


In my experimental work, I almost always run cluster-randomized field experiments (CRTs – T for trials), so I have always used the Optimal Design software (OD for short), which is freely available and fairly easy to use, with menu-based dialog boxes, graphs, etc. However, while preparing materials for a course with a couple of colleagues, I came to realize that it has some strange basic limitations. That led me to invest some time in finding out about my alternatives in Stata. I thought I’d share a couple of things I learned here.


One of the strange things about OD seems to be that you cannot calculate power for an individually-randomized trial (IRT – OD calls these person randomized trials) if your outcome variable is binary. This is particularly strange because you get this option for CRTs (continuous vs. binary outcomes) and you even get to enter the lowest and highest expected proportion across your clusters to aid with the power calculations for a binary variable. But, when you try to do this for an IRT, OD forces you to calculate power as if your proportions were means of a continuous variable. Weird…

Other small annoyances with OD:

  • No Mac version (even more annoying is their answer to this query here)
  • You can only enter effect sizes and not separate means or proportions (perhaps not the biggest deal but it forces the user to standardize, assume equal variances, etc.)
  • It only gives you the total number of clusters (CRT) or total sample size (IRT), assuming an equal split, whereas you might want to fix the size of your treatment group (say, because of budget constraints) and calculate the control group size.

OD is good for:
  • Multi-level experiments (students within classrooms within schools), multi-site studies (all of the previous in multiple sites): you can enter multiple variance components, variance explained by the blocking variable, other covariates, etc.
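The unequal-split point in the list above can be worked out by hand: fix the treatment arm size and solve the standard normal-approximation formula for the control arm. A minimal Python sketch, assuming a continuous outcome with equal variances (the 0.5 SD effect size, 80% power, and arm sizes are made-up illustrations):

```python
from math import ceil
from statistics import NormalDist

def control_n_for_fixed_treatment(n_treat, effect_size, alpha=0.05, power=0.80):
    """Control-group size needed when the treatment arm is fixed at n_treat.

    Normal-approximation formula for comparing two means:
    sigma^2 * (1/n1 + 1/n0) must equal (delta / (z_a + z_b))^2,
    with effect_size = delta / sigma.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    inv_n0 = (effect_size / z) ** 2 - 1 / n_treat
    if inv_n0 <= 0:
        raise ValueError("treatment arm too small: no finite control group achieves this power")
    return ceil(1 / inv_n0)

# With an equal split, ~63 per arm detects a 0.5 SD effect at 80% power;
# fixing the treatment arm at 100 lets the control arm shrink.
print(control_n_for_fixed_treatment(100, 0.5))  # 46
```

Note the asymmetry this buys you: oversampling one arm (here, 100 treated) cuts the other arm well below the equal-split size, which is exactly the budget-driven calculation OD won't do for you.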

So, what else is there? For IRT, a super simple website is SWOG (choose “Two Arm Normal” or “Two Arm Binomial”), which might be good for managers thinking about a simple experiment randomized at the individual level, and wondering how many observations are needed to start thinking about a potential study.

Most economists use Stata, and my impression is that many people used the “sampsi” command. It’s an older command, but it still works. It’s not for CRTs, but typing “help sampsi” and clicking on the dialog tab will give you a menu not unlike OD’s. For a simple command, it does a fair amount of useful things: it allows you to specify the number of pre- and post-measurements, the variance explained by baseline covariates, power by method (ANCOVA, DID, or post levels), continuity corrections for proportions, etc. All in all, pretty good if you’re not doing a CRT. Be careful, though: it defaults to 90% power rather than the more traditional 80%, unless you specify otherwise. That’ll give you more power but also make your study more expensive…
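For the simple two-arm binary case these tools cover, the arithmetic is short enough to sketch directly. Here is a hedged Python version of the standard normal-approximation sample-size formula for comparing two proportions, without the continuity correction sampsi can apply (the 30% vs. 40% proportions are illustrative, not from the post):

```python
from math import ceil
from statistics import NormalDist

def n_per_arm_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size to detect p1 vs p2 with a two-sided test,
    equal allocation, normal approximation, no continuity correction."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (p1 - p2) ** 2)

# Detecting a jump from 30% to 40% at 80% power, alpha = 0.05:
print(n_per_arm_two_proportions(0.30, 0.40))  # 354
```

Continuity-corrected figures (sampsi's default for proportions) will come out somewhat larger than this approximation.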

Stata now seems to have replaced the “sampsi” command with “power.” It is definitely fancier (graphs, more options, etc.), but with that comes more work. I couldn’t figure out a way to handle repeated measurements as simply as in “sampsi,” so I gave up on it for the course we’re teaching – variance-covariance matrices might be a bit much for those who want the basics of power calculations. But my sense is that I might well use it for my own work after I spend a bit of time figuring out all the correct options.

To do power calculations for CRTs in Stata, you can use the “clustersampsi” command. The help file is useful and you can install a dialog box for it by following the instructions at the end of the help file. Playing around with it for a couple of days, I found that I could do most things I want to do for a CRT with this command. It nicely tells you what the sample size for IRT would be, the design effects, and your sample size with your CRT. It even allows for entering the coefficient of variation for cluster size – in case your study cluster samples won’t all be the same size – to account for the power loss that comes with moving away from equal-sized clusters. There is also the command “rdpower,” which does roughly the same thing but seems a bit simpler. If I were doing a multi-level or multi-site CRT with individual (or, especially, cluster-level) outcomes, I’d still probably go to OD to calculate power, but for most other things “clustersampsi” produced the same results for power calculations.
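The design-effect step that clustersampsi reports is easy to check by hand: inflate the individually-randomized sample size by 1 + (m − 1)ρ, where m is the cluster size and ρ the intracluster correlation. A minimal Python sketch for equal-sized clusters and a continuous outcome (the 0.5 SD effect, cluster size of 20, and ICC of 0.05 are made-up values; it ignores the cluster-size variation adjustment the command also offers):

```python
from math import ceil
from statistics import NormalDist

def clusters_per_arm(effect_size, cluster_size, icc, alpha=0.05, power=0.80):
    """Clusters per arm for a two-arm CRT with a continuous outcome:
    take the individually-randomized per-arm n, inflate it by the
    design effect 1 + (m - 1) * icc, and divide by cluster size."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_irt = 2 * z ** 2 / effect_size ** 2      # per-arm n, individual randomization
    deff = 1 + (cluster_size - 1) * icc        # design effect
    return ceil(n_irt * deff / cluster_size)

# 0.5 SD effect, clusters of 20, ICC = 0.05:
print(clusters_per_arm(0.5, 20, 0.05))  # 7
```

Even a modest ICC of 0.05 with clusters of 20 nearly doubles the required sample (design effect 1.95), which is why the IRT-vs-CRT comparison the command prints is so useful.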

I have surely missed things, so feel free to post other software, options, improvements, etc., in the comments section.


P.S. If someone at a seminar asks you what the minimum detectable effect is for your study, it is 2.8 times the standard error of your effect estimator (for 80% power). I have seen people ask this in seminars (not needed unless you are curious about the ex ante power calculations done by the authors), and I have also seen it answered, "just multiply by 1.96" (for 95% confidence). It is true that that answer gives you the estimate that would allow you to reject the null hypothesis of zero effect for alpha = 0.05, but it is not the minimum detectable effect. If you get an effect size that just rejects the null but has very low implied power, the chance that it is a false positive is much higher. Because the minimum detectable effect is the size that allows you to reject the null under the alternative hypothesis of there being an effect, you need to add the critical value for 1 − power, which is 0.84 when power = 0.8. Hence the 2.8 (1.96 + 0.84) times the standard error of your estimate... See Cyrus Samii's nice slides on this here.
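The 2.8 multiplier in the P.S. falls straight out of the standard normal quantiles; a one-line check in Python:

```python
from statistics import NormalDist

# MDE = (z_{1 - alpha/2} + z_{power}) * SE; for alpha = 0.05, power = 0.80:
multiplier = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)
print(round(multiplier, 2))  # 2.8
```

Swapping in 90% power (inv_cdf(0.90) ≈ 1.28) pushes the multiplier to about 3.24, which is the hidden cost of sampsi's 90% default mentioned above.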

Authors

Berk Özler

Lead Economist, Development Research Group, World Bank

Rohit
June 29, 2015

I've seen a couple of people use the PowerUp! tool for calculating power/sample size using an Excel worksheet (http://web.missouri.edu/~dongn/Dong-Maynard-PowerUp-paper.pdf) although I haven't used it myself.
It is possible to calculate power in Stata for a clustered design using sampsi if you don't mind a two-step process where you first run sampsi and then run sampclus (available from SSC). However, clustersampsi is a little more parsimonious, so probably preferable.

Berk Ozler
June 29, 2015

Thanks, I was wondering what sampclus was about -- that explains it.

Tim E.
June 29, 2015

G*Power: Statistical Power Analyses for Windows and Mac
http://www.gpower.hhu.de/

Berk Ozler
June 29, 2015

I originally wanted to include this as an alternative, but when I tried to go to the website, I was unable to reach it and gave up. It was suggested before, when David wrote about ICC calculations in Stata a while back, and the site seems to be working fine now...
Thanks,
Berk.

Andy
June 29, 2015

A while ago, Juan José Matta, Francis Smart, and I discussed power calcs via simulation. Francis then published a neat blog post here
http://www.econometricsbysimulation.com/2012/11/power-analysis-with-non…
Andy

Berk Ozler
June 30, 2015

Great - thanks.

Peter
June 29, 2015

I find the following formulas useful: page 34 of Bloom, Richburg-Hayes, and Black (2007) "Using Covariates to Improve Precision for Studies That Randomize Schools to Evaluate Educational Interventions" or Schochet (2007) "Statistical Power for Random Assignment Evaluations of Education Programs" ~page 75. These include covariate adjustment at different levels (e.g. schools and individual baseline covariates to aid precision).
These posts are great. Speaking of power, I also enjoyed a post from a while back on different randomization strategies. Is there a particular (ideally Stata) routine you use to do matched-pair random assignment?

Berk Ozler
June 30, 2015

Thanks - that's useful. I think OD allows for covariate adjustment at multiple levels.
I'll let whoever wrote the randomization post you're referring to answer your question (I don't have a go-to routine for this, mainly because I have never done matched-pair random assignment)....

econ man
June 30, 2015

nice post

Peter
July 01, 2015

One thing to keep in mind with the clustersampsi Stata command: it requests the baseline correlation between before and after measurements, not the R²... An easy mistake to make, since a lot of formulas use the R².

GG
July 24, 2015

Great post. Berk, could you post slides/notes on the course you mention? Thanks!

Berk
July 27, 2015

Why don't you email me at my WB address? Thanks,
Berk.

Laura C
July 12, 2015

Thank you for the informative post. I was wondering what the solutions are if you already know how many people you can survey, say because of budgetary constraints or because you have a regression discontinuity design. Are OD, sampsi, and clustersampsi able to tell you what the power would be for your given sample size?