Published on Development Impact

Power Calculation Software for Randomized Saturation Experiments

One of the things I get asked when people are designing experiments, and are either interested in or worried about spillover effects, is how to divvy up the clusters into treatment and control and what share of individuals within treatment clusters to assign as within-cluster controls. The answer seems straightforward: it may look intuitive to assign a third to each group, and I have seen a few designs that have done this, but it turns out to be a bit more complicated than that. There was no software that I am aware of that helped with such power calculations, until now...

As a companion to our significantly revamped paper, titled “Optimal Design of Experiments in the Presence of Interference” (Baird, Bohren, McIntosh, and Özler 2016), we have developed software to help researchers conduct power calculations when their experimental design calls for a two-stage randomization (first randomize clusters into different treatment saturations or intensities, then assign individuals to treatment based on the saturations realized in the first stage):

  • The dedicated webpage, courtesy of the Policy Design and Evaluation Lab of UC San Diego, comes with a graphical user interface (GUI) for ease of use.
  • A video provides a tutorial on how to use it.
  • We have also supplied Python, R, and MATLAB code, all of which allow the reader to replicate our findings in the paper, as well as to improve upon our code to conduct optimization that would not be easy or possible to do with the GUI. 
Using the GUI is simple: the researcher specifies an objective (i.e. which estimands to identify and the relative importance of each) and the software calculates the optimal design. The researcher can also provide a specific design and the software calculates the minimum detectable effects (MDEs) for estimands this design identifies. Credit goes to Patrick Staples for creating the GUI and coding in Python and R.
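
To make the two-stage randomization concrete, here is a minimal Python sketch. It is not the code distributed with the software: the function name `assign_randomized_saturation`, its arguments, and the choice to treat a fixed, rounded share of each cluster in the second stage are all assumptions made for illustration.

```python
import random


def assign_randomized_saturation(cluster_sizes, saturations, cluster_shares, seed=0):
    """Illustrative two-stage assignment (hypothetical helper, not the paper's code).

    cluster_sizes:  dict mapping cluster id -> number of individuals
    saturations:    candidate treatment saturations, e.g. [0.0, 0.5]
    cluster_shares: probability of a cluster drawing each saturation
    """
    rng = random.Random(seed)
    cluster_ids = list(cluster_sizes)

    # Stage 1: each cluster draws a saturation (0.0 means pure control).
    drawn = rng.choices(saturations, weights=cluster_shares, k=len(cluster_ids))

    design = {}
    for cluster_id, saturation in zip(cluster_ids, drawn):
        size = cluster_sizes[cluster_id]
        # Stage 2: treat a share of individuals equal to the realized saturation.
        # Assumption: a fixed count (rounded), rather than independent coin flips.
        n_treated = round(saturation * size)
        treated = set(rng.sample(range(size), n_treated))
        design[cluster_id] = {
            "saturation": saturation,
            # True = treated, False = within-cluster control (or pure control).
            "treated": [i in treated for i in range(size)],
        }
    return design
```

Calling `assign_randomized_saturation({c: 10 for c in range(50)}, [0.0, 0.5], [0.5, 0.5])` would mimic a partial population design with 50 clusters of 10 individuals: roughly half the clusters are pure control and half the individuals in the remaining clusters are treated.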

Behind the tool is a revamped paper with two new contributions. First, we set up a potential outcomes foundation for our model and map this into the regression models commonly used by applied economists to analyze randomized controlled trials. It is analogous to Athey and Imbens (2016), see Section 2.5 in particular, but for a setting with intra-cluster correlation and partial interference. The potential outcomes framework allows us to anchor our results firmly in the existing statistics and econometrics of experiments literatures, and provides a bridge between these literatures and the linear regression models used to analyze randomized saturation (RS) designs in practice. As Athey and Imbens (2016) state, sometimes “…it is helpful to take an explicitly causal perspective [on linear regression]. This perspective also clarifies how the assumptions underlying identification of causal effects relate to the assumptions often made in least squares approaches to estimation.”

Second, we added an application section that uses numerical simulations, on both hypothetical and published study designs, to illustrate the theoretical tools we develop. We explicitly define and estimate optimal designs for objective functions that include different individual saturation, slope, and pooled estimands. We demonstrate the power trade-offs that arise, which depend on the estimands the researcher would like to identify and estimate and on the relative weights that ze puts on each estimand. We also calculate MDEs for randomized saturation designs in published papers and show how these designs affect the power trade-off between different estimands. For example, we are able to show that if we had known in 2007 what we know now, we could have designed our own Malawi cash transfer experiment (comparing CCTs to UCTs) differently, which would have produced lower MDEs for all estimands of interest.

What are some practical takeaways from the paper? These aren’t easy to summarize without getting technical, so I’ll try my best, but I suggest reading the application section for more clarity:
  • If you’re equally interested in identifying the treatment and spillover effect at each saturation (treatment intensity), then you need to allocate more clusters to the extreme saturations. For example, if the treatment saturations in your study are 0, 0.2, 0.4, 0.6, and 0.8, then you need to allocate more clusters to 0.2 and 0.8 than to 0.4 and 0.6. This disparity declines with the intra-cluster correlation (ICC). Given ICC and cluster size, numerical simulations can provide the optimal allocation.
  • If you’re only interested in detecting a slope effect, you don’t need a pure control group. In this case, you should have saturations that are pretty extreme and symmetrical about 0.5 – somewhere around 0.1 and 0.9 depending on the ICC. You can add more saturations to test linearity, curvature, etc.
  • One of the main messages of our paper remains the same as before, but with a new insight: sometimes a researcher will ambitiously design an RS experiment, only to find that treatment or spillover effects do not vary by treatment saturation. This means that presenting the treatment effects by saturation is not that interesting – they’re all more or less the same. The instinct then is to pool all treatment (spillover) observations together to estimate an average (or pooled) treatment (spillover) effect. As the inherent heteroskedasticity of this regression model is no longer an issue when there is no heterogeneity in treatment effects (see Corollary 3 in the paper), it looks like the researcher can recover an average treatment effect at no cost – with the increased power that comes from pooling observations. Unfortunately, this is not the case:
    • We show that for the pooled ITT, a partial population experiment, in which there is a pure control group and a single treatment saturation, is optimal. Any deviation from the constant treatment probability reduces power (equivalently, increases the MDE). This is true even when the errors are homoskedastic…
    • If the researcher cares equally about treatment and spillover effects, the treatment probability for that single interior saturation is 0.5.
    • As for the pure control group, it is never optimal to assign only a third of the clusters to pure control. The optimal range is between 0.41 and 0.5 – again depending on the ICC.
    • Bottom line: if the researcher a priori believes that slope effects are small and ICC is high, she is best off selecting a partial population design. More ambitious designs allow you to identify more estimands, but come with a risk of reduced power for pooled effects – should you wish/need to estimate them ex post…
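
To see how the ICC and the share of clusters in each arm drive the MDE, the following sketch applies the textbook normal-approximation formula for a simple two-arm cluster-randomized trial with equal cluster sizes. It is not the randomized saturation formula from the paper and it will not reproduce the optimal shares quoted above; the function name `clustered_mde` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def clustered_mde(n_clusters, cluster_size, icc, treated_share=0.5,
                  sigma=1.0, alpha=0.05, power=0.8):
    """Textbook MDE for a two-arm cluster-randomized trial (illustration only).

    Assumptions: equal cluster sizes, a two-sided test, and a normal
    approximation to the test statistic. `treated_share` is the share of
    clusters assigned to treatment; `sigma` is the outcome standard deviation.
    """
    std_normal = NormalDist()
    # Critical value for the test plus the quantile for the desired power.
    multiplier = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    # Variance inflation from clustering: icc + (1 - icc) / cluster_size.
    clustering_term = icc + (1 - icc) / cluster_size
    allocation_term = treated_share * (1 - treated_share) * n_clusters
    return multiplier * sigma * sqrt(clustering_term / allocation_term)
```

For example, comparing `clustered_mde(100, 10, icc=0.0)` with `clustered_mde(100, 10, icc=0.2)` shows how quickly the MDE grows with the ICC when the number of clusters is fixed, and varying `treated_share` shows the cost of an unbalanced split in this simple two-arm case.
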
We hope that you find the software useful in designing your experiments and that the revised paper helps with the underlying ideas. If you do use the GUI, please let us know about your experience so that we can improve it in subsequent versions. Also, if you improve our code using Python or R, please share it with us so that we can include it on our website.


Berk Özler

Lead Economist, Development Research Group, World Bank
