Scenario 3 (SCORE DATA AVAILABLE, AT LEAST PRELIMINARY OUTCOME DATA AVAILABLE; OR SIMULATED DATA USED): Having data available seems less usual to me in the planning stages of an impact evaluation, but it is possible in some settings (e.g. you have the score data and administrative data on a few outcomes, and are deciding whether to collect survey data on other outcomes). More generally, you will be at this stage once you have collected all your data. Moreover, the methods discussed here can be used with simulated data in cases where you don’t have data.
There is a new Stata package, rdpower, written by Matias Cattaneo and co-authors, that can be really helpful in this scenario (thanks also to him for answering several questions I had on its use). It calculates power and sample sizes, assuming you will then use the rdrobust command to analyze the data. There are two related commands here:
- rdpower: this calculates the power for a range of different effect sizes, given your data and sample size.
- rdsampsi: this calculates the sample size you need to reach a given power, given your data and that you will be analyzing it with rdrobust.
Since this uses a lot more inputs than the cases in my first two posts, it gives you more precise output, and deals with all the key factors going into the design effect: the correlation between the score and treatment assignment, the reduction in sample that comes from choosing the optimal bandwidth, and adjustments from bias-correction procedures in choosing the bandwidth. If you have this in the planning stages, you can therefore use this to help choose what sample size to survey and to check you will have enough power, as discussed in the previous posts.
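For the simulated-data case, a minimal sketch of how this might look is below. The data-generating process (a uniform score, a cutoff at 0, a jump of 0.2 and standard normal noise) is entirely hypothetical and only there for illustration, and the sketch assumes the rdpower and rdrobust packages are already installed.

```stata
* Hypothetical simulated data: all names and numbers here are assumptions
clear
set seed 12345
set obs 2000

* Score (running variable) on [-1, 1], with the cutoff at 0
gen score = runiform(-1, 1)
gen treat = score >= 0

* Outcome: linear in the score, with an assumed jump of 0.2 at the cutoff
gen y = 0.5*score + 0.2*treat + rnormal()

* Power to detect a jump of 0.2 with this sample
rdpower y score, tau(0.2)

* Sample size needed to detect a jump of 0.2
* (see help rdsampsi for how to set the target power)
rdsampsi y score, tau(0.2)
```

In practice you would replace the simulated outcome with your best guess of the outcome's distribution, or with the preliminary outcome data if you have it.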
Another use, once you have data, is to understand the power consequences of choosing different bandwidths for the RD estimation. For example, using the Senate elections data they provide as demonstration data with the package:
- rdpower demvoteshfor2 demmv, tau(5) shows that one has 81.8% power to detect a 5 percentage point jump in vote share at the cutoff (tau sets the size of the jump to be detected), with the optimal bandwidth chosen (which is 17.7 here). I can then see what power would be if I take a smaller bandwidth, say of 10: rdpower demvoteshfor2 demmv, tau(5) h(10). This tells me I would only have 45.2% power with the smaller bandwidth.
Note: Matias adds that you may want to use the option scaleregul(0) when using this command. This is not the default, but it stops the regularization in the bandwidth selector from choosing quite small bandwidths.
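Putting the bandwidth comparison and the scaleregul(0) suggestion together, a do-file might look roughly like the sketch below. The file name rdpower_senate.dta is my assumption about how the demonstration data is named, so adjust it to match your copy.

```stata
* Load the Senate demonstration data (file name is an assumption)
use rdpower_senate.dta, clear

* Power to detect a 5 percentage point jump, default bandwidth selection
rdpower demvoteshfor2 demmv, tau(5)

* Same effect size, but with the bandwidth fixed at 10
rdpower demvoteshfor2 demmv, tau(5) h(10)

* Same effect size, with regularization switched off in bandwidth selection
rdpower demvoteshfor2 demmv, tau(5) scaleregul(0)
```

Comparing the power reported by the three calls shows how much of the difference comes from the bandwidth itself rather than from the effect size you assumed.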