power calculations

Power Calculations for Regression Discontinuity Evaluations: Part 3

David McKenzie
This is the third and final post in my series on doing power calculations for regression discontinuity (see part 1 and part 2).
Scenario 3 (SCORE DATA AVAILABLE, AT LEAST PRELIMINARY OUTCOME DATA AVAILABLE; OR SIMULATED DATA USED): The context of data being available seems less usual to me in the planning stages of an impact evaluation, but could be possible in some settings (e.g. you have the score data and administrative data on a few outcomes, and then are deciding whether to collect survey data on other outcomes). But more generally, you will be in this stage once you have collected all your data. Moreover, the methods discussed here can be used with simulated data in cases where you don’t have data.

A new Stata package, rdpower, written by Matias Cattaneo and co-authors, can be really helpful in this scenario (thanks also to him for answering several questions I had on its use). It calculates power and sample sizes, assuming you will then analyze the data with the rdrobust command. There are two related commands here:
  • rdpower: this calculates the power, given your data and sample size for a range of different effect sizes
  • rdsampsi: this calculates the sample size you need to get a given power, given your data and that you will be analyzing it with rdrobust.
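
For the simulated-data route mentioned above, here is a minimal Python sketch of simulation-based RD power. It is not the rdpower algorithm and does not reproduce rdrobust's bias-corrected inference; it assumes a uniform score on [-1, 1], a cutoff at 0, a linear relationship between score and outcome, a fixed (not data-driven) bandwidth, and a normal-approximation test. The function names and default parameters are all hypothetical.

```python
import random
from statistics import NormalDist, fmean


def side_intercept(xs, ys):
    """OLS of y on x for one side of the cutoff.

    Returns the fitted value at x = 0 (the cutoff) and its estimated variance.
    """
    n = len(xs)
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)
    return intercept, sigma2 * (1.0 / n + xbar ** 2 / sxx)


def simulated_rd_power(n, tau, bandwidth=0.5, slope=1.0, sd=1.0,
                       alpha=0.05, reps=500, seed=0):
    """Share of simulated samples in which the jump at the cutoff is significant.

    Assumed data-generating process (illustrative only):
    score ~ Uniform(-1, 1), treated if score >= 0,
    outcome = tau * treated + slope * score + Normal(0, sd) noise.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        left_x, left_y, right_x, right_y = [], [], [], []
        for _ in range(n):
            x = rng.uniform(-1, 1)
            if abs(x) > bandwidth:
                continue  # keep only observations inside the bandwidth
            treated = 1.0 if x >= 0 else 0.0
            y = tau * treated + slope * x + rng.gauss(0, sd)
            if treated:
                right_x.append(x)
                right_y.append(y)
            else:
                left_x.append(x)
                left_y.append(y)
        if min(len(left_x), len(right_x)) < 3:
            continue  # too few points to fit a line; counted as a non-rejection
        a0, v0 = side_intercept(left_x, left_y)
        a1, v1 = side_intercept(right_x, right_y)
        z = (a1 - a0) / (v0 + v1) ** 0.5
        rejections += abs(z) > crit
    return rejections / reps


if __name__ == "__main__":
    for effect in (0.0, 0.25, 0.5):
        print(effect, simulated_rd_power(n=1000, tau=effect))
```

With real score data you would replace the uniform draws with your actual scores and simulate only the outcome; the loop over effect sizes mirrors the idea of reporting power for a range of effects.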

Power Calculations for Regression Discontinuity Evaluations: Part 2

David McKenzie

Part 1 covered the case where you have no data. Today’s post considers another common setting where you might need to do RD power calculations.
Scenario 2 (SCORE DATA AVAILABLE, NO OUTCOME DATA AVAILABLE): the context here is that assignment to treatment has already occurred via a scoring threshold rule, and you are deciding whether to try to collect follow-up data. For example, referees may have given scores for grant applications, and proposals with scores above a certain level got funded, and now you are deciding whether to collect outcomes several years later to see whether the grants had impacts; or kids may have sat a test to get into a gifted and talented program, and now you want to see whether to collect data on how these kids have done in the labor market.

Here you have the score data, so you don’t need to make assumptions about the correlation between treatment assignment and the score, but can use the actual correlation in your data. However, since the optimal bandwidth will differ for each outcome examined, and you don’t have the outcome data, you don’t know what the optimal bandwidth will be.
In this context you can use the design effect discussed in my first blog post with the actual correlation. You can then check with the full sample to see if you would have sufficient power if you surveyed everyone, and make an adjustment for choosing an optimal bandwidth within this sample using an additional multiple of the design effect as discussed previously. Or you can simulate outcomes and use the simulated outcomes along with the actual score data (see next post).
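
As a rough illustration of the full-sample check, here is a Python sketch that computes the correlation between treatment assignment and the score from actual score data and turns it into a design effect. The excerpt above does not restate the formula, so the sketch assumes the commonly used form 1 / (1 - rho^2), where rho is that correlation; it also assumes units at or above the cutoff are treated. The function names are hypothetical, and the further adjustment for choosing an optimal bandwidth is not included.

```python
import math
from statistics import fmean


def rd_design_effect(scores, cutoff):
    """Design effect 1 / (1 - rho^2), with rho the correlation between
    treatment assignment and the score (assumed formula, see lead-in).

    Assumes units with score >= cutoff are treated; the direction of the
    rule does not change rho^2.
    """
    treated = [1.0 if s >= cutoff else 0.0 for s in scores]
    s_bar, t_bar = fmean(scores), fmean(treated)
    cov = sum((s - s_bar) * (t - t_bar) for s, t in zip(scores, treated))
    var_s = sum((s - s_bar) ** 2 for s in scores)
    var_t = sum((t - t_bar) ** 2 for t in treated)
    rho_sq = cov ** 2 / (var_s * var_t)
    return 1.0 / (1.0 - rho_sq)


def rd_sample_size(rct_sample_size, design_effect):
    """Scale the sample an RCT would need by the RD design effect."""
    return math.ceil(rct_sample_size * design_effect)


if __name__ == "__main__":
    # Hypothetical scores: evenly spread between -1 and 1, cutoff at 0.
    scores = [(i + 0.5) / 1000 for i in range(-1000, 1000)]
    de = rd_design_effect(scores, cutoff=0.0)
    print(de, rd_sample_size(400, de))
```

You would then compare the scaled sample size with the number of units you actually have scores for, to see whether surveying everyone could give sufficient power.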

Power Calculations for Regression Discontinuity Evaluations: Part 1

David McKenzie

I haven’t done a lot of RD evaluations before, but recently have been involved in two studies which use regression discontinuity designs. One issue which comes up is then how to do power calculations for these studies. I thought I’d share some of what I have learned, and if anyone has more experience or additional helpful content, please let me know in the comments. I thank, without implication, Matias Cattaneo for sharing a lot of helpful advice.

One headline piece of information that I’ve learned is that RD designs have way less power than RCTs for a given sample, and I was surprised by how much larger the sample is that you need for an RD.
How to do power calculations will vary depending on the set-up and data availability. I’ll do three posts on this to cover different scenarios:

Scenario 1 (NO DATA AVAILABLE): the context here is of a prospective RD study. For example, a project is considering scoring business plans, and those above a cutoff will get a grant; or a project will be targeted on poverty, and those below some poverty index measure will get the program; or a school test is being used, with those who pass the test then being able to proceed to some next stage.
The key features here are that, since it is being planned in advance, you do not have data on either the score (running variable), or the outcome of interest. The objective of the power calculation is then to see what size sample you would need to have in the project and survey, and whether it is worth you going ahead with the study. Typically your goal here is to get some sense of order of magnitude – do I need 500 units or 5000?

Did you do your power calculations using standard deviations? Do them again...

Berk Ozler

As the number of RCTs increases, it’s more common to see ex ante power calculations in study proposals. More often than not, you’ll see a statement like this: “The sample size is K clusters and n households per cluster. With this sample, the minimum detectable effect (MDE) is 0.3 standard deviations.” This, I think, is typically insufficient and can lead to wasteful spending on data collection or misallocation of resources for a given budget.
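
To make the quoted statement concrete, here is a small Python sketch of the standard textbook MDE formula for a cluster-randomized design, expressed in standard deviations. It uses a normal approximation and no covariates, and the intra-cluster correlation, treated share, and function name are assumptions for illustration, not values from any study.

```python
from statistics import NormalDist


def cluster_mde_sd(n_clusters, cluster_size, icc, share_treated=0.5,
                   alpha=0.05, power=0.8):
    """Minimum detectable effect in SD units for K clusters of n units each.

    MDE = (z_{1-alpha/2} + z_{power})
          * sqrt((icc + (1 - icc) / n) / (P * (1 - P) * K))
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    variance = (icc + (1 - icc) / cluster_size) / (
        share_treated * (1 - share_treated) * n_clusters
    )
    return multiplier * variance ** 0.5


if __name__ == "__main__":
    mde_sd = cluster_mde_sd(n_clusters=100, cluster_size=10, icc=0.1)
    # Translating into outcome units needs an assumed outcome SD (hypothetical).
    assumed_outcome_sd = 50.0
    print(mde_sd, mde_sd * assumed_outcome_sd)
```

The last step shows how much the SD figure leaves open: the same MDE in standard deviations implies very different effects in real units depending on the outcome SD you assume.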

Power calculations: what software should I use?

Berk Ozler

In my experimental work, I almost always do cluster-randomized field experiments (CRTs – T for trials), and therefore I always used the Optimal Design software (OD for short), which is freely available and fairly easy to use with menu-based dialogue boxes, graphs, etc. However, while preparing some materials for a course with a couple of colleagues, I came to realize that it has some strange basic limitations. That led me to invest some time into finding out about my alternatives in Stata. I thought I’d share a couple of things I learned here.

Notes from the AEAs: Present bias 20 years on + Should we give up on S.D.s for Effect Size?

David McKenzie
I just got back from the annual meetings of the American Economic Association (AEAs) in Boston. It’s been a couple of years since I last went, and after usually going to just development conferences, it was interesting to see some of the work going on in other fields. Here are a few notes:
 

gender power doesn't come cheap

Markus Goldstein

coauthored with Alaka Holla

As we argued last week, we need more results that tell us what works and what does not for economically empowering women. And a first step would be for people who are running evaluations out there to run a regression that interacts gender with treatment. Now some of these will show no significant differences by sex. Does that mean that the program did not affect men and women differently? No. Alas, all zeroes are not created equal.

Power Calculations 101: Dealing with Incomplete Take-up

David McKenzie

A key issue in any impact evaluation is take-up (i.e. the proportion of people offered a program who use it). This is particularly an issue in many finance and private sector (FPD) programs. In many health and education programs such as vaccination campaigns or getting children to school programs, the goal of the program is actually to have all eligible individuals participate. In contrast, universal take-up is not the goal of most FPD programs, and, even when it is a goal, it is seldom the reality.
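
To see why take-up matters so much for power, here is a minimal Python sketch of the standard rule of thumb: if the effect on those who take up the program is held fixed, the intention-to-treat effect shrinks in proportion to the difference in take-up between treatment and control, so the required sample grows with the inverse square of that difference. The function name and the example numbers are hypothetical.

```python
import math


def takeup_adjusted_sample(n_full_takeup, takeup_treat, takeup_control=0.0):
    """Sample needed under incomplete take-up.

    n_full_takeup is the sample that would suffice with 100% take-up in
    treatment and 0% in control; it is inflated by
    1 / (takeup_treat - takeup_control)^2.
    """
    diff = takeup_treat - takeup_control
    if diff <= 0:
        raise ValueError("take-up must be higher in treatment than in control")
    return math.ceil(n_full_takeup / diff ** 2)


if __name__ == "__main__":
    for rate in (1.0, 0.5, 0.25):
        print(rate, takeup_adjusted_sample(1000, rate))
```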
