David McKenzie's blog

Weekly links September 9: no to cash? Machine learning for economists, climate economics, stupid regulations unchanged and more…

David McKenzie's picture

Power Calculations for Regression Discontinuity Evaluations: Part 2

Part 1 covered the case where you have no data. Today’s post considers another common setting where you might need to do RD power calculations.
Scenario 2 (SCORE DATA AVAILABLE, NO OUTCOME DATA AVAILABLE): the context here is that assignment to treatment has already occurred via a scoring threshold rule, and you are deciding whether to try to collect follow-up data. For example, referees may have given scores for grant applications, with proposals scoring above a certain level getting funded, and now you are deciding whether to collect outcomes several years later to see whether the grants had impacts; or kids may have sat a test to get into a gifted and talented program, and now you want to decide whether to collect data on how these kids have done in the labor market.

Here you have the score data, so you don’t need to make assumptions about the correlation between treatment assignment and the score, but can use the actual correlation in your data. However, since the optimal bandwidth will differ for each outcome examined, and you don’t have the outcome data, you don’t know what the optimal bandwidth will be.
In this context you can use the design effect discussed in my first blog post with the actual correlation. You can then check with the full sample to see if you would have sufficient power if you surveyed everyone, and make an adjustment for choosing an optimal bandwidth within this sample using an additional multiple of the design effect as discussed previously. Or you can simulate outcomes and use the simulated outcomes along with the actual score data (see next post).
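To make the design-effect adjustment concrete, here is a minimal sketch in Python. It assumes the design effect takes the form 1/(1 − ρ²), where ρ is the correlation between the treatment indicator and the score (the form used in Schochet's work on RD power). The function names, the rule that treatment means a score at or above the cutoff, the standardized outcome (variance of one), and the `extra_multiple` knob standing in for the bandwidth adjustment are all illustrative assumptions, not something spelled out in the post.

```python
import math
from statistics import NormalDist, fmean


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rd_design_effect(scores, cutoff):
    """Return (design effect, rho) using the actual score data.

    Assumption: treated means score >= cutoff, and the design effect
    is 1 / (1 - rho^2) with rho = corr(treatment indicator, score).
    """
    treated = [1.0 if s >= cutoff else 0.0 for s in scores]
    rho = pearson(treated, scores)
    return 1.0 / (1.0 - rho ** 2), rho


def rd_mde(scores, cutoff, alpha=0.05, power=0.80, extra_multiple=1.0):
    """Minimum detectable effect in standard-deviation units.

    Starts from the usual two-sided RCT formula and inflates the variance
    by the design effect. extra_multiple is a hypothetical further
    inflation to allow for estimating within a bandwidth.
    """
    n = len(scores)
    p = fmean([1.0 if s >= cutoff else 0.0 for s in scores])
    de, _ = rd_design_effect(scores, cutoff)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * math.sqrt(extra_multiple * de / (n * p * (1 - p)))
```

Calling `rd_mde(scores, cutoff)` with the full list of scores gives the minimum detectable effect if you surveyed everyone; rerunning with `extra_multiple` above one gives a rough sense of how much worse things get once a bandwidth is imposed.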

Power Calculations for Regression Discontinuity Evaluations: Part 1

I haven’t done a lot of RD evaluations before, but recently have been involved in two studies which use regression discontinuity designs. One issue which comes up is then how to do power calculations for these studies. I thought I’d share some of what I have learned, and if anyone has more experience or additional helpful content, please let me know in the comments. I thank, without implication, Matias Cattaneo for sharing a lot of helpful advice.

One headline piece of information that I’ve learned is that RD designs have way less power than RCTs for a given sample, and I was surprised by how much larger the sample is that you need for an RD.
How to do power calculations will vary depending on the set-up and data availability. I’ll do three posts on this to cover different scenarios:

Scenario 1 (NO DATA AVAILABLE):  the context here is of a prospective RD study. For example, a project is considering scoring business plans, and those above a cutoff will get a grant; or a project will be targeting for poverty, and those below some poverty index measure will get the program; or a school test is being used, with those who pass the test then being able to proceed to some next stage.
The key features here are that, since it is being planned in advance, you do not have data on either the score (running variable), or the outcome of interest. The objective of the power calculation is then to see what size sample you would need to have in the project and survey, and whether it is worth you going ahead with the study. Typically your goal here is to get some sense of order of magnitude – do I need 500 units or 5000?
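For the order-of-magnitude question, a rough sketch in Python might look like the following. Since there is no score data in this scenario, the correlation `rho` between treatment assignment and the score has to be assumed. The sketch also assumes a design effect of the form 1/(1 − ρ²), an outcome standardized to have variance one, and a two-sided test; the function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def rd_required_n(mde, rho, share_treated=0.5, alpha=0.05, power=0.80):
    """Approximate total sample needed to detect `mde` (in SD units).

    Assumption: RD sample = RCT sample * 1 / (1 - rho^2), where rho is
    the assumed correlation between treatment assignment and the score.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    rct_n = z ** 2 / (share_treated * (1 - share_treated) * mde ** 2)
    return math.ceil(rct_n / (1.0 - rho ** 2))
```

Comparing `rd_required_n(0.2, 0.0)` (the RCT benchmark) with the same call at a plausible assumed `rho` gives a quick feel for how much larger the RD sample needs to be.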

Weekly Links September 1: Entrepreneurship, Co-authoring, results-free review, and more…

  • Interview with Erik Hurst.  He discusses entrepreneurship “Most small businesses are plumbers and dry cleaners and local shopkeepers and house painters. These are great and important occupations, but empirically essentially none of them grow. They start small and stay small well into their life cycle…And when you ask them if they want to be big over time, they say no. That's not their ambition. This is important because a lot of our models assume businesses want to grow”
  • Debraj Ray and Arthur Robson propose randomizing the order of co-authors, noting that “Debraj had just been enthusiastically recommended a “wonderful paper” by Banerjee et al, on which he was a co-author”. Beyond blogging about it, they have even written a theory paper on the idea.
  • The Monkey Cage has a great set of Q&As with authors, special issue editors, a reviewer, and the journal editors on their experiences when the journal Comparative Political Studies published a pilot “results-free review” special issue in which authors submitted manuscripts without showing the results. I found this point from the reviewer useful: “It is worth noting that we already do a lot of results-free reviewing. Anyone assessing grant proposals or sitting on a committee giving out fellowship money must take a stand on which research sounds more or less promising without knowing exactly what the results of the research will be. In advising students, we similarly must react to their initial ideas for dissertations or theses without knowing the results.” I also found it interesting that the journal’s editors were the most pessimistic of the lot about the process, discussing its costs and noting that they are unlikely to repeat the results-free approach to reviewing and publishing.
  • Time to throw out the HDI and other “mash-up indices”? Jones and Klenow have a nice paper in the AER showing how to aggregate consumption, life expectancy, leisure and inequality into an overall welfare metric – Western Europe looks much closer to the US on this metric than on GDP, while East Asian tigers and developing countries look further away – and countries like South Africa and Botswana have welfare levels less than 5% of those in the U.S.

August Occasional Links 3: poverty mapping redux, hassles vs prices, the poor and banks, and more…

  • A new paper in Science combines machine learning, nightlights, high-resolution daytime satellite images, and household surveys to map poverty in Africa. Marshall Burke (one of the authors) summarizes in this blog post: “First, we use lower-resolution nightlights images to train a deep learning model to identify features in the higher-resolution daytime imagery are predictive of economic activity. The idea here … is that nightlights are a good but imperfect measure of economic activity, and they are available for everywhere on earth. So the nightlights help the model figure out what features in the daytime imagery are predictive of economic activity.  Without being told what to look for, the model is able to identify a number of features in the daytime imagery that look like things we recognize and tend to think are important in economic activity (e.g roads, urban areas, farmland, and waterways…). Then in the last step of the process, we use these features in the daytime imagery to predict village-level wealth, as measured in a few household surveys that were publicly available and geo-referenced”. Over at the CGD blog, Justin Sandefur offers a nice commentary and critique.
  • Also in Science, Dupas, Hoffman, Kremer and Zwane compare the relative effectiveness of prices and hassle/time costs in screening health product delivery so that only those who will use them take them. They find that requiring people to show up and redeem a monthly voucher reduces the amount of chlorine given away by 60%, but with only a 1% drop in usage.
  • Jason Kerwin on work by Dupas, Robinson, Karlan and Ubfal on introducing savings accounts to the poor in three countries, finding very low take-up. I like his summary: “Unfortunately, like many other silver bullets before it, this one has failed to kill the stalking werewolf of poverty. Indeed, it almost doesn’t leave the barrel of the gun. 60% of the treatment group in Malawi and Uganda (and 94% in Chile) never touch the bank accounts.”
  • USAID has a post on my RFID technology flop, published in Development Engineering.

And finally, XKCD on linear regressions not to trust

August Occasional Links 2: Has IE peaked? Unusual seeding of random selection, unequal Egypt, and more…

August occasional links 1: gender, education accountability, conferences, and more…

Weekly links July 29: the political economy of running a RCT, the peer review trade-off, work with me, and more…

  • A couple of months ago I attended this very interesting conference by the Innovation Growth Lab run by Nesta. I was in a session with Mark Sayers from the UK’s Department for Business, Energy and Industrial Strategy, which has been running an RCT on growth vouchers for 20,000 firms in the UK. He gave a talk on lessons learned from a policy side in engaging in such a trial – and I found it very interesting to hear the political economy side (Treasury only agreed to release the funding for a program they were somewhat skeptical of if it would be evaluated by an RCT). A video of his short talk is now up.
  • Slate piece on how journalists should cover working papers (based on the recent Fryer paper on racial bias in the use of lethal force). h/t Berk, who is reminded of his classic post on working papers not working.

Making Disaster Relief More Like Funeral Societies: A Review of Dercon and Clarke’s Dull Disasters

I was recently at the Novafrica conference in Lisbon, where one of the keynote talks was given by Stefan Dercon. He based it around a newly released short book he has written with Daniel Clarke, called Dull Disasters (open access version). The title is meant to indicate both the aim of making dealing with disasters a dull event rather than a media circus, and the aim of discussing ways to ‘dull’ or reduce the impact of disasters.
Stefan started his talk by noting that disaster relief may well be the part of the whole international development and humanitarian system that is the least efficient and has had the least research on it. The book starts by noting the predictability of responses: “every time a natural disaster hits any part of the world, the newspaper headlines ten days later can be written in advance: ‘why isn’t the response more coordinated?’”. He gives the responses to the earthquakes in Nepal and Haiti, to Hurricane Katrina, and to Ebola as examples. But he then notes the crux of the problem: “…The truth is everybody argues for coordination but nobody likes to be coordinated”.

Conditional on your parents, does your country matter for early childhood human capital? Surprisingly no!

There is a large literature that emphasizes the importance of investments made in early life for lifetime outcomes. Does growing up in a poor, conflict-afflicted country have a negative impact? There are many reasons to think yes, including the disease environment, quality of medical facilities, availability of nutrition, quality of early-childhood education facilities etc.