
David McKenzie's blog

Weekly links October 14: fake doctors and dubious health claims, real profits, refugees, reweighting, and more…

  • In Science this week: which refugees do Europeans want? A “conjoint experiment” in which 18,000 Europeans evaluated 180,000 refugee profiles finds they favor refugees who are high-skilled, young, fluent in the local language, persecuted, and non-Muslim (5 page paper, 121 page appendix!). Respondents were shown pairs of refugee profiles with randomly assigned characteristics and asked whether they supported admitting each refugee and, if they could only choose one of the pair, which one.
  • BBC News covers the recent Science paper by Jishnu Das and co-authors on training ‘fake doctors’ in India (or, for more study details, see the MIT press release, which has a great photo-bomb).

Book Review: Failing in the Field – Karlan and Appel on what we can learn from things going wrong


Dean Karlan and Jacob Appel have a new book out called Failing in the Field: What we can learn when field research goes wrong. It is intended to highlight research failures and what we can learn from them, sharing stories that might otherwise be told only over a drink at the end of a conference, if at all. It draws on a number of Dean’s own studies, as well as those of several other researchers who have shared stories and lessons. The book is a good short read (I finished it in an hour), and definitely worth the time for anyone involved in collecting field data or running an experiment.

Weekly links September 30: re-analysis and respectful criticism, working with NGOs, off to the big city, and more…

  • Solomon Hsiang and Nitin Sekar respond to the guest post by Quy-Toan Do and co-authors which had re-analyzed their data to question whether a one-time legal sale of ivory had increased elephant poaching. They state “Their claims are based on a large number of statistical, coding, and inferential errors.  When we correct their analysis, we find that our original results hold for sites that report a large number of total carcasses; and the possibility that our findings are artifacts of the data-generating process that DLM propose is extremely rare under any plausible set of assumptions”.
    • We screwed up by hosting this guest post without checking that Do and co-authors had shared it with the original authors and given them a chance to respond.
    • We do believe that blogs have an important role to play in discussing research (see also Andrew Gelman on this), but think Uri Simonsohn’s piece this week on how to civilly argue with someone else’s analysis has good practical ideas for both social media and refereeing – in particular, sharing a re-analysis with the original authors before it is published. We will try to adhere to this better in the future.
    • We are waiting to see whether Do and co-authors have any further response, and plan to post only one more summary on this after making sure both sides have iterated. We plan to avoid elephant wars, since the worm wars were enough.
  • In somewhat related news, Dana Carney shows how to gracefully accept and respond to criticism of your earlier work.

Weekly links September 23: yay for airlines x2, dig out those old audit studies, how to study better, and more…

  • The second edition of the book Impact Evaluation in Practice by Paul Gertler, Sebastian Martinez, Patrick Premand, Laura Rawlings and Christel Vermeersch is now available. For free online! “The updated version covers the newest techniques for evaluating programs and includes state-of-the-art implementation advice, as well as an expanded set of examples and case studies that draw on recent development challenges. It also includes new material on research ethics and partnerships to conduct impact evaluation.”
  • Interesting Priceonomics piece on R.A. Fisher, and how he fought against the idea that smoking causes cancer.
  • Oxfam blog post on power calculations for propensity score matching.
  • The importance of airlines for research and growth:

Weekly links September 16: infrastructure myths, surveying rare populations x 2, being a development mum, and more…


Power Calculations for Regression Discontinuity Evaluations: Part 3

This is the third, and final, post in my series on doing power calculations for regression discontinuity designs (see part 1 and part 2).
Scenario 3 (SCORE DATA AVAILABLE, AT LEAST PRELIMINARY OUTCOME DATA AVAILABLE; OR SIMULATED DATA USED): Having outcome data already available seems less usual to me in the planning stages of an impact evaluation, but it is possible in some settings (e.g. you have the score data and administrative data on a few outcomes, and are deciding whether to collect survey data on other outcomes). More generally, you will be in this scenario once you have collected all your data. Moreover, the methods discussed here can also be used with simulated data in cases where you don’t have outcome data.

There is a new Stata package, rdpower, written by Matias Cattaneo and co-authors that can be really helpful in this scenario (thanks also to him for answering several questions I had about its use). It calculates power and sample sizes, assuming you are then going to be using the rdrobust command to analyze the data. There are two related commands (a rough do-it-yourself sketch of the simulation approach follows the list):
  • rdpower: this calculates the power, given your data and sample size, for a range of different effect sizes.
  • rdsampsi: this calculates the sample size you need to get a given power, given your data and that you will be analyzing it with rdrobust.
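For intuition, here is a deliberately minimal Python sketch of the simulation route (my own illustration, not what the rdpower package does internally): take the score data, simulate outcomes under an assumed effect size and noise level, fit a local linear regression within a fixed bandwidth, and record how often the jump is detected. The outcome model, effect size, noise level, and bandwidth are all placeholder assumptions; in practice you would let rdrobust choose the bandwidth and use its robust inference.

```python
import numpy as np

rng = np.random.default_rng(1)

def rd_power_by_simulation(score, cutoff, tau, sigma, h, n_sims=1000):
    """Approximate power of a sharp RD by simulation: given the scores,
    simulate outcomes with a jump of size tau at the cutoff, fit a local
    linear regression within bandwidth h, and count how often a 5%
    two-sided t-test on the jump rejects."""
    in_bw = np.abs(score - cutoff) <= h
    s = score[in_bw] - cutoff            # centered score within the bandwidth
    t = (s >= 0).astype(float)           # treatment indicator (sharp RD)
    rejections = 0
    for _ in range(n_sims):
        # Assumed outcome model: linear in the score, jump of tau, iid noise
        y = 0.5 * s + tau * t + rng.normal(scale=sigma, size=s.size)
        X = np.column_stack([np.ones_like(s), t, s, t * s])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
        se_tau = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
        rejections += abs(beta[1] / se_tau) > 1.96
    return rejections / n_sims

# Illustrative call: replace the uniform draw with your actual score data
score = rng.uniform(0, 100, size=3000)
print(rd_power_by_simulation(score, cutoff=50.0, tau=0.25, sigma=1.0, h=15.0))
```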

Weekly links September 9: no to cash? Machine learning for economists, climate economics, stupid regulations unchanged, and more…


Power Calculations for Regression Discontinuity Evaluations: Part 2


Part 1 covered the case where you have no data. Today’s post considers another common setting where you might need to do RD power calculations.
Scenario 2 (SCORE DATA AVAILABLE, NO OUTCOME DATA AVAILABLE): the context here is that assignment to treatment has already occurred via a scoring threshold rule, and you are deciding whether to try to collect follow-up data. For example, referees may have scored grant applications, with proposals above a certain score getting funded, and you are now deciding whether to collect outcomes several years later to see whether the grants had impacts; or kids may have sat a test to get into a gifted and talented program, and you now want to decide whether to collect data on how these kids have done in the labor market.

Here you have the score data, so you don’t need to make assumptions about the correlation between treatment assignment and the score, but can use the actual correlation in your data. However, since the optimal bandwidth will differ for each outcome examined, and you don’t have the outcome data, you don’t know what the optimal bandwidth will be.
In this context you can use the design effect discussed in my first post, now with the actual correlation. You can check whether you would have sufficient power if you surveyed the full sample, and then apply an additional multiple of the design effect to adjust for choosing an optimal bandwidth within this sample, as discussed previously. Alternatively, you can simulate outcomes and use these along with the actual score data (see the next post).
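To make this concrete, here is a minimal Python sketch of the first route: compute the actual correlation between treatment and the score from your data, and scale up a standard RCT sample-size formula by the design effect. It assumes the design effect from part 1 is the usual 1/(1-ρ²) inflation factor, where ρ is the correlation between treatment assignment and the score; the file name, variable name, cutoff, and target effect size are placeholders, and the extra adjustment for restricting to an optimal bandwidth is not included.

```python
import numpy as np
import pandas as pd

# Placeholder file and column names -- replace with your actual score data
df = pd.read_csv("scores.csv")
cutoff = 60.0                            # the actual assignment threshold
score = df["score"].to_numpy()
treat = (score >= cutoff).astype(float)  # sharp assignment rule

# Actual (not assumed) correlation between treatment and the score
rho = np.corrcoef(treat, score)[0, 1]
deff = 1.0 / (1.0 - rho**2)              # assumed design-effect formula

# RCT benchmark: total N to detect an effect of mde standard deviations
# with 80% power and a 5% two-sided test, given the realized treated share p
mde = 0.25                               # illustrative target effect size
p = treat.mean()
n_rct = (1.96 + 0.84) ** 2 / (p * (1 - p) * mde**2)

n_rd = np.ceil(deff * n_rct)
print(f"treated share = {p:.2f}, rho = {rho:.2f}, design effect = {deff:.2f}")
print(f"approx. sample needed: {n_rd:,.0f} for the RD "
      f"vs. {n_rct:,.0f} for an RCT with the same MDE")
print(f"full sample available: {len(score):,d}")
```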

Power Calculations for Regression Discontinuity Evaluations: Part 1


I haven’t done a lot of RD evaluations before, but have recently been involved in two studies that use regression discontinuity designs. One issue that then comes up is how to do power calculations for these studies. I thought I’d share some of what I have learned, and if anyone has more experience or additional helpful content, please let me know in the comments. I thank, without implication, Matias Cattaneo for sharing a lot of helpful advice.

One headline piece of information I’ve learned is that RD designs have far less power than RCTs for a given sample size, and I was surprised by how much larger a sample you need for an RD.
How to do power calculations will vary depending on the set-up and data availability. I’ll do three posts on this to cover different scenarios:

Scenario 1 (NO DATA AVAILABLE): the context here is a prospective RD study. For example, a project is considering scoring business plans, with those above a cutoff getting a grant; or a project will target based on poverty, with those below some poverty-index cutoff getting the program; or a school test is being used, with those who pass the test then being able to proceed to some next stage.
The key features here are that, since the study is being planned in advance, you do not have data on either the score (running variable) or the outcome of interest. The objective of the power calculation is then to see what sample size you would need in the project and survey, and whether it is worth going ahead with the study. Typically your goal here is to get a sense of the order of magnitude – do I need 500 units or 5,000?
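To make the order-of-magnitude point concrete, here is a minimal back-of-the-envelope sketch in Python. Since no data exist yet, it simulates a score distribution and works out the implied design effect, which it assumes takes the 1/(1-ρ²) form, with ρ the correlation between treatment assignment and the score. The standard-normal score, cutoff at the mean, and 0.2 standard deviation target effect are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative assumptions (no real data exist in Scenario 1) ---
n_draws = 100_000        # draws used only to approximate the correlation
cutoff = 0.0             # treatment if score >= cutoff
mde = 0.20               # minimum detectable effect, in outcome SD units

# Simulate a plausible score distribution and the implied treatment rule
score = rng.normal(size=n_draws)
treat = (score >= cutoff).astype(float)

# Design effect: variance inflation because treatment is a deterministic
# function of the score; assumed form deff = 1 / (1 - rho^2)
rho = np.corrcoef(treat, score)[0, 1]
deff = 1.0 / (1.0 - rho**2)

# Standard two-group RCT total sample size (outcome SD normalized to 1):
# N = (z_alpha + z_beta)^2 / (p * (1 - p) * mde^2), here 5% test, 80% power
p = treat.mean()
n_rct = (1.96 + 0.84) ** 2 / (p * (1 - p) * mde**2)
n_rd = deff * n_rct

print(f"rho = {rho:.2f}, design effect = {deff:.2f}")
print(f"approx. total sample: RCT = {n_rct:,.0f} vs. RD = {n_rd:,.0f}")
```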

Weekly links September 1: entrepreneurship, co-authoring, results-free review, and more…

  • Interview with Erik Hurst. He discusses entrepreneurship: “Most small businesses are plumbers and dry cleaners and local shopkeepers and house painters. These are great and important occupations, but empirically essentially none of them grow. They start small and stay small well into their life cycle…And when you ask them if they want to be big over time, they say no. That's not their ambition. This is important because a lot of our models assume businesses want to grow”.
  • Debraj Ray and Arthur Robson propose randomizing the order of co-authors, noting that “Debraj had just been enthusiastically recommended a “wonderful paper” by Banerjee et al, on which he was a co-author”, and, beyond blogging about it, they have even written a theory paper on the idea.
  • The Monkey Cage has a great set of Q&As with authors, special issue editors, a reviewer, and the journal editors about their experiences when the journal Comparative Political Studies published a pilot “results-free review” special issue, in which authors submitted manuscripts without showing the results. I found this point from the reviewer useful: “It is worth noting that we already do a lot of results-free reviewing. Anyone assessing grant proposals or sitting on a committee giving out fellowship money must take a stand on which research sounds more or less promising without knowing exactly what the results of the research will be. In advising students, we similarly must react to their initial ideas for dissertations or theses without knowing the results.” But I also found it interesting that the journal’s editors were the most pessimistic of the lot about the process, discussing its costs and noting that they are unlikely to repeat the results-free approach to reviewing and publishing.
  • Time to throw out the HDI and other “mash-up indices”? Jones and Klenow have a nice paper in the AER showing how to aggregate consumption, life expectancy, leisure, and inequality into an overall welfare metric – Western Europe looks much closer to the US on this metric than it does on GDP, while the East Asian tigers and developing countries look further away – and countries like South Africa and Botswana have welfare levels less than 5% of those in the U.S.
