Tools of the Trade
http://blogs.worldbank.org/impactevaluations/taxonomy/term/3844/all
en
Your go-to regression specification is biased: here’s the simple way to fix it
http://blogs.worldbank.org/impactevaluations/your-go-regression-specification-biased-here-s-simple-way-fix-it
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>
Today, I am writing about something many of you already know. You’ve probably been hearing about it for 5-10 years. But, you still ignore it. Well, now that the evidence against it has mounted enough and the fix is simple enough, I am here to urge you to tweak your regression specifications in your program evaluations.</p>
</div></div></div>
Mon, 12 Feb 2018 12:06:00 +0000
Berk Ozler
1639 at http://blogs.worldbank.org/impactevaluations
IE analytics: introducing ietoolkit
http://blogs.worldbank.org/impactevaluations/ie-analytics-introducing-ietoolkit
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even">Scientific advances are the result of a long, cumulative process of building knowledge and methodologies -- or, as the cliché goes, “standing on the shoulders of giants”. One often overlooked, but crucial part of this climb is a long tradition of standardization of everything from mathematical notation and scientific terminology, to format for academic articles and references.<br />
<br /></div></div></div>
Wed, 15 Nov 2017 12:35:00 +0000
Luiza Andrade
1601 at http://blogs.worldbank.org/impactevaluations
U.S. Law and Order Edition: Indoor prostitution and police body-worn cameras
http://blogs.worldbank.org/impactevaluations/us-law-and-order-edition-indoor-prostitution-and-police-body-worn-cameras
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even">Today, I cover two papers from two ends of the long publication spectrum – a paper that is forthcoming in the Review of Economic Studies on the effect of decriminalizing indoor prostitution on rape and sexually transmitted infections (STIs); and a working paper that came out a few days ago on the effect of police wearing cameras on use of force and civilian complaints. While these papers are from the U.S.A., each of them has something to teach us about methods and policies in development economics. I devote space to each paper proportional to the time it has been around…<br /><br /></div></div></div>
Wed, 25 Oct 2017 10:58:00 +0000
Berk Ozler
1591 at http://blogs.worldbank.org/impactevaluations
When should you cluster standard errors? New wisdom from the econometrics oracle
http://blogs.worldbank.org/impactevaluations/when-should-you-cluster-standard-errors-new-wisdom-econometrics-oracle
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>
In ancient Greek times, important decisions were never made without consulting the high priestess at the Oracle of Delphi. She would deliver wisdom from the gods, although this advice was sometimes vague or confusing, and was often misinterpreted by mortals. Today I bring word that the high priestess and priests (Athey, Abadie, Imbens and Wooldridge) have <a href="https://arxiv.org/abs/1710.02926" rel="nofollow">delivered</a> new wisdom from the god of econometrics on the important decision of <strong>when should you cluster standard errors</strong>. This is definitely one of life’s most important questions, as any keen player of seminar bingo can surely attest. In case their paper is all Greek to you (half of it literally is), I will attempt to summarize their recommendations, so that your standard errors may be heavenly.</p>
</div></div></div>
Mon, 16 Oct 2017 11:12:00 +0000
David McKenzie
1588 at http://blogs.worldbank.org/impactevaluations
Finally, a way to do easy randomization inference in Stata!
http://blogs.worldbank.org/impactevaluations/finally-way-do-easy-randomization-inference-stata
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>
Randomization inference has been increasingly recommended as a way of analyzing data from randomized experiments, especially in samples with a small number of observations, with clustered randomization, or with high leverage (see for example Alwyn Young’s <a href="http://personal.lse.ac.uk/YoungA/ChannellingFisher.pdf" rel="nofollow">paper</a>, and the books by <a href="http://blogs.worldbank.org/impactevaluations/review-imbens-and-rubin-causal-inference-book" rel="nofollow">Imbens and Rubin</a>, and <a href="http://blogs.worldbank.org/impactevaluations/gerber-and-green-s-new-textbook-on-field-experiments-should-you-read-it-and-what-should-they-add-for" rel="nofollow">Gerber and Green</a>). However, one of the barriers to widespread usage in development economics has been that, to date, no simple commands for implementing this in Stata have been available, requiring authors to program from scratch.<br /><br />
This has now changed with a new command <strong>ritest</strong> written by Simon Hess, a PhD student whom I met just over a week ago at Goethe University in Frankfurt. This command is extremely simple to use, so I thought I would introduce it and share some tips after playing around with it a little. The <a href="http://www.stata-journal.com/article.html?article=st0489" rel="nofollow">Stata journal article</a> is also now out.<br /><br /><strong>How do I get this command?</strong><br />
Simply type <strong>findit ritest</strong> in Stata.<br />
[<strong>edit</strong>: that will get the version from the Stata journal. However, to get the most recent version with a couple of bug fixes noted below, type</p>
<p>
<span>net describe ritest, from(<a href="https://raw.githubusercontent.com/simonheb/ritest/master/" rel="nofollow">https://raw.githubusercontent.com/simonheb/ritest/master/</a>)</span></p>
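<p>
To make the idea concrete, here is a minimal Python sketch of what randomization inference does under the hood. It is not ritest and does not reproduce its options. It assumes simple individual-level randomization with a fixed number of treated units, and the function names (<em>diff_in_means</em>, <em>ri_pvalue</em>) and the toy data are made up for illustration. With clustered or stratified assignment, the re-randomization step would have to respect that design.</p>

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Difference in mean outcomes between treated (1) and control (0) units."""
    t = [y for y, d in zip(outcomes, treated) if d == 1]
    c = [y for y, d in zip(outcomes, treated) if d == 0]
    return mean(t) - mean(c)


def ri_pvalue(outcomes, treated, reps=2000, seed=12345):
    """Two-sided randomization-inference p-value for the difference in means.

    Re-draws the treatment assignment `reps` times by shuffling the observed
    assignment vector (which keeps the number of treated units fixed) and
    counts how often the placebo estimate is at least as large in absolute
    value as the observed one.
    """
    rng = random.Random(seed)
    observed = diff_in_means(outcomes, treated)
    placebo = list(treated)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(placebo)
        if abs(diff_in_means(outcomes, placebo)) >= abs(observed):
            extreme += 1
    return observed, extreme / reps


# Hypothetical toy data: 8 units, 4 treated.
outcomes = [12, 15, 14, 10, 9, 8, 11, 7]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
estimate, pvalue = ri_pvalue(outcomes, treated)
print(estimate, pvalue)
```

<p>
The p-value is simply the share of re-randomizations that produce an estimate at least as extreme as the one actually observed, so it relies on the known assignment mechanism rather than on large-sample approximations.</p>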
</div></div></div>
Mon, 02 Oct 2017 13:32:00 +0000
David McKenzie
1582 at http://blogs.worldbank.org/impactevaluations
Dealing with attrition in field experiments
http://blogs.worldbank.org/impactevaluations/dealing-attrition-field-experiments
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>
Here is a familiar scenario for those running field experiments: You’re conducting a study with a treatment and a comparison arm and measuring your main outcomes with surveys and/or biomarker data collection, meaning that you need to contact the subjects (unlike, say, using administrative data tied to their national identity numbers) – preferably in person. You know that you will, inevitably, lose some subjects from both groups to follow-up: they will have moved, be temporarily away, refuse to answer, have died, etc. In some of these cases there is nothing more you can do, but in others you can try harder: you can wait for them to come back and revisit; you can try to track them to their new location, etc. You can do this at different intensities (try really hard or not so much), different boundaries (for everyone in the study district, region, or country, but not for those farther away), and different samples (for everyone or for a random sub-sample).<br /><br /><strong><em>Question</em></strong>: suppose that you decide that you have the budget to do everything you can to find those not interviewed during the first pass through the study areas (doesn’t matter if you have enough budget for a randomly chosen sub-sample or everyone), i.e. an intense tracking exercise to reduce the rate of attrition. In addition to everything else you can do to track subjects from both groups, you have a tool that you can use only for those in the treatment arm (say, your treatment was group-based therapy for teen mums and you think that the mentors for these groups may have key contact information for treatment-group subjects who moved. There were no placebo groups in control, i.e. no counterpart mentors). <strong><em>Do you use this source to track subjects – even if it is only available for the treatment group?</em></strong></p>
</div></div></div>
Sun, 24 Sep 2017 18:48:00 +0000
Berk Ozler
1579 at http://blogs.worldbank.org/impactevaluations
Trouble with pre-analysis plans? Try these three weird tricks.
http://blogs.worldbank.org/impactevaluations/trouble-pre-analysis-plans-try-these-three-weird-tricks
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even">Pre-analysis plans increase the chances that published results are true by restricting researchers’ ability to data-mine. Unfortunately, writing a pre-analysis plan isn’t easy, nor is it without costs, as discussed in recent work by <a href="http://economics.mit.edu/files/10654" rel="nofollow">Olken</a> and <a href="http://web.stanford.edu/~niederle/Coffman.Niederle.PAP.JEP.pdf" rel="nofollow">Coffman and Niederle</a>. Two recent working papers - “<a href="http://www.nber.org/papers/w23544" rel="nofollow">Split-Sample Strategies for Avoiding False Discoveries</a>,” by Michael L.</div></div></div>
Wed, 12 Jul 2017 12:37:00 +0000
Owen Ozier
1562 at http://blogs.worldbank.org/impactevaluations
List Experiments for Sensitive Questions – a Methods Bleg
http://blogs.worldbank.org/impactevaluations/list-experiments-sensitive-questions-methods-bleg
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>
About a year ago, I wrote <a href="http://blogs.worldbank.org/impactevaluations/issues-data-collection-and-measurement" rel="nofollow">a blog post</a> on issues surrounding data collection and measurement. In it, I talked about “list experiments” for sensitive questions, about which I was not sold at the time. However, now that I have a bunch of studies going to the field at different stages of data collection, many of which are about sensitive topics in adolescent female target populations, I am paying closer attention to them. In my reading and thinking about the topic and how to implement it in our surveys, I came up with a bunch of questions surrounding the optimal implementation of these methods. In addition, there is probably more to be learned on these methods to improve them further, opening up the possibility of experimenting with them when we can. Below are a bunch of things that I am thinking about and, as we still have some time before our data collection tools are finalized, you, our readers, have a chance to help shape them with your comments and feedback.</p>
</div></div></div>
Mon, 08 May 2017 13:39:00 +0000
Berk Ozler
1539 at http://blogs.worldbank.org/impactevaluations
Power Calculations for Regression Discontinuity Evaluations: Part 3
http://blogs.worldbank.org/impactevaluations/power-calculations-regression-discontinuity-evaluations-part-3
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even">This is my third, and final, in a series of posts on doing power calculations for regression discontinuity (see <a href="http://blogs.worldbank.org/impactevaluations/power-calculations-regression-discontinuity-evaluations-part-1" rel="nofollow">part 1</a> and <a href="http://blogs.worldbank.org/impactevaluations/power-calculations-regression-discontinuity-evaluations-part-2" rel="nofollow">part 2</a>).<br /><strong><em>Scenario 3 (SCORE DATA AVAILABLE, AT LEAST PRELIMINARY OUTCOME DATA AVAILABLE; OR SIMULATED DATA USED): </em></strong><em>The context of data being available seems less usual to me in the planning stages of an impact evaluation, but could be possible in some settings (e.g. you have the score data and administrative data on a few outcomes, and then are deciding whether to collect survey data on other outcomes). But more generally, you will be in this stage once you have collected all your data. Moreover, the methods discussed here can be used with simulated data in cases where you don’t have data.</em><br /><br />
There is now a new Stata package <a href="https://sites.google.com/site/rdpackages/rdpower" rel="nofollow"><em>rdpower</em></a> written by Matias Cattaneo and co-authors that can be really helpful in this scenario (thanks also to him for answering several questions I had on its use). It calculates power and sample sizes, assuming you will then use the <em>rdrobust</em> command to analyze the data. There are two related commands here:
<ul><li>
<strong>rdpower: </strong>this calculates the power, given your data and sample size for a range of different effect sizes</li>
<li>
<strong>rdsampsi: </strong>this calculates the sample size you need to get a given power, given your data and that you will be analyzing it with rdrobust.</li>
</ul></div></div></div>
Mon, 12 Sep 2016 12:54:00 +0000
David McKenzie
1433 at http://blogs.worldbank.org/impactevaluations
Power Calculations for Regression Discontinuity Evaluations: Part 2
http://blogs.worldbank.org/impactevaluations/power-calculations-regression-discontinuity-evaluations-part-2
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>
<a href="http://blogs.worldbank.org/impactevaluations/power-calculations-regression-discontinuity-evaluations-part-1" rel="nofollow">Part 1</a> covered the case where you have no data. Today’s post considers another common setting where you might need to do RD power calculations.<br /><strong><em>Scenario 2 (SCORE DATA AVAILABLE, NO OUTCOME DATA AVAILABLE): </em></strong><em>the context here is that assignment to treatment has already occurred via a scoring threshold rule, and you are deciding whether to try and collect follow-up data. For example, referees may have given scores for grant applications, and proposals with scores above a certain level got funded, and now you are deciding whether to collect outcomes several years later to see whether the grants had impacts; or kids may have sat a test to get into a gifted and talented program, and now you want to see whether to collect data on how these kids have done in the labor market.</em><br /><br />
Here you have the score data, so don’t need to make assumptions about the correlation between treatment assignment and the score, but can use the actual correlation in your data. However, since the optimal bandwidth will differ for each outcome examined, and you don’t have the outcome data, you don’t know what the optimal bandwidth will be.<br />
In this context you can use the design effect discussed in <a href="http://blogs.worldbank.org/impactevaluations/power-calculations-regression-discontinuity-evaluations-part-1" rel="nofollow">my first blog post</a> with the actual correlation. You can then check with the full sample to see if you would have sufficient power if you surveyed everyone, and make an adjustment for choosing an optimal bandwidth within this sample using an additional multiple of the design effect as discussed previously. Or you can simulate outcomes and use the simulated outcomes along with the actual score data (see next post).</p>
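<p>
As a rough illustration of plugging the actual correlation into the design effect, here is a minimal Python sketch. It assumes a sharp RD in which units at or above the cutoff are treated, and it assumes the design effect takes the standard form 1 / (1 - rho^2), where rho is the correlation between the treatment indicator and the score. The function names and the evenly spaced toy scores are hypothetical, and the sketch makes no adjustment for choosing a bandwidth.</p>

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rd_design_effect(scores, cutoff):
    """Correlation between treatment and score, and the implied design effect.

    Assumption: sharp RD, treated if score >= cutoff, and
    design effect = 1 / (1 - rho^2).
    """
    treated = [1.0 if s >= cutoff else 0.0 for s in scores]
    rho = pearson(scores, treated)
    return rho, 1.0 / (1.0 - rho ** 2)


# Hypothetical scores: evenly spaced from 0 to 99, cutoff at 50.
scores = list(range(100))
rho, deff = rd_design_effect(scores, cutoff=50)
print(round(rho, 3), round(deff, 2))
```

<p>
Under these assumptions, the design effect is the factor by which the sample would need to grow, relative to an RCT with the same number of units, to get comparable power. With evenly spaced scores and a cutoff at the median it comes out at roughly 4.</p>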
</div></div></div>
Thu, 08 Sep 2016 12:06:00 +0000
David McKenzie
1431 at http://blogs.worldbank.org/impactevaluations
Power Calculations for Regression Discontinuity Evaluations: Part 1
http://blogs.worldbank.org/impactevaluations/power-calculations-regression-discontinuity-evaluations-part-1
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>
I haven’t done a lot of RD evaluations before, but recently have been involved in two studies which use regression discontinuity designs. One issue which comes up is then how to do power calculations for these studies. I thought I’d share some of what I have learned, and if anyone has more experience or additional helpful content, please let me know in the comments. I thank, without implication, Matias Cattaneo for sharing a lot of helpful advice.<br /><br /><strong>One headline piece of information that I’ve learned is that RD designs have way less power than RCTs for a given sample, and I was surprised by how much larger the sample is that you need for an RD.</strong><br />
How to do power calculations will vary depending on the set-up and data availability. I’ll do three posts on this to cover different scenarios:<br /><br /><strong><em>Scenario 1 (NO DATA AVAILABLE): </em></strong><em>the context here is of a prospective RD study. For example, a project is considering scoring business plans, and those above a cutoff will get a grant; or a project will be targeting for poverty, and those below some poverty index measure will get the program; or a school test is being used, with those who pass the test then being able to proceed to some next stage. </em><br /><em>The key features here are that, since it is being planned in advance, you do not have data on either the score (running variable), or the outcome of interest. The objective of the power calculation is then to see what size sample you would need to have in the project and survey, and whether it is worth you going ahead with the study. Typically your goal here is to get some sense of order of magnitude – do I need 500 units or 5000?</em></p>
</div></div></div>
Tue, 06 Sep 2016 12:09:00 +0000
David McKenzie
1430 at http://blogs.worldbank.org/impactevaluations
Tools of the Trade: The Regression Kink Design
http://blogs.worldbank.org/impactevaluations/tools-trade-regression-kink-design
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>
Regression Discontinuity designs have become a popular addition to the impact evaluation toolkit, and offer a <a href="http://blogs.worldbank.org/impactevaluations/regression-discontinuity-porn" rel="nofollow">visually appealing</a> way of demonstrating the impact of a program around a cutoff. An extension of this approach which is growing in usage is the <strong>regression kink design (RKD)</strong>. I’ve never estimated one of these, and am not an expert, but thought it might be useful to try to provide an introduction to this approach along with some links that people can then follow up on if they want to implement it.</p>
</div></div></div>
Mon, 08 Feb 2016 14:22:00 +0000
David McKenzie
1358 at http://blogs.worldbank.org/impactevaluations
From my mailbox: should I work with only a subsample of my control group if I have big take-up problems?
http://blogs.worldbank.org/impactevaluations/my-mailbox-should-i-work-only-subsample-my-control-group-if-i-have-big-take-problems
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even">Over the past month I’ve received several versions of the same question, so thought it might be useful to post about it.<br />
<br />
Here’s one version:<br /><em>I have a question about an experiment in which we had a very big problem getting the individuals in the treatment group to take-up the treatment. Therefore we now have a treatment much smaller than the control. For efficiency reasons does it still make sense to survey all the control group, or should we take a random draw in order to have an equal number of treated and control?</em><br />
<br />
And another version</div></div></div>
Mon, 20 Jul 2015 13:57:00 +0000
David McKenzie
1287 at http://blogs.worldbank.org/impactevaluations
Allocating Treatment and Control with Multiple Applications per Applicant and Ranked Choices
http://blogs.worldbank.org/impactevaluations/allocating-treatment-and-control-multiple-applications-applicant-and-ranked-choices
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even">This came up in the context of work with Ganesh Seshan designing an evaluation for a computer training program for migrants. The training program was to be taught in one 3 hour class per week for several months. Classes were taught Sunday, Tuesday and Thursday evenings from 5-8 pm, and then there were four separate slots on Friday, the first day of the weekend. So in total there were 7 possible sessions people could potentially attend. However, most migrants would prefer to go on the weekend, and many would not be able to attend on particular days of the week.</div></div></div>
Tue, 07 Jul 2015 04:54:00 +0000
David McKenzie
1283 at http://blogs.worldbank.org/impactevaluations
Endogenous stratification: the surprisingly easy way to bias your heterogeneous treatment effect results and what you should do instead
http://blogs.worldbank.org/impactevaluations/endogenous-stratification-surprisingly-easy-way-bias-your-heterogeneous-treatment-effect-results-and
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>
A common question of interest in evaluations is “which groups does the treatment work for best?” A standard way to address this is to look at heterogeneity in treatment effects with respect to baseline characteristics. However, there are often many such possible baseline characteristics to look at, and really the heterogeneity of interest may be with respect to outcomes in the absence of treatment. Consider two examples:<br />
A: A vocational training program for the unemployed: we might want to know if the treatment helps more those who were likely to stay unemployed in the absence of an intervention compared to those who would have been likely to find a job anyway.<br />
B: Smaller class sizes: we might want to know if the treatment helps more those students whose test scores would have been low in the absence of smaller classes, compared to those students who were likely to get high test scores anyway.<br /></div></div></div>
Mon, 16 Mar 2015 13:20:00 +0000
David McKenzie
1239 at http://blogs.worldbank.org/impactevaluations