World Bank Blogs
http://blogs.worldbank.org/planet.xml
IBRD and IDA: Working for a World Free of Poverty.
A Curated List of Our Postings on Technical Topics – Your One-Stop Shop for Methodology
http://blogs.worldbank.org/impactevaluations/curated-list-our-postings-technical-topics-your-one-stop-shop-methodology
Rather than the usual list of Friday links, this week I thought I’d follow up on <a href="http://blogs.worldbank.org/impactevaluations/introducing-ask-guido">our post by Guido Imbens</a> yesterday on clustering and the post earlier this week by <a href="http://blogs.worldbank.org/impactevaluations/hawthorne-effect-what-do-we-really-learn-watching-teachers-and-others">Dave Evans on Hawthorne effects</a> with a curated list of our technical postings, to serve as a one-stop shop for your technical reading. I’ve focused here on our posts on methodological issues in impact evaluation – we also have a whole lot of posts on how to conduct surveys and measure certain concepts that I’ll leave for another time.<br />
<strong>Random Assignment, Registration and Reporting</strong><br />
<a href="http://blogs.worldbank.org/impactevaluations/tools-of-the-trade-doing-stratified-randomization-with-uneven-numbers-in-some-strata">Doing stratified randomization with uneven numbers in the Strata</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/how-randomize-using-many-baseline-variables-guest-post-thomas-barrios">How to randomize using many baseline variables</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/public-randomization-ceremonies">Public randomization ceremonies</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/designing-experiments-to-measure-spillover-effects">Designing experiments to measure spillover effects</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/what-are-mechanism-experiments-and-should-we-be-doing-more-of-them">Mechanism experiments</a> and <a href="http://blogs.worldbank.org/impactevaluations/inside-the-black-box-why-do-things-work-0">opening up the black box</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/sampling-weights-matter-for-rct-design">Sample weights and RCT design</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/a-pre-analysis-plan-checklist">A pre-analysis plan check-list</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/trying-out-new-trial-registries">The New Trial Registries</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/what-isn-t-reported-impact-evaluations">What isn’t reported in impact evaluations but maybe should be</a><br />
<strong>Propensity Score Matching</strong><br />
Guido Imbens on <a href="https://blogs.worldbank.org/impactevaluations/introducing-ask-guido">clustering standard errors with matching</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/tools-trade-recent-tests-matching-estimators-through-evaluation-job-training-programs">Testing different matching estimators as applied to job training programs</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/tools-of-the-trade-the-covariate-balanced-propensity-score">The covariate balanced propensity score</a><br />
<strong>Difference-in-Differences</strong><br />
<a href="http://blogs.worldbank.org/impactevaluations/often-unspoken-assumptions-behind-difference-difference-estimator-practice">The often unspoken assumptions behind diff-in-diff</a><br />
<strong>Other Evaluation Methods</strong><br />
The <a href="https://blogs.worldbank.org/impactevaluations/evaluating-regulatory-reforms-using-the-synthetic-control-method">synthetic control method</a>, as applied to regulatory reforms<br />
<a href="http://blogs.worldbank.org/impactevaluations/guest-post-by-alan-de-brauw-regression-discontinuity-impacts-with-an-implicit-index-evaluating-el-sa">Regression discontinuity with an implicit index</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/using-spatial-variation-program-performance-identify-causal-impact-0">Using spatial variation</a> in program performance to identify impacts<br />
<a href="http://blogs.worldbank.org/impactevaluations/guest-post-by-howard-white-can-we-do-small-n-impact-evaluations">Small n impact evaluation methods</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/can-we-trust-shoestring-evaluations">Can we trust shoestring evaluations?</a><br />
<strong>Analysis</strong><br />
Regression adjustment in randomized experiments (<a href="http://blogs.worldbank.org/impactevaluations/regression-adjustment-in-randomized-experiments-is-the-cure-really-worse-than-the-disease">part one</a>, <a href="http://blogs.worldbank.org/impactevaluations/guest-post-by-winston-lin-regression-adjustment-in-randomized-experiments-is-the-cure-really-worse-0">part two</a>)<br />
<a href="http://blogs.worldbank.org/impactevaluations/tools-of-the-trade-when-to-use-those-sample-weights">When to use survey weights</a> in analysis<br />
<a href="http://blogs.worldbank.org/impactevaluations/tools-of-the-trade-a-quick-adjustment-for-multiple-hypothesis-testing">Adjustments for multiple hypothesis testing</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/help-for-attrition-is-just-a-phone-call-away-a-new-bounding-approach-to-help-deal-with-non-response">Bounding approaches to deal with attrition</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/whether-to-probit-or-to-probe-it-in-defense-of-the-linear-probability-model">Linear probability models versus probits</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/tools-of-the-trade-dealing-with-multiple-lotteries">Dealing with multiple lotteries</a><br />
Estimating standard errors with small clusters (<a href="http://blogs.worldbank.org/impactevaluations/annals-of-good-ie-practice-getting-those-standard-errors-correct-in-small-sample-clustered-studies">part one</a>, <a href="http://blogs.worldbank.org/impactevaluations/tools-of-the-trade-estimating-correct-standard-errors-in-small-sample-cluster-studies-another-take">part two</a>)<br />
<a href="http://blogs.worldbank.org/impactevaluations/tools-of-the-trade-beyond-mean-decompositions-with-an-application-to-the-gender-wage-gap-in-china">Decomposition methods</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/you-think-randomized-controlled-trials-are-great-actually-they-are-even-better-than-that-guest-post">Estimation of treatment effects with incomplete compliance</a><br />
<strong>Power Calculations and Improving Power</strong><br />
<a href="http://blogs.worldbank.org/impactevaluations/does-the-intra-class-correlation-matter-for-power-calculations-if-i-am-going-to-cluster-my-standard">Does the intra-cluster correlation matter for power calculations if I am going to cluster my standard errors?</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/power-calculations-for-propensity-score-matching">Power calculations for propensity score matching</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/power-calculations-101-dealing-with-incomplete-take-up">Power calculations 101: dealing with incomplete take-up</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/collecting-more-rounds-of-data-to-boost-power-the-new-stuff">Collecting more rounds of data to boost power</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/on-improving-power-in-small-sample-studies">Improving power in small samples</a><br />
<strong>On External Validity</strong><br />
<a href="http://blogs.worldbank.org/impactevaluations/weighting-for-external-validity-then-waiting-for-election-results">Weighting for external validity</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/will-successful-intervention-over-there-get-results-over-here-we-can-never-answer-full-certainty-few">Will that successful intervention over there get results here?</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/learn-live-without-external-validity">Learn to live without external validity</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/questioning-external-validity-regression-estimates-why-they-can-be-less-representative-you-think">Why the external validity of regression estimates can be less than you think</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/why-similarity-wrong-concept-external-validity">Why similarity is the wrong concept for external validity</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/a-rant-on-the-external-validity-double-double-standard">A rant on the external validity double standard</a><br />
<strong>Jargony Terms in Impact Evaluations</strong><br />
<a href="http://blogs.worldbank.org/impactevaluations/hawthorne-effect-what-do-we-really-learn-watching-teachers-and-others">The Hawthorne Effect</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/are-john-henry-effects-as-apocryphal-as-their-eponym">The John Henry Effect</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/is-it-the-program-or-is-it-participation-randomization-and-placebos">Placebo effects</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/how-should-we-understand-clinical-equipoise-when-doing-rcts-development">Clinical Equipoise</a><br />
<strong>Stata Tricks</strong><br />
<a href="http://blogs.worldbank.org/impactevaluations/tools-trade-graphing-impacts-standard-error-bars">Graphing impacts with Standard Error Bars</a><br />
<a href="http://blogs.worldbank.org/impactevaluations/tools-of-the-trade-intra-cluster-correlations">Calculating the intra-cluster correlation</a><br />
Fri, 21 Feb 2014 07:46:26 -0500
David McKenzie
The often (unspoken) assumptions behind the difference-in-difference estimator in practice
http://blogs.worldbank.org/impactevaluations/often-unspoken-assumptions-behind-difference-difference-estimator-practice
This post is co-written with <a href="http://www.eco.uc3m.es/~ricmora/" rel="nofollow">Ricardo Mora</a> and <a href="http://www.eco.uc3m.es/~ireggio/" rel="nofollow">Iliana Reggio</a><br />
<br />
The difference-in-difference (DID) evaluation method should be very familiar to our readers – a method that infers program impact by comparing the pre- to post-intervention change in the outcome of interest for the treated group relative to a comparison group. The key assumption here is what is known as the “Parallel Paths” assumption, which posits that the average change in the comparison group represents the counterfactual change in the treatment group if there were no treatment. It is a popular method in part because the data requirements are not particularly onerous – it requires data from only two points in time – and the results are robust to any possible confounder as long as it doesn’t violate the Parallel Paths assumption. When data on several pre-treatment periods exist, researchers like to check the Parallel Paths assumption by testing for differences in the pre-treatment trends of the treatment and comparison groups. Equality of pre-treatment trends may lend confidence but this can’t directly test the identifying assumption; by construction that is untestable. Researchers also tend to explicitly model the “natural dynamics” of the outcome variable by including flexible time dummies for the control group and a parametric time trend differential between the control and the treated in the estimating specification.<br />
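As a minimal sketch of the basic two-period, two-group estimator described above: under Parallel Paths, the DID estimate is the change in the treated group's mean outcome minus the change in the comparison group's mean outcome. The function name and the data below are hypothetical and purely illustrative.

```python
import numpy as np

def did_parallel_paths(y_treat_pre, y_treat_post, y_comp_pre, y_comp_post):
    """Two-period DID: change in treated mean minus change in comparison mean."""
    change_treated = np.mean(y_treat_post) - np.mean(y_treat_pre)
    change_comparison = np.mean(y_comp_post) - np.mean(y_comp_pre)
    return change_treated - change_comparison

# Hypothetical outcome data (not taken from any paper).
treat_pre = np.array([10.0, 12.0, 11.0])   # treated group, before (mean 11)
treat_post = np.array([15.0, 17.0, 16.0])  # treated group, after (mean 16)
comp_pre = np.array([8.0, 9.0, 10.0])      # comparison group, before (mean 9)
comp_post = np.array([10.0, 11.0, 12.0])   # comparison group, after (mean 11)

effect = did_parallel_paths(treat_pre, treat_post, comp_pre, comp_post)
print(effect)  # (16 - 11) - (11 - 9) = 3.0
```

The comparison group's change of 2 stands in for the counterfactual change of the treated group, so the remaining 3 is attributed to treatment.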
<br />
Typically, the applied researcher’s practice of DID ends at this point. Yet <a href="http://e-archivo.uc3m.es/handle/10016/16065" rel="nofollow">a very recent working paper</a> by Ricardo Mora and Iliana Reggio (two co-authors of this post) points out that DID-as-commonly-practiced implicitly involves other assumptions instead of Parallel Paths, assumptions perhaps unknown to the researcher, which may influence the estimate of the treatment effect. These assumptions concern the dynamics of the outcome of interest, both before and after the introduction of treatment, and the implications of the particular dynamic specification for the Parallel Paths assumption.<br />
<!--break--> <br />
As stated, researchers often supplement the DID specification with a time trend of some parametric form such as a (perhaps group specific) linear trend. But by including this linear trend, the identifying assumption shifts from the standard Parallel Paths to what can be termed Parallel Growths, since now deviation from a trend line identifies impact (alternatively, we can think of Parallel Growths as a Parallel Path assumption in first differences).<br />
<br />
The switch from Parallel Paths to Parallel Growths highlights a line of reasoning that Ricardo and Iliana formally extend to a general family of Parallel Assumptions valid for higher-order differencing such as a difference of double-differencing (what might be called a Parallel Accelerations assumption) and so on. Arguably, higher-order Parallel Assumptions present weaker identifying assumptions than Parallel Paths – we no longer need the trend in the comparison group to proxy for the counterfactual trend of the treatment group but rather the <em>growth</em> (i.e. the change in trend) in the comparison group to proxy for the counterfactual <em>growth</em>. But there is a trade-off in our empirical practice since differencing of data tends to exacerbate any measurement error present in the outcome measures. So the extent to which we can benefit from higher-order Parallel Assumptions is determined by our data on a case-by-case basis.<br />
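To make the contrast between Parallel Paths and Parallel Growths concrete, here is a small sketch using hypothetical group means for two pre-treatment periods and one post-treatment period. The helper `did_of_order` is an illustrative name, not something from the working paper; it assumes the last entry of each series is the first post-treatment period.

```python
import numpy as np

def did_of_order(treated_means, comparison_means, order):
    """Difference the treated-minus-comparison gap `order` times.

    order=1 corresponds to Parallel Paths, order=2 to Parallel Growths.
    Assumes the final period is the first post-treatment period.
    """
    gap = np.asarray(treated_means, float) - np.asarray(comparison_means, float)
    return np.diff(gap, n=order)[-1]

# Hypothetical period means: two pre-treatment periods, then one post period.
comparison = [2.0, 4.0, 6.0]   # grows by 2 every period
treated = [1.0, 4.0, 10.0]     # grows by 3 pre-treatment, by 6 afterwards

paths = did_of_order(treated, comparison, order=1)
growths = did_of_order(treated, comparison, order=2)
print(paths)    # (10 - 4) - (6 - 4) = 4.0
print(growths)  # [(10 - 4) - (4 - 1)] - [(6 - 4) - (4 - 2)] = 3.0
```

The two estimates differ by exactly the pre-treatment trend differential (3 - 2 = 1), so in this setup they coincide only when the groups were already on the same trend before treatment.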
<br />
Ricardo and Iliana then develop a general additive regression model with fully flexible dynamics – this has the advantage of being able to test for possible restrictions on the dynamics rather than simply positing a particular parametric form. The model also doesn’t impose equivalence between alternative parallel assumptions. In fact this model can test for such equivalence:<br />
<br />
<img alt="" src="https://blogs.worldbank.org/impactevaluations/files/impactevaluations/Equation_21Nov2013.PNG" style="height:50px; width:250px" /><br />
<br />
The framework above allows for fully flexible pre-treatment trend differentials between the treated and comparison group and also allows for a comparison of any two consecutive parallel assumptions such as Paths vs. Growths. Here <em>Y</em> is the outcome of interest and time runs from <em>t1</em> until <em>T</em> with the intervention beginning at some point between <em>t2</em> and <em>T</em>. The binary indicator variable <em>I</em> designates time-periods while <em>D</em> indicates treated units. In practice, researchers often estimate a more restrictive equation than this one – even when the data permit this more flexible model. Here is <a href="http://ideas.repec.org/a/uwp/jhriss/v40y2005i2p559-590.html" rel="nofollow">one paper that does use this specification</a> to look at the effects of school desegregation in the U.S.<br />
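One way to sketch a regression in this spirit is to include a dummy for every period plus an interaction of each period dummy with the treatment indicator. This parameterization is an assumption made for illustration and is not necessarily identical to the equation pictured above; the function name, variable names, and data are all hypothetical. The interaction coefficients give the treated-minus-comparison gap in each period, from which the Parallel Paths and Parallel Growths estimates for the first post-treatment period can be read off.

```python
import numpy as np

def flexible_did(y, period, treated, first_post):
    """OLS of y on period dummies and period-by-treated interactions.

    Returns first-post-period effects under Parallel Paths and Parallel
    Growths, plus the last pre-treatment trend differential between groups.
    Requires at least two pre-treatment periods.
    """
    y = np.asarray(y, float)
    period = np.asarray(period)
    treated = np.asarray(treated, float)
    periods = np.unique(period)                       # sorted period labels
    time_dummies = (period[:, None] == periods[None, :]).astype(float)
    interactions = time_dummies * treated[:, None]    # I_t * D
    X = np.hstack([time_dummies, interactions])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    gamma = coef[len(periods):]                       # group gap per period
    k = int(np.searchsorted(periods, first_post))     # index of first post period
    pre_trend_gap = gamma[k - 1] - gamma[k - 2]
    paths = gamma[k] - gamma[k - 1]
    growths = paths - pre_trend_gap
    return {"paths": paths, "growths": growths, "pre_trend_gap": pre_trend_gap}

# Hypothetical data: four periods, treatment starts in period 3,
# two observations per group and period placed symmetrically around the mean.
comparison_means = [2.0, 4.0, 6.0, 8.0]
treated_means = [1.0, 4.0, 10.0, 13.0]
y, period, treated = [], [], []
for d, means in ((0, comparison_means), (1, treated_means)):
    for t, m in enumerate(means, start=1):
        for noise in (-0.5, 0.5):
            y.append(m + noise)
            period.append(t)
            treated.append(d)

result = flexible_did(y, period, treated, first_post=3)
print(result)
```

In this sketch the two assumptions give the same estimate only when `pre_trend_gap` is zero. A formal equivalence test would also need standard errors, which are omitted here.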
<br />
Ricardo and Iliana then review all DiD papers published in ten well-known economics journals over the past three years and focus on those that (a) adopt a DiD framework with more than one pre-treatment time period and (b) have made the data publicly available. There are nine papers that meet these criteria. The topics of study in these papers range from the effect of Daylight Saving Time on US residential electricity use to the effects of WWI-related male mortality on marriage market outcomes in France. All of the nine papers adopt more restrictive estimating equations than the one above. In fact most of the 13 specifications in the nine papers restrict pre-treatment dynamics to be equivalent between treatment and comparison groups. Most also impose a constant treatment effect in post-treatment periods, thus ignoring the possible dynamics of treatment.<br />
<br />
Eleven of the 13 specifications report significant treatment effects in the original papers. In contrast, by applying the flexible model to the data, Ricardo and Iliana find:<br />
<ul>
<li>
In the 11 cases that estimate significant impacts, once re-estimated with the fully flexible model and with an explicit Parallel Paths assumption, only 5 remain precisely estimated and many of the 11 have substantively different point estimates.</li>
<li>
With the Parallel Growths assumption this number falls to 3 of 11 cases.</li>
<li>
Tests for the constancy of post-treatment effects for 11 of the specifications wind up rejecting the absence of dynamic effects in 6 of the instances. It seems post-treatment dynamic effects often matter and ideally should be modeled in a more flexible manner.</li>
<li>
A test of the equivalence of Parallel Paths and Parallel Growth assumptions rejects equivalence in 5 out of the 13 specifications. In these cases the arguably weaker assumption of Parallel Growth results in significantly different findings than Parallel Paths.</li>
</ul>
<br />
Now it’s true that standard errors are higher in general with the fully-flexible model (especially with the Parallel Growths assumption tested with first-differenced data) and in many cases equality between the treatment effect reported in the published paper and the estimate under the flexible model cannot be rejected. As Ricardo and Iliana conclude, “with the fully flexible model we obtain results that coincide in sign and significance level with the original results in approximately one third of the cases. We interpret this outcome as suggesting that for many empirical applications, the models used are unduly restrictive.”<br />
<br />
Here is a call to think twice about our DiD specifications. Data permitting, the more flexible proposed model above can serve as a benchmark at the start of any DiD analysis to test the robustness of alternative Parallel Assumptions and alternative dynamic specifications. At the very least this exercise may serve to guide more informed parsimonious models.<br />
<br />
p.s. – Ricardo and Iliana are currently writing an ado file that would implement many of these tests on parallel assumption equivalence or dynamics. We’ll post a link when it is ready for sharing.<br />
Thu, 21 Nov 2013 07:41:00 -0500
Jed Friedman