This post is co-written with Ricardo Mora and Iliana Reggio.
The difference-in-differences (DID) evaluation method should be very familiar to our readers: it infers program impact by comparing the pre- to post-intervention change in the outcome of interest for the treated group with the same change in a comparison group. The key assumption is known as the “Parallel Paths” assumption, which posits that the average change in the comparison group represents the counterfactual change in the treatment group had there been no treatment. The method is popular in part because its data requirements are not particularly onerous (it needs data from only two points in time) and because the results are robust to any confounder that does not violate the Parallel Paths assumption. When data on several pre-treatment periods exist, researchers like to check the Parallel Paths assumption by testing for differences in the pre-treatment trends of the treatment and comparison groups. Equality of pre-treatment trends may lend confidence, but it cannot directly test the identifying assumption, which concerns an unobserved counterfactual and is therefore untestable by construction. Researchers also tend to explicitly model the “natural dynamics” of the outcome variable by including flexible time dummies for the control group and a parametric time trend differential between the control and the treated groups in the estimating specification.
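The basic two-period, two-group calculation can be sketched in a few lines. This is only an illustration: the function name `did_2x2` and the toy numbers are our own assumptions, not taken from any of the papers discussed here.

```python
import numpy as np

def did_2x2(y, treated, post):
    """Difference-in-differences computed from the four cell means."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)
    # Pre-to-post change in the mean outcome of each group.
    change_treated = y[treated & post].mean() - y[treated & ~post].mean()
    change_comparison = y[~treated & post].mean() - y[~treated & ~post].mean()
    # Under Parallel Paths, the comparison-group change stands in for the
    # change the treated group would have seen without treatment.
    return change_treated - change_comparison

# Hypothetical toy data: four comparison and four treated observations.
y       = [10, 12, 13, 15, 20, 22, 28, 30]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
post    = [0, 0, 1, 1, 0, 0, 1, 1]

effect = did_2x2(y, treated, post)
print(effect)
```

In this toy example the comparison group's mean rises by 3 and the treated group's by 8, so the DID estimate is 5.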
Typically, the applied researcher’s practice of DID ends at this point. Yet a very recent working paper by Ricardo Mora and Iliana Reggio (two co-authors of this post) points out that DID-as-commonly-practiced implicitly involves other assumptions instead of Parallel Paths, assumptions perhaps unknown to the researcher, which may influence the estimate of the treatment effect. These assumptions concern the dynamics of the outcome of interest, both before and after the introduction of treatment, and the implications of the particular dynamic specification for the Parallel Paths assumption.
As stated, researchers often supplement the DID specification with a time trend of some parametric form, such as a (perhaps group-specific) linear trend. But by including this linear trend, the identifying assumption shifts from the standard Parallel Paths to what can be termed Parallel Growths, since it is now the deviation from a trend line that identifies impact (alternatively, we can think of Parallel Growths as a Parallel Paths assumption in first differences).
The switch from Parallel Paths to Parallel Growths highlights a line of reasoning that Ricardo and Iliana formally extend to a general family of Parallel Assumptions valid for higher-order differencing, such as a difference in double differences (what might be called a Parallel Accelerations assumption), and so on. Arguably, higher-order Parallel Assumptions are weaker identifying assumptions than Parallel Paths: we no longer need the trend in the comparison group to proxy for the counterfactual trend of the treatment group, only the growth (i.e. the change in trend) in the comparison group to proxy for the counterfactual growth. But there is a trade-off in empirical practice, since differencing the data tends to exacerbate any measurement error present in the outcome measures. So the extent to which we can benefit from higher-order Parallel Assumptions is determined by our data on a case-by-case basis.
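To make the family of assumptions concrete, here is a minimal sketch for the first post-treatment period only. It reads a "Parallel-q" assumption as saying that, absent treatment, the q-th difference of the treated-minus-comparison gap would have been zero in that period, so the estimated effect is simply the observed q-th difference of the gap. The function name `first_period_effect` and the group means are hypothetical.

```python
import numpy as np

def first_period_effect(treated_means, comparison_means, last_pre, q=1):
    """Effect in the first post-treatment period under a Parallel-q assumption.

    q=1 is Parallel Paths, q=2 Parallel Growths, q=3 "Parallel Accelerations".
    `last_pre` is the (0-based) index of the last pre-treatment period.
    """
    gap = np.asarray(treated_means, float) - np.asarray(comparison_means, float)
    # We need q pre-treatment periods and at least one post-treatment period.
    if q < 1 or last_pre + 1 < q or last_pre + 1 >= gap.size:
        raise ValueError("not enough periods for this q")
    # np.diff(gap, n=q)[i] is the q-th difference dated at period i + q,
    # so the entry for period last_pre + 1 sits at index last_pre + 1 - q.
    return float(np.diff(gap, n=q)[last_pre + 1 - q])

# Hypothetical group means over four periods; treatment starts in period 3.
comparison = [10, 11, 12, 13]
treated    = [11, 13, 15, 19]   # already growing faster before treatment

paths   = first_period_effect(treated, comparison, last_pre=2, q=1)
growths = first_period_effect(treated, comparison, last_pre=2, q=2)
print(paths, growths)
```

Because the treated group was already pulling away by one unit per period before treatment, Parallel Paths attributes that drift to the program (an effect of 3), while Parallel Growths nets it out (an effect of 2). The price is that the second estimate uses one more differenced, and therefore noisier, data point.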
Ricardo and Iliana then develop a general additive regression model with fully flexible dynamics – this has the advantage of being able to test for possible restrictions on the dynamics rather than simply positing a particular parametric form. The model also doesn’t impose equivalence between alternative parallel assumptions. In fact this model can test for such equivalence:
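One way to write a specification of this kind, consistent with the variable definitions in the next paragraph, is sketched here. The coefficient symbols $\delta$ and $\gamma$ and the error term $u$ are notation we assume for illustration; the working paper's own notation may differ.

$$
Y_{it} = \delta + \sum_{\tau=t_2}^{T} \delta_{\tau} I_{\tau,t} + \gamma^{D} D_i + \sum_{\tau=t_2}^{T} \gamma^{D}_{\tau}\, I_{\tau,t}\, D_i + u_{it}
$$

Here $I_{\tau,t}$ equals one when $t = \tau$ and zero otherwise, so $\gamma^{D}$ is the treated-minus-comparison gap in the base period $t_1$ and each $\gamma^{D}_{\tau}$ measures how the gap in period $\tau$ differs from that base-period gap.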
The framework above allows for fully flexible pre-treatment trend differentials between the treated and comparison groups and also allows for a comparison of any two consecutive parallel assumptions, such as Paths vs. Growths. Here Y is the outcome of interest and time runs from t1 until T, with the intervention beginning at some point between t2 and T. The binary indicator variable I designates time periods while D indicates treated units. In practice, researchers often estimate a more restrictive equation than this one, even when the data permit this more flexible model. Here is one paper that does use this specification to look at the effects of school desegregation in the U.S.
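A regression with period dummies, a treated-group dummy and their full set of interactions can be estimated by ordinary least squares. The sketch below does this with NumPy on a hypothetical four-unit panel and then reads off the effect under Parallel Paths and Parallel Growths for the first post-treatment period. The function name `fit_flexible_gaps`, the data, and the column layout are our own assumptions for illustration, not the authors' code.

```python
import numpy as np

def fit_flexible_gaps(y, d, t, n_periods):
    """OLS of y on period dummies, a treated dummy, and their interactions.

    Periods are coded 0..n_periods-1, with period 0 as the omitted base.
    Returns the estimated treated-minus-comparison gap in every period.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    t = np.asarray(t, dtype=int)
    # Dummies for periods 1..n_periods-1 (one column per period).
    period = (t[:, None] == np.arange(1, n_periods)[None, :]).astype(float)
    X = np.column_stack([np.ones_like(y), period, d, period * d[:, None]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    base_gap = beta[n_periods]           # coefficient on the treated dummy
    extra_gap = beta[n_periods + 1:]     # interaction coefficients
    return np.concatenate([[base_gap], base_gap + extra_gap])

# Hypothetical balanced panel: rows are units, columns are periods 0..3.
outcomes = np.array([
    [9, 10, 11, 12],     # comparison unit
    [11, 12, 13, 14],    # comparison unit
    [10, 12, 14, 18],    # treated unit
    [12, 14, 16, 20],    # treated unit
], dtype=float)
treated_unit = np.array([0, 0, 1, 1])
n_units, n_periods = outcomes.shape

y = outcomes.ravel()                              # unit-by-unit, period-by-period
d = np.repeat(treated_unit, n_periods)
t = np.tile(np.arange(n_periods), n_units)

gaps = fit_flexible_gaps(y, d, t, n_periods)
last_pre = 2                                      # treatment starts in period 3
paths_effect = gaps[last_pre + 1] - gaps[last_pre]
growths_effect = paths_effect - (gaps[last_pre] - gaps[last_pre - 1])
print(gaps, paths_effect, growths_effect)
```

Because the model is saturated in group and period, the fitted gaps are just differences in cell means; the value of running it as a regression is that restrictions on the dynamics become testable hypotheses on the interaction coefficients rather than something imposed in advance.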
Ricardo and Iliana then review all DID papers published in ten well-known economics journals over the past three years and focus on those that (a) adopt a DID framework with more than one pre-treatment time period and (b) have made the data publicly available. There are nine papers that meet these criteria. The topics of study in these papers range from the effect of Daylight Saving Time on US residential electricity use to the effects of WWI-related male mortality on marriage market outcomes in France. All nine papers adopt more restrictive estimating equations than the one above. In fact, most of the 13 specifications in the nine papers restrict pre-treatment dynamics to be equivalent between treatment and comparison groups. Most also impose a constant treatment effect in post-treatment periods, thus ignoring the possible dynamics of treatment.
Eleven of the 13 specifications report significant treatment effects in the original papers. In contrast, by applying the flexible model to the data, Ricardo and Iliana find:
- In the 11 cases that estimate significant impacts, once re-estimated with the fully flexible model and with an explicit Parallel Paths assumption, only 5 remain precisely estimated and many of the 11 have substantively different point estimates.
- With the Parallel Growths assumption this number falls to 3 of 11 cases.
- Tests for the constancy of post-treatment effects, run for 11 of the specifications, reject the absence of dynamic effects in 6 instances. It seems post-treatment dynamic effects often matter and ideally should be modeled in a more flexible manner.
- A test of the equivalence of the Parallel Paths and Parallel Growths assumptions rejects equivalence in 5 out of the 13 specifications. In these cases the arguably weaker assumption of Parallel Growths results in significantly different findings than Parallel Paths.
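One simple way to see what an equivalence test of this kind is checking: if the Parallel Growths estimate is the Parallel Paths estimate minus the last pre-treatment change in the treated-minus-comparison gap, then the two coincide exactly when that change is zero. The sketch below computes a t-statistic for that restriction in a fully interacted regression on simulated data. Everything here is an assumption made for illustration: the function name, the simulated numbers, and above all the classical (homoskedastic, independent) standard errors, which a real panel application would replace with something like clustered standard errors. It is not the authors' test.

```python
import numpy as np

def pretrend_change_tstat(y, d, t, n_periods, last_pre):
    """t-statistic for H0: gap[last_pre] - gap[last_pre - 1] = 0.

    Periods are coded 0..n_periods-1 with period 0 as base; last_pre >= 1.
    Uses classical OLS standard errors (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    t = np.asarray(t, dtype=int)
    period = (t[:, None] == np.arange(1, n_periods)[None, :]).astype(float)
    X = np.column_stack([np.ones_like(y), period, d, period * d[:, None]])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    # The interaction for period p >= 1 sits in column n_periods + p;
    # the base period's interaction is normalized to zero.
    c = np.zeros(X.shape[1])
    c[n_periods + last_pre] = 1.0
    if last_pre - 1 >= 1:
        c[n_periods + last_pre - 1] = -1.0
    estimate = c @ beta
    se = np.sqrt(sigma2 * (c @ xtx_inv @ c))
    return estimate, se, estimate / se

# Simulated data: the treated group drifts up by 0.5 per pre-treatment period.
rng = np.random.default_rng(0)
n_per_group, n_periods, last_pre = 200, 4, 2
common_trend = np.array([10.0, 11.0, 12.0, 13.0])
true_gap = np.array([1.0, 1.5, 2.0, 4.0])
d = np.repeat(np.repeat([0, 1], n_per_group), n_periods)
t = np.tile(np.arange(n_periods), 2 * n_per_group)
y = common_trend[t] + d * true_gap[t] + rng.normal(0.0, 0.5, size=t.size)

estimate, se, t_stat = pretrend_change_tstat(y, d, t, n_periods, last_pre)
print(estimate, se, t_stat)
```

In this simulation the true pre-treatment drift is 0.5, so a large t-statistic is the expected outcome, and the Parallel Paths and Parallel Growths estimates of the first-period effect differ by roughly that amount.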
Now it’s true that standard errors are higher in general with the fully-flexible model (especially with the Parallel Growths assumption tested with first-differenced data) and in many cases equality between the treatment effect reported in the published paper and the estimate under the flexible model cannot be rejected. As Ricardo and Iliana conclude, “with the fully flexible model we obtain results that coincide in sign and significance level with the original results in approximately one third of the cases. We interpret this outcome as suggesting that for many empirical applications, the models used are unduly restrictive.”
Here is a call to think twice about our DID specifications. Data permitting, the more flexible model proposed above can serve as a benchmark at the start of any DID analysis to test the robustness of alternative Parallel Assumptions and alternative dynamic specifications. At the very least, this exercise may serve to guide more informed parsimonious models.
p.s. – Ricardo and Iliana are currently writing an ado file that would implement many of these tests on parallel assumption equivalence or dynamics. We’ll post a link when it is ready for sharing.