The often (unspoken) assumptions behind the difference-in-difference estimator in practice

Jed Friedman
This post is co-written with Ricardo Mora and Iliana Reggio
The difference-in-difference (DID) evaluation method should be very familiar to our readers – a method that infers program impact by comparing the pre- to post-intervention change in the outcome of interest for the treated group relative to a comparison group. The key assumption here is what is known as the “Parallel Paths” assumption, which posits that the average change in the comparison group represents the counterfactual change in the treatment group if there were no treatment. It is a popular method in part because the data requirements are not particularly onerous – it requires data from only two points in time – and the results are robust to any possible confounder as long as it doesn’t violate the Parallel Paths assumption. When data on several pre-treatment periods exist, researchers like to check the Parallel Paths assumption by testing for differences in the pre-treatment trends of the treatment and comparison groups. Equality of pre-treatment trends may lend confidence but this can’t directly test the identifying assumption; by construction that is untestable. Researchers also tend to explicitly model the “natural dynamics” of the outcome variable by including flexible time dummies for the control group and a parametric time trend differential between the control and the treated in the estimating specification.
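To make the basic two-period calculation concrete, here is a minimal sketch in Python. The function name `did_estimate`, the record layout `(treated, post, outcome)`, and the numbers are all made up for illustration; nothing here comes from a real study.

```python
from statistics import mean


def did_estimate(records):
    """Two-group, two-period DID from (treated, post, outcome) records.

    treated and post are 0/1 flags. The estimate is the pre-to-post change
    in the treated group's mean minus the same change in the comparison group.
    """
    cells = {}
    for treated, post, outcome in records:
        cells.setdefault((treated, post), []).append(outcome)
    cell_mean = {key: mean(values) for key, values in cells.items()}
    change_treated = cell_mean[(1, 1)] - cell_mean[(1, 0)]
    change_control = cell_mean[(0, 1)] - cell_mean[(0, 0)]
    return change_treated - change_control


# Hypothetical data: treated mean goes 11 -> 18, comparison mean goes 9 -> 12.
data = [
    (1, 0, 10.0), (1, 0, 12.0), (1, 1, 17.0), (1, 1, 19.0),
    (0, 0, 8.0), (0, 0, 10.0), (0, 1, 11.0), (0, 1, 13.0),
]
effect = did_estimate(data)
print(effect)  # 4.0
```

The treated group changes by 7 and the comparison group by 3, so the estimate is 4. Under Parallel Paths, the comparison group's change of 3 stands in for what the treated group would have experienced without treatment.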
Typically, the applied researcher’s practice of DID ends at this point. Yet a very recent working paper by Ricardo Mora and Iliana Reggio (two co-authors of this post) points out that DID-as-commonly-practiced implicitly involves other assumptions instead of Parallel Paths, assumptions perhaps unknown to the researcher, which may influence the estimate of the treatment effect. These assumptions concern the dynamics of the outcome of interest, both before and after the introduction of treatment, and the implications of the particular dynamic specification for the Parallel Paths assumption.
As stated, researchers often supplement the DID specification with a time trend of some parametric form such as a (perhaps group-specific) linear trend. But by including this linear trend, the identifying assumption shifts from the standard Parallel Paths to what can be termed Parallel Growths, since now deviation from a trend line identifies impact (alternatively, we can think of Parallel Growths as a Parallel Paths assumption in first differences).
The switch from Parallel Paths to Parallel Growths highlights a line of reasoning that Ricardo and Iliana formally extend to a general family of Parallel Assumptions valid for higher order differencing such as a difference of double-differencing (what might be called a Parallel Accelerations assumption) and so on. Arguably higher order Parallel Assumptions present weaker identifying assumptions than Parallel Paths – we no longer need the trend in the comparison group to proxy for the counterfactual trend of the treatment group but rather the growth (i.e. the change in trend) in the comparison group to proxy for the counterfactual growth. But there is a trade-off in our empirical practice since differencing of data tends to exacerbate any measurement error present in the outcome measures. So the extent to which we can benefit from higher order Parallel Assumptions is determined by our data on a case-by-case basis.
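A small sketch may help show how the two assumptions can give different answers from the same data. It works directly with group means per period and only looks at the first post-treatment period. The function names and the numbers are hypothetical, chosen so that the treated group was already growing faster before treatment.

```python
def did_paths(treated, control, first_post):
    """Parallel Paths: DID in levels between the last pre-treatment
    period and the first post-treatment period."""
    k = first_post
    return (treated[k] - treated[k - 1]) - (control[k] - control[k - 1])


def did_growths(treated, control, first_post):
    """Parallel Growths: the same contrast applied to first differences,
    so each group's change is compared with its own previous change."""
    k = first_post
    if k < 2:
        raise ValueError("Parallel Growths needs at least two pre-treatment periods")

    def change_in_growth(series):
        return (series[k] - series[k - 1]) - (series[k - 1] - series[k - 2])

    return change_in_growth(treated) - change_in_growth(control)


# Hypothetical period means; treatment starts at index 3.
control_means = [10.0, 11.0, 12.0, 13.0]   # grows by 1 each period
treated_means = [20.0, 22.0, 24.0, 29.0]   # grew by 2 each period before treatment

paths = did_paths(treated_means, control_means, first_post=3)
growths = did_growths(treated_means, control_means, first_post=3)
print(paths, growths)  # 4.0 3.0
```

Here Parallel Paths attributes the treated group's pre-existing extra unit of growth to the treatment, while Parallel Growths nets it out. The measurement error trade-off is also visible in the arithmetic: if each period mean carried independent noise with variance σ², a single group's level change would have variance 2σ², while its change in growth (weights 1, -2, 1) would have variance 6σ².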
Ricardo and Iliana then develop a general additive regression model with fully flexible dynamics – this has the advantage of being able to test for possible restrictions on the dynamics rather than simply positing a particular parametric form. The model also doesn’t impose equivalence between alternative parallel assumptions. In fact this model can test for such equivalence:

The framework above allows for fully flexible pre-treatment trend differentials between the treated and comparison group and also allows for a comparison of any two consecutive parallel assumptions such as Paths vs. Growths. Here Y is the outcome of interest and time runs from t1 until T with the intervention beginning at some point between t2 and T. The binary indicator variable I designates time-periods while D indicates treated units. In practice, researchers often estimate a more restrictive equation than this one – even when the data permit this more flexible model. Here is one paper that does use this specification to look at the effects of school-desegregation in the U.S.
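For readers who prefer symbols, one way to write a fully flexible specification consistent with the description above is sketched here. The coefficient names, the unit subscript i, and the symbol t* for the last pre-treatment period are assumed notation for illustration and may differ from the working paper's.

```latex
% I_{\tau,t} = 1 if t = \tau and 0 otherwise; D_i = 1 for treated units.
\begin{align}
Y_{it} &= \delta + \sum_{\tau=t_2}^{T} \delta_\tau I_{\tau,t}
          + \gamma D_i + \sum_{\tau=t_2}^{T} \gamma_\tau I_{\tau,t} D_i + u_{it} \\
% Treated-minus-comparison gap in period t (with \gamma_{t_1} \equiv 0):
E[Y_{it} \mid D_i = 1] - E[Y_{it} \mid D_i = 0] &= \gamma + \gamma_t \\
% Effect in the first post-treatment period under Parallel Paths:
\alpha^{\text{Paths}} &= \gamma_{t^*+1} - \gamma_{t^*} \\
% Effect in the first post-treatment period under Parallel Growths:
\alpha^{\text{Growths}} &= (\gamma_{t^*+1} - \gamma_{t^*}) - (\gamma_{t^*} - \gamma_{t^*-1})
\end{align}
```

Written this way, the two assumptions give the same first-period effect exactly when the last pre-treatment trend differential, γ at t* minus γ at t*-1, is zero, which is a restriction the data can speak to.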
Ricardo and Iliana then review all DiD papers published in ten well-known economic journals over the past three years and focus on those that (a) adopt a DiD framework with more than one pre-treatment time period and (b) have made the data publicly available. There are nine papers that meet these criteria. The topics of study in these papers range from the effect of Daylight Saving Time on US residential electricity use to the effects of WWI-related male mortality on marriage market outcomes in France. All of the nine papers adopt more restrictive estimating equations than the one above. In fact most of the 13 specifications in the nine papers restrict pre-treatment dynamics to be equivalent between treatment and comparison groups. Most also impose a constant treatment effect in post-treatment periods, thus ignoring the possible dynamics of treatment.
Eleven of the 13 specifications report significant treatment effects in the original papers. In contrast, by applying the flexible model to the data, Ricardo and Iliana find:
  • In the 11 cases that estimate significant impacts, once re-estimated with the fully flexible model and with an explicit Parallel Paths assumption, only 5 remain precisely estimated and many of the 11 have substantively different point estimates.
  • With the Parallel Growths assumption this number falls to 3 of 11 cases.
  • Tests for the constancy of post-treatment effects for 11 of the specifications wind up rejecting the absence of dynamic effects in 6 of the instances. It seems post-treatment dynamic effects often matter and ideally should be modeled in a more flexible manner.
  • A test of the equivalence of the Parallel Paths and Parallel Growths assumptions rejects equivalence in 5 out of the 13 specifications. In these cases the arguably weaker assumption of Parallel Growths results in significantly different findings than Parallel Paths.
Now it’s true that standard errors are higher in general with the fully-flexible model (especially with the Parallel Growths assumption tested with first-differenced data) and in many cases equality between the treatment effect reported in the published paper and the estimate under the flexible model cannot be rejected. As Ricardo and Iliana conclude, “with the fully flexible model we obtain results that coincide in sign and significance level with the original results in approximately one third of the cases. We interpret this outcome as suggesting that for many empirical applications, the models used are unduly restrictive.”
Here is a call to think twice about our DiD specifications. Data permitting, the more flexible proposed model above can serve as a benchmark at the start of any DiD analysis to test the robustness of alternative Parallel Assumptions and alternative dynamic specifications. At the very least this exercise may serve to guide more informed parsimonious models.
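As a rough sketch of what such a benchmark could look like, the snippet below recovers the period-by-treatment coefficients of a flexible specification and reads off the quantities discussed in this post. It assumes no covariates beyond period dummies and their interactions with treatment, in which case the model is saturated and its OLS coefficients equal differences in group-by-period means. Function names, the record layout, and the data are hypothetical, and the sketch gives point estimates only, with no standard errors or formal tests.

```python
from collections import defaultdict
from statistics import mean


def interaction_coefficients(records, periods):
    """Treated-minus-comparison gap in each period, relative to the gap
    in the first period. records holds (treated, period, outcome)."""
    cells = defaultdict(list)
    for treated, period, outcome in records:
        cells[(treated, period)].append(outcome)
    gap = {t: mean(cells[(1, t)]) - mean(cells[(0, t)]) for t in periods}
    return {t: gap[t] - gap[periods[0]] for t in periods}


def paths_effects(gammas, periods, last_pre):
    """Effect in each post-treatment period under Parallel Paths."""
    start = periods.index(last_pre) + 1
    return {t: gammas[t] - gammas[last_pre] for t in periods[start:]}


def pre_trend_differentials(gammas, periods, last_pre):
    """Period-to-period change in the gap before treatment."""
    stop = periods.index(last_pre)
    return {periods[k]: gammas[periods[k]] - gammas[periods[k - 1]]
            for k in range(1, stop + 1)}


# Hypothetical data: five periods, treatment starts after period 3.
periods = [1, 2, 3, 4, 5]
last_pre = 3
control_means = [10.0, 11.0, 12.0, 13.0, 14.0]
treated_means = [20.0, 22.0, 24.0, 29.0, 33.0]
records = []
for t, c, tr in zip(periods, control_means, treated_means):
    for noise in (-1.0, 1.0):  # two observations per cell around each mean
        records.append((0, t, c + noise))
        records.append((1, t, tr + noise))

gammas = interaction_coefficients(records, periods)
effects = paths_effects(gammas, periods, last_pre)
pre_trends = pre_trend_differentials(gammas, periods, last_pre)
# First post-period effect under Parallel Growths: net out the last pre-trend gap.
growths_first = effects[periods[periods.index(last_pre) + 1]] - pre_trends[last_pre]

print(effects)        # {4: 4.0, 5: 7.0}
print(pre_trends)     # {2: 1.0, 3: 1.0}
print(growths_first)  # 3.0
```

In this made-up example the post-treatment effects under Parallel Paths are not constant (4, then 7), and the pre-treatment trend differentials are not zero, so Parallel Paths and Parallel Growths disagree. Both patterns would be hidden by a specification that imposes common pre-treatment dynamics and a single post-treatment effect.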
p.s. – Ricardo and Iliana are currently writing an ado file that would implement many of these tests on parallel assumption equivalence or dynamics. We’ll post a link when it is ready for sharing.


Submitted by Dexter on

Hi, are the Parallel Paths and Parallel Growths assumptions similar to the Campbell and Stanley (1969) non-equivalent group design and interrupted time series?

Submitted by Jed on

Hi Dexter, thanks for your question - I couldn't find the particular Campbell and Stanley paper you cite, but in general, at least to my understanding, interrupted time series per se does not involve counterfactual analysis; matched with a non-equivalent group design, though, we are in the standard diff-in-diff world with the Parallel Paths assumption. As far as I know, Ricardo and Iliana are the first to formalize the family of parallel assumptions as they do. Ricardo also wanted to respond with the following:

Thanks Dexter for your interesting question. Interrupted Time Series
assumes the existence of time series before treatment. The longer the
better. You do not even need the existence of a control group or
It is true that ITS analysis may end up being equivalent to assuming
Parallel Paths, Parallel Growths, or both of them at the same time, but
it doesn't need to be so.
I personally view Parallel Paths assumptions as reasonable strategies
of identification when we only have a sequence of cross-sections,
treated and control groups, and a small number of periods before and
after treatment (say, fewer than 20 periods).

Submitted by Jed on

Hi Sergio, please try the link again, it seems to be working now…. Thanks!

Submitted by A. Salomon on

Randomized Treatments? It seems this discussion only applies to observational data. In the case of randomized treatment, paths and growths would be part of the covariates that are randomized over, and these assumptions should be reasonable. Correct?

Submitted by Jon on

That should be right. If randomization was successful, assumption is that all counterfactual covariates are equal on average, including paths and growths.

Submitted by Adijat Olubukola Olateju on

My name is Adijat Olubukola Olateju, a PhD student from University Sains Malaysia. I am currently writing my thesis on the assessment of microfinance banks on microenterprises in Nigeria. Please, I would like to know:
1. Can the DID method be used for cross-sectional data, i.e. data collected once on each observation?
2. Can the questionnaire for the treatment and the control group be the same, or should it be different?
Thank you

Submitted by Jon on

It doesn't sound like DiD is an option for your data. DiD requires a pre-treatment and a post-treatment period for all individuals/units of analysis. So a cross-section would lack the before/after comparison needed. Also, the data on the treated and control groups would need to be the same to allow valid comparison between the two groups.

Submitted by yahaya bala manga on

Please, I am an MSc student in the Department of Economics, Usmanu Danfodiyo University, Sokoto, Nigeria. I want to know how one can ascertain the significance level of the result of the double difference estimators.

Submitted by Jodi on

I'm deciding if a DID approach will be best for a current project of mine evaluating a policy change. I am still unclear about the "best" number of pre-post time periods needed. The seminal Card & Krueger 1994 paper used 2 time periods and has been widely criticized, but I can't find a "best number" of time periods. Guidance or references please!? Thank you!

Submitted by Jon on

I think there's not so much a best as there is a minimum. You need 1 pre and 1 post at minimum, with greater number of periods (pre and post) always improving the model.

Submitted by Forhad on

I have more than three periods. For example, 1980, 1981, 1982, and 1983.
I have six groups: A, B, C, D, E, F. I want to check the effectiveness of a minimum drinking age. For example, group A has a minimum age of 16 both before and after the policy. Group B increases the age from 16 to 17. Group C increases the age from 16 to 18. Group D has changed the age at 17. Group E increases the age from 17 to 18. Group F has a minimum age of 18 both before and after the policy.

The data is panel data.
Is there anyone who can help me design the DID model in such a case?
How many treatment effects do I need to find?

Thanks in advance.

Submitted by Bram on

Hi Jed, This is very interesting. I have not had time to read the working paper yet, so excuse my comment if it is answered there.
I was wondering about the following. When applying a pure DiD to only two points in time, say a pre-test and post-test score, are you sure that often a parametric parallel trend is assumed? I would say that the only assumption needed, is additivity. In my view, things become tricky only after the effect is assumed to be multiplicative (A grew twice as much as B). However, when thinking of the true functional form, we can still express it correctly if we do not actually impose such assumption (A grew 3%, and B grew 6%). The question then is: what are we looking for?
