Yesterday we parted on a cliffhanger. We showed that the validity of two way fixed effects (TWFE) to implement difference-in-differences is not obvious when generalized to multiple cohorts and multiple time periods and left it there. Today we explain why and showcase recent advances that offer alternative difference-in-differences estimators.
When is TWFE not okay? Forbidden contrasts
Canonical difference-in-differences (discussed in the previous blog) compares a group that becomes treated with a group that is never treated. In some cases, however, the comparison group is already treated in both periods, while the other group becomes treated in the second period.
To understand how this comparison can introduce bias, we build on the three-cohort example from our previous blog, in which two cohorts become treated: Cohort 1 is never treated, Cohort 2 receives treatment in 2005, and Cohort 3 receives treatment in 2010. We plot an example of this setup below. In our example, there is no sharp change in outcomes after treatment; instead, treatment effects are dynamic and grow over time.
Recall that this setup offers three possible comparisons (Cohort 2 against 1, Cohort 3 against 1, and Cohort 3 against 2), and that two of these comparisons were canonical difference-in-differences with one treated cohort and one never treated cohort (Cohort 2 against 1 and Cohort 3 against 1).
In contrast, when comparing Cohorts 2 and 3, two difference-in-difference comparisons are possible. The first is canonical — it compares Cohort 2 to Cohort 3, from 2005-to-2010 to 2000-to-2005. In this comparison, Cohort 3 is control in both periods, while Cohort 2 becomes treated in the second period. The second, however, is non-standard — it compares Cohort 3 to Cohort 2, from 2010-to-2020 to 2005-to-2010. In this comparison, Cohort 2 is treated in both periods, and serves as the comparison group for Cohort 3, which becomes treated in the second period.
In the non-standard comparison, and as before, identification of causal effects would require a parallel trends assumption — absent Cohort 3 treatment, Cohort 2 and Cohort 3 would have had the same changes in outcomes. However, this assumption is much stronger when Cohort 2 is treated in both periods — it combines:
- A conventional parallel trends assumption: Absent treatment for BOTH Cohort 2 and Cohort 3, Cohort 2 and Cohort 3 would have had the same changes in outcomes
- And a non-standard assumption of constant Cohort 2 treatment effects: This rules out both dynamic treatment effects, where treatment effects change as time since treatment grows, and interactions between treatment effects and time, where treatment has different effects in different periods; see discussion in this recent blog post. For example, in the graph above, Cohort 2’s treatment effect grows over time — as a result, the non-standard comparison equals Cohort 3 ATT minus changes in Cohort 2’s treatment effect over time!
This comparison is aptly referred to as “forbidden” — the second assumption is stronger than the conventional parallel trends assumption and, in fact, rules out many dynamics that we should be interested in estimating!
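To make the arithmetic concrete, here is a minimal Python sketch with made-up numbers. The cohort levels, the common trend, and the treatment effects are all hypothetical, chosen so that parallel trends holds in untreated outcomes and Cohort 2's effect grows from 1 to 3.

```python
# Hypothetical numbers for illustration only.
# Periods: 0 = 2000-to-2005, 1 = 2005-to-2010, 2 = 2010-to-2020.
cohort_level = {2: 10.0, 3: 20.0}          # cohort-specific baseline
period_trend = [0.0, 2.0, 5.0]             # common trend (parallel trends holds)
effect = {2: [0.0, 1.0, 3.0],              # Cohort 2: treated from period 1, effect grows
          3: [0.0, 0.0, 1.0]}              # Cohort 3: treated from period 2


def outcome(cohort, period):
    """Observed mean outcome = baseline + common trend + treatment effect."""
    return cohort_level[cohort] + period_trend[period] + effect[cohort][period]


def did(treated, comparison, pre, post):
    """Two-group, two-period difference-in-differences."""
    treated_change = outcome(treated, post) - outcome(treated, pre)
    comparison_change = outcome(comparison, post) - outcome(comparison, pre)
    return treated_change - comparison_change


canonical = did(treated=2, comparison=3, pre=0, post=1)   # Cohort 3 not yet treated
forbidden = did(treated=3, comparison=2, pre=1, post=2)   # Cohort 2 already treated

print("canonical:", canonical)
print("forbidden:", forbidden)
```

With these numbers, the canonical comparison returns Cohort 2's effect of 1, while the forbidden comparison returns -1: Cohort 3's effect of 1 minus the growth of 2 in Cohort 2's effect.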
Recent papers (de Chaisemartin & D'Haultfœuille, 2020; Goodman-Bacon, 2021, discussed on this blog) have noted that two way fixed effects with multiple cohorts and multiple periods averages over both canonical DID estimators (which recover ATT under parallel trends) and also forbidden DID estimators (which only recover ATT with an additional assumption of constant comparison cohort treatment effects). To see this, think about two way fixed effects in our staggered adoption design above — in general, two way fixed effects averages over the difference-in-difference estimators for each pair of cohorts and each pair of time periods — this includes canonical examples (e.g., Cohort 2 and Cohort 1, 2005-to-2020 and 2000-to-2005) and forbidden comparisons (e.g., Cohort 3 and Cohort 2, 2010-to-2020 and 2005-to-2010).
Another way to think about this issue with the two way fixed effects setup is to notice that, when differencing, because of staggered adoption, Cohort 2's change in outcomes from 2005-to-2010 to 2010-to-2020 is subtracted from Cohort 3's change when Cohort 2 serves as the comparison group. Cohort 2 is treated in 2010-to-2020, so this subtraction means that these treated observations might end up having negative weights. This can put a negative weight on Cohort 2's ATT from 2010-to-2020 in \( \beta^{\text{TWFE}} \)! As a result, \( \beta^{\text{TWFE}} \) might even have a different sign than the ATTs from each pairwise cohort comparison!
Note that this bias in two way fixed effects occurs even when treatment is random. Random assignment guarantees that treatment effects a given number of periods after treatment and in a given time period will be equal across groups. However, dynamic treatment effects and interactions between treatment effects and time can still occur with random assignment.
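The weights can be computed directly. The sketch below builds a small noiseless panel (hypothetical cohort sizes of 1, 2, and 2 units, three periods, and made-up effects) and uses the Frisch-Waugh-Lovell theorem: the two way fixed effects coefficient is a weighted sum of outcomes, with weights proportional to treatment status residualized on unit and period dummies. This is an illustration under those assumptions, not the decomposition from any particular paper.

```python
import numpy as np

# Hypothetical design: periods 0, 1, 2; Cohort 1 is never treated.
cohort_sizes = {1: 1, 2: 2, 3: 2}                  # units per cohort (assumed)
first_treated_period = {2: 1, 3: 2}
effects = {(2, 1): 1.0, (2, 2): 3.0, (3, 2): 1.0}  # dynamic effect for Cohort 2

rows = []
unit_counter = 0
for c, size in cohort_sizes.items():
    for _ in range(size):
        for t in range(3):
            treated = c in first_treated_period and t >= first_treated_period[c]
            tau_ct = effects[(c, t)] if treated else 0.0
            y_ct = 10.0 * c + 2.0 * t + tau_ct     # parallel trends, no noise
            rows.append((unit_counter, t, c, float(treated), tau_ct, y_ct))
        unit_counter += 1

unit, period, cohort, d, tau, y = (np.array(col) for col in zip(*rows))

# Unit dummies plus period dummies (first period omitted to avoid collinearity).
fixed_effects = np.hstack([
    (unit[:, None] == np.arange(unit_counter)).astype(float),
    (period[:, None] == np.array([1, 2])).astype(float),
])

# Residualize treatment on the fixed effects; the residuals give the weights.
coef, *_ = np.linalg.lstsq(fixed_effects, d, rcond=None)
d_resid = d - fixed_effects @ coef
weights = d_resid / (d_resid @ d_resid)

beta_twfe = weights @ y
average_att = tau[d == 1].mean()
late_cohort2 = (cohort == 2) & (period == 2)

print("TWFE coefficient:", beta_twfe)
print("average ATT over treated observations:", average_att)
print("weights on Cohort 2 in the last period:", weights[late_cohort2])
```

In this hypothetical panel, Cohort 2's observations in the last period receive a negative weight, and the two way fixed effects coefficient (0.75) falls well below the average ATT across treated observations (about 1.67), even though every treatment effect is positive.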
One weird trick to avoid bias …
One way to purge our estimates of this additional bias is, errr-emmm, well to avoid forbidden contrasts! Recent papers suggest three solutions.
First, forbidden contrasts are avoided when there are no forbidden contrasts possible. This occurs when there is only one treated group and one never treated group, regardless of the number of time periods. In addition, when the second group is eventually treated, it is always possible to ensure a never treated group by dropping periods after the second group is treated. Note that this is also likely to “roughly” hold when all treated units are treated in almost the same period (as one may observe with high frequency data) even with multiple groups; new diagnostic approaches may indicate that two way fixed effects is mostly fine in this case! In some cases, randomization weakens the assumptions needed to learn something from periods after the second group is treated.
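As a rough illustration of trimming, the sketch below uses a hypothetical noiseless panel containing only Cohorts 2 and 3 (one unit each, three periods, made-up effects). The function `twfe_coefficient` is a helper written for this sketch, not part of any library. Dropping the last period leaves Cohort 3 untreated throughout, so the remaining comparison is canonical.

```python
import numpy as np


def twfe_coefficient(unit, period, d, y):
    """OLS coefficient on d with unit and period dummies (via Frisch-Waugh-Lovell)."""
    units, periods = np.unique(unit), np.unique(period)
    dummies = np.hstack([
        (unit[:, None] == units).astype(float),
        (period[:, None] == periods[1:]).astype(float),  # first period omitted
    ])
    coef, *_ = np.linalg.lstsq(dummies, d, rcond=None)
    d_resid = d - dummies @ coef
    return d_resid @ y / (d_resid @ d_resid)


# Hypothetical panel: Cohort 2 treated from period 1, Cohort 3 from period 2.
unit = np.array([2, 2, 2, 3, 3, 3])
period = np.array([0, 1, 2, 0, 1, 2])
d = np.array([0, 1, 1, 0, 0, 1], dtype=float)
tau = np.array([0, 1, 3, 0, 0, 1], dtype=float)   # Cohort 2's effect grows
y = 10.0 * unit + 2.0 * period + tau               # parallel trends, no noise

full = twfe_coefficient(unit, period, d, y)

# Drop the period in which Cohort 3 becomes treated.
keep = period < 2
trimmed = twfe_coefficient(unit[keep], period[keep], d[keep], y[keep])

print("full sample:", full)
print("trimmed sample:", trimmed)
```

Here the full-sample coefficient is zero even though every treatment effect is positive, while the trimmed sample recovers Cohort 2's effect of 1 in 2005-to-2010.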
Second, forbidden contrasts are avoided when allowing sufficiently flexible heterogeneity in ATT across groups and over time. For example, if one allows ATT to differ for each group and time period, forbidden contrasts will never be drawn. Recent work (de Chaisemartin & D'Haultfœuille, 2020; Sun & Abraham, 2021; Callaway & Sant’Anna, 2021; Wooldridge, 2021) does exactly this, and proposes different approaches to summarizing effects with weighted averages of ATT across group-time periods (e.g., ATT a fixed number of periods after treatment, ATT in a given period, average within group difference in ATT one period and two periods after treatment).
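A stripped-down sketch of the group-time idea follows. It uses hypothetical cell means and the never treated cohort as the only comparison group, and it averages across groups with equal weights. The estimators in the papers above differ in their choice of comparison groups, weights, and inference, so treat this as an illustration of the logic rather than any one of them.

```python
# Hypothetical cell means; periods 0, 1, 2; Cohort 1 is never treated.
level = {1: 10.0, 2: 20.0, 3: 30.0}
trend = [0.0, 2.0, 4.0]                              # common trend
first_treated = {2: 1, 3: 2}
true_effect = {(2, 1): 1.0, (2, 2): 3.0, (3, 2): 1.0}


def mean_outcome(cohort, period):
    return level[cohort] + trend[period] + true_effect.get((cohort, period), 0.0)


def att(group, period, never_treated=1):
    """ATT for one group in one period, relative to its last untreated period."""
    base = first_treated[group] - 1
    treated_change = mean_outcome(group, period) - mean_outcome(group, base)
    control_change = mean_outcome(never_treated, period) - mean_outcome(never_treated, base)
    return treated_change - control_change


# One ATT per (group, period) cell in which the group is treated.
group_time_att = {
    (g, t): att(g, t)
    for g, start in first_treated.items()
    for t in range(start, 3)
}

# Summarize by periods since treatment, with equal weights across groups (an assumption).
by_event_time = {}
for (g, t), value in group_time_att.items():
    by_event_time.setdefault(t - first_treated[g], []).append(value)
event_time_att = {e: sum(v) / len(v) for e, v in by_event_time.items()}

print(group_time_att)
print(event_time_att)
```

Because every comparison uses the never treated cohort, each group-time ATT matches the effect built into the hypothetical data, and the growth in Cohort 2's effect shows up in the summary by periods since treatment instead of contaminating another cohort's estimate.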
Third, forbidden contrasts are avoided by ensuring comparison groups are never treated during comparison periods. One approach to this is an imputation approach, which predicts counterfactual outcomes for treated observations using only control observations in a two way fixed effects model (Borusyak et al., 2021; Gardner, 2021). Alternatively, one can use a “stacked event study” (Cengiz et al., 2019). One way to do this is to construct, for each treated group, an “experiment” that includes only the treated group and untreated observations, and then run two way fixed effects with experiment-by-individual and experiment-by-time fixed effects. Although this duplicates many observations, inference is easily corrected by clustering standard errors at the individual (rather than experiment-by-individual) level.
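The imputation idea can be sketched in a few lines. Assuming a hypothetical noiseless panel (one unit per cohort, made-up effects), the code below fits unit and period effects on untreated observations only, predicts untreated outcomes for treated observations, and takes the difference. It is a simplified illustration and omits the weighting and inference developed in the papers cited above.

```python
import numpy as np

# Hypothetical panel: units 1, 2, 3 stand in for Cohorts 1, 2, 3; periods 0, 1, 2.
unit = np.repeat([1, 2, 3], 3)
period = np.tile([0, 1, 2], 3)
d = np.array([0, 0, 0, 0, 1, 1, 0, 0, 1], dtype=float)    # treatment status
tau = np.array([0, 0, 0, 0, 1, 3, 0, 0, 1], dtype=float)  # made-up effects
y = 10.0 * unit + 2.0 * period + tau                       # parallel trends, no noise

# Unit dummies plus period dummies (first period omitted).
design = np.hstack([
    (unit[:, None] == np.array([1, 2, 3])).astype(float),
    (period[:, None] == np.array([1, 2])).astype(float),
])

# Step 1: estimate unit and period effects using untreated observations only.
untreated = d == 0
coef, *_ = np.linalg.lstsq(design[untreated], y[untreated], rcond=None)

# Step 2: impute untreated outcomes for treated observations.
y0_hat = design[~untreated] @ coef

# Step 3: observed minus imputed outcome gives an effect per treated observation.
imputed_effects = y[~untreated] - y0_hat

print(imputed_effects)
```

Since treated observations never enter the first step, an already treated cohort cannot serve as a comparison group, and in this noiseless example the imputed effects match the ones built into the data.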