I received the following question recently from a PhD student who is one of our readers: “I have a question related to the DiD parallel trends assumption. I have seen some posts related to this but was wondering if there is any insight with regards to testing the parallel trends assumption when you only have one period before treatment.”
Since this is likely a problem others also face, I thought I would share some thoughts on how to answer it.
I’ll start by noting that there are at least a couple of issues with saying you are “testing the parallel trends assumption” in any DiD work. First, the assumption is one about what would have happened in the future, had treatment not occurred, which is not something we can ever test directly. Instead, we can test whether the treated and control units had parallel pre-trends, and then claim that if they did, it is more credible that they would also have followed parallel trends in subsequent periods had treatment not occurred. Second, as Jon Roth’s work has shown (blogged about here), tests of parallel pre-trends are often underpowered, so failure to reject parallel pre-trends can mask bias from non-parallel trends, and reporting DiD effects conditional on passing a parallel pre-trends test introduces a pre-testing problem. I discuss here approaches that provide some robustness to violations of parallel trends.
But what if we don’t have multiple rounds of pre-treatment outcome data?
This seems a fairly common occurrence for prospective impact evaluations. For example, some policy is about to be implemented and you aim to quickly field a baseline survey of units before some are treated and some are not. Unfortunately, the policy is not being randomized, but you are hoping to use difference-in-differences. What can you then do to make the parallel trends assumption more credible?
1. Use context-specific knowledge to argue why selection into the program occurred the way it did, and why parallel trends may be credible (discussed in this post).
2. If you are collecting baseline data, perhaps you can also ask for recall of multiple periods of the main outcome of interest, and then use this to check for pre-trends.
3. Use some form of matching combined with knowledge of how the treatment is being implemented to argue that you are comparing very similar individuals or units, and so the parallel trends assumption is likely to be more credible for them than if you were comparing individuals who were quite different from one another.
4. Examine post-treatment trends for a placebo treatment group/second control group: perhaps you can think of two equally plausible comparison groups for the treated units, and make an argument for why, ex ante, you could have used either as the control group. Seeing that these two groups continue to trend together after the treatment group is treated makes it a bit more plausible that life is going on as normal and nothing big and unusual happened to the control group. A somewhat related example is in this synthetic control study I blogged about previously – they consider a tourism policy that affects the region of Salta in Argentina. One set of controls is firms in other industries in Salta – the argument being that Salta tourism would have followed the same trend as (a weighted average of) the other industries in Salta. Another set of controls is tourism in other regions of the country, the argument being that Salta tourism would have followed the same trend as tourism elsewhere. If we then compare the two and get similar results, this suggests there were no big post-treatment shocks that differentially affected some regions or industries and would thereby invalidate parallel trends.
5. Look for a supplementary data set which may provide data on some proxies that you argue would trend in the same way as your outcome of interest, and then examine whether there were pre-trends in this proxy. For example, perhaps you have an intervention that aims to introduce high-speed broadband internet into some villages – and you only have baseline information on internet usage, but not prior period information. Maybe there are censuses, or other household surveys that have data on electricity coverage, mobile phone ownership and connections, and other infrastructure- and you can use this to show that treated and control villages have shown parallel trends for years in all other infrastructure and communication technologies being introduced.
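To make approach 2 concrete: a single recall question gives you one extra pre-treatment point, which is enough for a one-shot placebo check of pre-trends. The sketch below is a hypothetical illustration in plain Python (all households and numbers are made up, and the function names are mine); in practice you would run this as a regression of pre-period outcome growth on treatment status, with appropriate standard errors.

```python
# Hypothetical illustration of approach 2: with only one baseline survey,
# a recall question on the prior period's outcome gives two pre-treatment
# points, enough for a single placebo "pre-trends" check.

def group_mean(rows, key, treated):
    vals = [r[key] for r in rows if r["treated"] == treated]
    return sum(vals) / len(vals)

def pretrend_gap(rows):
    """Difference in pre-treatment growth (baseline minus recalled
    period) between treated and control households."""
    for r in rows:
        r["growth"] = r["y_baseline"] - r["y_recall"]
    return group_mean(rows, "growth", 1) - group_mean(rows, "growth", 0)

# Toy data: recalled outcome one period before baseline, plus baseline.
households = [
    {"treated": 1, "y_recall": 100, "y_baseline": 110},
    {"treated": 1, "y_recall": 90,  "y_baseline": 100},
    {"treated": 0, "y_recall": 80,  "y_baseline": 90},
    {"treated": 0, "y_recall": 120, "y_baseline": 130},
]

gap = pretrend_gap(households)
print(f"treated-vs-control gap in pre-period growth: {gap:.1f}")
```

A gap near zero is consistent with (though, per Roth's caveats above, does not prove) parallel pre-trends; the same caveat about low power applies with extra force when the recall data yield only one pre-period growth rate per household.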
As an example, John Gibson and I faced this issue when evaluating the introduction of a new seasonal worker program (published version, ungated version) that allows migrants from the Pacific to work in New Zealand. We launched an evaluation alongside the program launch, and had to quickly conduct baseline surveys in areas where employers were recruiting, in order to survey households before migrants left for New Zealand. We used approaches 1-3 above (and a little of 4), with no other data available to try approach 5:
1. We discuss the context of selection – noting that employers largely relied on village pre-screening and observable characteristics like English literacy to choose workers, and so could match on these types of characteristics. And we had a plausible reason why some households were treated and other households with similar characteristics were not – there was excess demand for this new employment opportunity and not all households who self-selected into wanting to participate were able to – and so we argue that those who looked similar were likely to provide a good counterfactual for the trends that treated households would have followed if not treated.
2. In our baseline survey, we did ask for recall of two periods of wage income, and we test for, and find, no difference in wage income growth between treated and control households in the year prior to the program.
3. We match households on a range of different baseline characteristics, and follow the suggestion of Crump et al. (2009) to restrict our sample to those with propensity scores in the range [0.1, 0.9], and then apply difference-in-differences to this propensity score pre-screened sample. This ensures we are only comparing households that were similar on many characteristics to begin with – for whom a parallel trends assumption seems more plausible.
4. We consider two different possible control groups, although one was a subset of the other: all other similar households in the same villages, and the subset of those who applied for the program. We note that the main reason for not applying appears to be lack of information about the new program, rather than lack of demand. We get similar results using both groups – we could also have examined the trends in consumption, poverty, etc. for non-applicants versus applicants in our follow-up survey rounds to see whether these two groups exhibited parallel trends.
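Step 3 above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual code: the propensity scores are assumed to have been estimated separately (e.g. by a logit of treatment on baseline characteristics), after which the sample is trimmed to scores in [0.1, 0.9] following Crump et al. (2009) and a simple difference-in-differences is computed on the trimmed sample.

```python
# Sketch of propensity-score pre-screening (Crump et al. 2009) followed
# by a simple DiD. All data below are made up for illustration.

def crump_trim(rows, lo=0.1, hi=0.9):
    """Keep only units whose propensity score lies in [lo, hi]."""
    return [r for r in rows if lo <= r["pscore"] <= hi]

def did(rows):
    """(treated post - pre change) minus (control post - pre change)."""
    def mean_change(treated):
        chg = [r["y_post"] - r["y_pre"] for r in rows if r["treated"] == treated]
        return sum(chg) / len(chg)
    return mean_change(1) - mean_change(0)

# Toy sample: the two units with extreme scores get trimmed away.
sample = [
    {"treated": 1, "pscore": 0.95, "y_pre": 50, "y_post": 90},  # trimmed
    {"treated": 1, "pscore": 0.60, "y_pre": 50, "y_post": 80},
    {"treated": 1, "pscore": 0.40, "y_pre": 40, "y_post": 70},
    {"treated": 0, "pscore": 0.55, "y_pre": 45, "y_post": 55},
    {"treated": 0, "pscore": 0.35, "y_pre": 50, "y_post": 60},
    {"treated": 0, "pscore": 0.05, "y_pre": 20, "y_post": 25},  # trimmed
]

trimmed = crump_trim(sample)
print(f"{len(trimmed)} of {len(sample)} units kept; DiD = {did(trimmed):.1f}")
```

The point of the trimming is that the DiD is then computed only over units with common support, for whom the parallel trends assumption is more plausible; in a real application the DiD would be run as a regression with controls and clustered standard errors.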
This led us to write in the paper “After pre-screening with the propensity-score, both the Tongan and ni-Vanuatu samples are balanced on initial incomes, consumption, and poverty, which are our key outcomes of interest. The income generating processes in these countries were fairly stable over the period examined; households mainly semi-subsistence farming as they had been doing for years. Hence, assuming parallel trends in the absence of the RSE seems reasonable. Ideally one would have several rounds of pre-intervention outcome data to check this but the difficulty of recalling consumption and agricultural income from previous years makes this infeasible in our case, as it likely is in any similar evaluation. Wage income is more readily recalled, so as a further check, the bottom of Tables 1 and 2 shows no difference in the growth in wage income between RSE and non-RSE households over 2006-2007, a full year before the program began.”
Readers, please feel free to offer any other suggestions of approaches you can use to make the parallel trends assumption more credible when you do not have a lot of pre-treatment data.