My job market paper brings some good news to the impact evaluation community. First, it shows that causal inference in randomized controlled trials (RCTs) relies on weaker assumptions than was previously thought. Second, it shows that RCTs capture local treatment effects that are less local than we previously believed.
In most RCTs in economics, compliance with the initial random assignment is imperfect. Some subjects need the treatment so badly that they will always get treated, irrespective of whether they are assigned to the treatment or to the control group. Those guys are called “always takers”. Some subjects do not need the treatment at all and will never get treated, even if they are assigned to the treatment group. Those guys are called “never takers”. Some subjects are nice guys who do what they are supposed to do: they will get treated if they are assigned to the treatment group, and they will not get treated if they are assigned to the control group. They are called “compliers”. Finally, some subjects could be “rebel teenagers” who do the opposite of what they are supposed to do: they will get treated if they are assigned to the control group, and they will not get treated if they are assigned to the treatment group. They are called “defiers”.
Imbens and Angrist (1994) have shown that it is still possible to conduct valid causal inference when compliance is imperfect, provided that there are no defiers in the population. In this case, the Wald ratio measures the average effect of the treatment for compliers, the so-called local average treatment effect (LATE). The “no-defiers” assumption is crucial to obtain this result; if it is violated, the Wald ratio might be negative while treatment effects are positive for everyone.
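To make the last point concrete, here is a minimal sketch in Python. The Wald ratio is the effect of assignment on the outcome divided by the effect of assignment on treatment take-up. The population below is entirely hypothetical (the shares, the outcome values, and the names `POPULATION`, `treated`, and `wald_ratio` are made up for illustration): every subject has a positive treatment effect, yet the Wald ratio comes out negative because defiers have a larger effect than compliers.

```python
from fractions import Fraction as F

# Hypothetical population, for illustration only: (type, share, Y(0), Y(1)).
# Every subject has a strictly positive treatment effect Y(1) - Y(0).
POPULATION = [
    ("complier",     F(3, 10), 0, 1),
    ("defier",       F(2, 10), 0, 3),
    ("always_taker", F(2, 10), 0, 1),
    ("never_taker",  F(3, 10), 0, 1),
]

def treated(kind, z):
    """Treatment status D(z) of a subject of the given type under assignment z."""
    if kind == "always_taker":
        return 1
    if kind == "never_taker":
        return 0
    if kind == "complier":
        return z
    return 1 - z  # defier: does the opposite of the assignment

def wald_ratio(population):
    """(E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0]) under random assignment."""
    mean_y, mean_d = {}, {}
    for z in (0, 1):
        mean_d[z] = sum(share * treated(kind, z)
                        for kind, share, _, _ in population)
        mean_y[z] = sum(share * (y1 if treated(kind, z) else y0)
                        for kind, share, y0, y1 in population)
    return (mean_y[1] - mean_y[0]) / (mean_d[1] - mean_d[0])

print(wald_ratio(POPULATION))  # -3
```

Dropping the defier row from this hypothetical population brings the Wald ratio back to 1, the compliers' average effect.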
A first concern with this result is that the “no-defiers” condition is sometimes too strong. In RCTs relying on an encouragement design, the treatment group typically receives some information via, say, a flyer. This flyer conveys information about the treatment, which could decrease the benefits that some subjects expect from it and induce them to defy, i.e. not take up the treatment. There may also be defiers in some well-known natural-experiment papers. Angrist and Krueger (1991) use date of birth as an instrument for years of schooling; children born late in the year enter school at a younger age and should therefore complete more years of schooling. Nonetheless, Barua and Lang (2010) argue that some parents delay their child's school entrance if they think (s)he is not mature enough to enter school at the time (s)he is supposed to, a practice called redshirting. As children born late in the year are more likely to be redshirted, some of them will complete fewer years of schooling than if they had been born in January. Children who are marginally redshirted, that is, redshirted only if they were born in December, are defiers. Never and always redshirted children are compliers.
A second concern is that the LATE captures the effect of the treatment only for compliers, while we would like to know the effect of the treatment for everyone, or at least for a larger subgroup of the population of interest. For instance, Angrist and Evans (1998) study the effect of childbearing on mothers' labor supply. In their paper, compliers account for only 6% of the total population.
The first contribution of my paper is to show that the results in Imbens and Angrist (1994) hold under a weaker condition than “no-defiers”. This weaker condition requires that there be more compliers than defiers (MCTD) in each subgroup of the population with the same treatment effect. To prove this result, I start by showing that if MCTD holds, then compliers can be partitioned into two subgroups, “comfiers” and “comvivors”. The first subgroup, comfiers, has the same size and the same treatment effects as defiers. The second subgroup, comvivors, is made up of the remaining compliers. The figure below presents how comfiers and comvivors are constructed. Assume that the potential outcomes under control and treatment, i.e. Y(0) and Y(1) respectively, are binary. The effect of the treatment can then take only three values: -1, 0, and 1. In this example, MCTD holds, as the first table shows that in each subgroup with the same treatment effect there are more compliers than defiers. For example, in the subgroup with Y(1)-Y(0)=-1, there are twice as many compliers as defiers. To construct comfiers, I pick one half of the compliers in this subgroup, which I call comfiers, and I call the remaining half comvivors. In the subgroup with Y(1)-Y(0)=1, there are three times as many compliers as defiers. Here I pick one third of the compliers, which I call comfiers, and I call the remaining two thirds comvivors. The two populations of comfiers and comvivors that I obtain after doing this in every Y(1)-Y(0) subgroup are presented in the second table. Comfiers and defiers indeed have the same size and the same distribution of treatment effects.
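The construction can be sketched in a few lines of Python. The head counts below are hypothetical and are not the ones from the figure. Only the ratios in the -1 and +1 subgroups (twice and three times as many compliers as defiers) follow the text. The function name `split_compliers` is also made up for this sketch, which treats a tie between compliers and defiers as acceptable; that is an assumption.

```python
# Hypothetical head counts by treatment effect Y(1) - Y(0), for illustration only.
# The 2:1 ratio at -1 and the 3:1 ratio at +1 mirror the example in the text.
COMPLIERS = {-1: 2, 0: 4, 1: 6}
DEFIERS = {-1: 1, 0: 1, 1: 2}

def split_compliers(compliers, defiers):
    """Split compliers into comfiers and comvivors, subgroup by subgroup.

    In each treatment-effect subgroup, as many compliers as there are
    defiers become comfiers; the remaining compliers become comvivors.
    """
    comfiers, comvivors = {}, {}
    for effect, n_compliers in compliers.items():
        n_defiers = defiers.get(effect, 0)
        if n_defiers > n_compliers:
            # More defiers than compliers here: the condition fails.
            raise ValueError(f"MCTD fails in subgroup with effect {effect}")
        comfiers[effect] = n_defiers
        comvivors[effect] = n_compliers - n_defiers
    return comfiers, comvivors

comfiers, comvivors = split_compliers(COMPLIERS, DEFIERS)
print(comfiers)   # {-1: 1, 0: 1, 1: 2}
print(comvivors)  # {-1: 1, 0: 3, 1: 4}
```

By construction, comfiers match defiers one for one within each subgroup, so the two groups have the same size and the same distribution of treatment effects.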
Next, recall that Angrist, Imbens, and Rubin (1996) have shown that when there are defiers, the Wald ratio is a weighted difference of the average effects of the treatment for compliers and for defiers. This is because the instrument moves compliers from non-treatment to treatment, while it moves defiers from treatment to non-treatment. When I plug my comfiers / comvivors decomposition into their formula, treatment effects among comfiers and defiers cancel one another out, and the Wald ratio captures the average treatment effect for comvivors. Substituting MCTD for “no-defiers” does not diminish the external validity of the Wald ratio. Indeed, comvivors account for the same percentage of the total population as the standard population of compliers.
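The cancellation can be checked numerically. The sketch below writes the weighted difference as (P(C) E[effect | C] - P(D) E[effect | D]) / (P(C) - P(D)) and compares it with the average effect among comvivors. The head counts are hypothetical, and the helper names `total_effect` and `size` are introduced only for this illustration.

```python
from fractions import Fraction as F

# Hypothetical head counts by treatment effect Y(1) - Y(0), for illustration only.
COMPLIERS = {-1: 2, 0: 4, 1: 6}
DEFIERS = {-1: 1, 0: 1, 1: 2}
# Comvivors: the compliers left once one complier per defier has been set
# aside (as a comfier) in each treatment-effect subgroup.
COMVIVORS = {effect: COMPLIERS[effect] - DEFIERS[effect] for effect in COMPLIERS}

def total_effect(group):
    """Sum of the treatment effects over all members of a group."""
    return sum(effect * count for effect, count in group.items())

def size(group):
    """Number of subjects in a group."""
    return sum(group.values())

# Weighted difference of complier and defier effects. Head counts stand in
# for population shares because the common population size cancels out.
wald = F(total_effect(COMPLIERS) - total_effect(DEFIERS),
         size(COMPLIERS) - size(DEFIERS))
comvivor_effect = F(total_effect(COMVIVORS), size(COMVIVORS))

print(wald, comvivor_effect)  # 3/8 3/8
```

In this example the denominator, the number of compliers minus the number of defiers, is also the number of comvivors.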
“More compliers than defiers” is a plausible assumption in the two examples highlighted above. In the encouragement design, it holds provided that the flyer is appealing enough that more subjects come away with a better opinion of the treatment than with a worse one. In Angrist and Krueger (1991), it seems plausible that there are more compliers than defiers in each subgroup of the population with the same potential outcomes. Defiers are marginally redshirted children, that is, low ability children. Compliers include both never redshirted children, i.e. normal and high ability children, and always redshirted children, i.e. very low ability children. Conditional on low wages, there are probably more always redshirted than marginally redshirted children. Conditional on average and high wages, there are probably more never redshirted than marginally redshirted children.
To simplify the exposition of the second contribution of my paper, let me assume now that we are back in a world without defiers, in which the Wald ratio measures the effect of the treatment for compliers. The second contribution of my paper is to show that the Wald ratio actually captures treatment effects for a larger group than compliers. Indeed, some always takers and some never takers have the same treatment effects as compliers. Therefore, IV captures treatment effects for a subgroup G, which contains all compliers, some always takers, and some never takers.
G is obtained by “multiplying” compliers as much as possible, as presented in the figure below. In this example, I assume that the two potential outcomes are binary. I also assume that the population is made up of 8 compliers, 15 never takers, and 9 always takers. Therefore, the share of compliers, P(C), is equal to 25%. Compliers account for 50% of the subgroup of subjects such that (Y(0)=0,Y(1)=0), as this subgroup contains 3 compliers, 2 never takers, and 1 always taker. Compliers account for a lower fraction of the three remaining (Y(0),Y(1)) subgroups. To construct G, I am going to multiply compliers by 2 in each (Y(0),Y(1)) subgroup. This means that in each (Y(0),Y(1)) subgroup, I will pick all compliers, and I will also pick as many always takers or never takers as there are compliers. This is feasible because compliers never represent more than 50% of a (Y(0),Y(1)) subgroup. For instance, I am going to add three never or always takers to the three compliers in the (Y(0)=0,Y(1)=0) subgroup. Since I am multiplying compliers by a constant factor of 2 across every (Y(0),Y(1)) subgroup, G is twice as large as the population of compliers. Moreover, the relative size of each (Y(0),Y(1)) subgroup is the same in G as among compliers, which ensures that treatment effects are the same for G as for compliers. Finally, notice that the entire (Y(0)=0,Y(1)=0) subgroup is in G. This means that I could not multiply compliers by more than 2, as this would require picking more (Y(0)=0,Y(1)=0) guys than there are. Therefore, G is the largest population with the same treatment effects as compliers.
P(G) is equal to the share of compliers in the population divided by the largest share of compliers in a (Y(0),Y(1)) subgroup. Therefore, P(G) is not identified from the data, as the data do not reveal the joint distribution of the potential outcomes. However, P(G)=P(C) if and only if one (Y(0),Y(1)) subgroup contains only compliers. In many cases, this seems implausible. In Angrist and Evans (1998), compliers account for only 6% of the total population. Potential outcomes are binary, and define only four subgroups. It seems unlikely that one of these four subgroups contains only compliers.
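The construction of G and the formula for P(G) can be sketched in Python with the 32-subject example. Only the (Y(0)=0,Y(1)=0) row and the totals (8 compliers, 15 never takers, 9 always takers) come from the example; the other three rows are hypothetical counts chosen to match those totals, since the figure is not reproduced here. The names `COUNTS` and `complier_share` are made up for this sketch.

```python
from fractions import Fraction as F

# Head counts per (Y(0), Y(1)) subgroup: (compliers, never takers, always takers).
# Only the (0, 0) row and the column totals (8, 15, 9) come from the text;
# the three other rows are hypothetical.
COUNTS = {
    (0, 0): (3, 2, 1),
    (0, 1): (2, 5, 3),
    (1, 0): (1, 4, 2),
    (1, 1): (2, 4, 3),
}

def complier_share(row):
    """Share of compliers within one (Y(0), Y(1)) subgroup."""
    compliers, never_takers, always_takers = row
    return F(compliers, compliers + never_takers + always_takers)

total = sum(sum(row) for row in COUNTS.values())
p_c = F(sum(row[0] for row in COUNTS.values()), total)
max_share = max(complier_share(row) for row in COUNTS.values())

# P(G) = P(C) / (largest share of compliers in a (Y(0), Y(1)) subgroup).
p_g = p_c / max_share

# Compliers are multiplied by the same factor in every subgroup, so G keeps
# the compliers' mix of (Y(0), Y(1)) subgroups.
factor = 1 / max_share
g_counts = {cell: row[0] * factor for cell, row in COUNTS.items()}

print(p_c, max_share, p_g)  # 1/4 1/2 1/2
```

With these counts, G contains 6, 4, 2, and 4 subjects in the four subgroups, that is, 16 of the 32 subjects, and it exhausts the (Y(0)=0,Y(1)=0) subgroup.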
Since P(G) is not point identified, I turn to a partial identification framework. I derive a lower bound for the size of G under a “strong instrument” assumption. This assumption requires that there be a vector of covariates X such that the highest percentage of compliers in a (Y(0),X) subgroup is larger than the highest percentage of compliers in a (Y(0),Y(1)) subgroup. If compliers and non-compliers with the same Y(0) and the same X also have the same distribution of treatment effects, then the “strong instrument” assumption is satisfied. It is weaker than the assumptions previously used to extrapolate IV. For example, the conditional effect ignorability assumption in Angrist and Fernandez-Val (2010) requires that compliers and non-compliers with the same X have the same distribution of treatment effects. Essentially, I am replacing the unobserved potential outcome Y(1) by a rich set of covariates X, and doing this relies on a weaker assumption than those previously used in the literature.
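The logic behind the bound can be sketched as an inequality. This is only an illustration of how the “strong instrument” assumption combines with the expression for P(G), not the formal statement from the paper. The notation is introduced here, and how the right-hand side is estimated from data is not covered.

```latex
P(G) \;=\; \frac{P(C)}{\max_{y_0,\,y_1} P\bigl(C \mid Y(0)=y_0,\, Y(1)=y_1\bigr)}
\;\geq\; \frac{P(C)}{\max_{y_0,\,x} P\bigl(C \mid Y(0)=y_0,\, X=x\bigr)}
```

The inequality holds because, under the assumption, the maximum in the second denominator is at least as large as the maximum in the first.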
I estimate this lower bound on the Angrist and Evans (1998) data. I use covariates that define 30 subgroups and induce substantial variation in the percentage of compliers across those subgroups. The “strong instrument” assumption means that the largest share of compliers in the four (Y(0),Y(1)) subgroups should be lower than the largest share of compliers in the 60 (Y(0),X) subgroups (30 covariate subgroups times the two values of Y(0)). Under this assumption, I find that the results apply to at least 20% of the population, instead of applying only to 6%.
Overall, this paper brings good news to the impact evaluation community. Causal inference in RCTs with imperfect compliance relies on weaker assumptions than was previously thought, and those RCTs capture less local treatment effects than we previously believed.
Clement de Chaisemartin is a Ph.D. student at the Paris School of Economics and at CREST.