Thank you for clarifying the estimand in your paper



With the increasing use of randomized and natural experiments in economics to identify causal program effects, it can be easy for the layperson to become confused about the population for which a parameter is being estimated. Just this morning, giving a presentation to a non-technical crowd, I could not help but go over the distinction between the average treatment effect (ATE) and the local average treatment effect (LATE). The questions these two estimands address are related yet quite different, in a way that matters not only to academics but equally to policymakers.

In a nutshell, the local average treatment effect, which is obtained via instrumental variables estimation, provides the effect of a program for people who were exogenously induced to comply with a treatment. For example, in this paper with David McKenzie, where we encouraged potential readers to check out Development Impact to assess the effects of reading our blog, this would be the effect of reading our blog for those marginal readers who read it only due to the encouragement and would not have read it otherwise.

But, as we discuss in the paper, the effect of a program on those marginal readers can be quite different from the effect on the average reader if, for example, the marginal readers are less interested in the topics covered in the blog or read blogs less intensively. If you’re interested in the effect on the average reader, however, you’ll have to fall back on non-experimental methods. For example, in our paper, we used a matching estimator.
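To make the distinction concrete, here is a minimal simulation sketch. It is not taken from our paper: the population shares, effect sizes, and function names are all hypothetical, chosen only so that compliers benefit less than everyone else. It computes the simple Wald (IV) ratio for a randomized encouragement and compares it with the true complier effect and the true ATE.

```python
import numpy as np

def simulate_encouragement(n=200_000, seed=0):
    """Simulate a randomized encouragement with three latent types.

    All numbers here are illustrative assumptions, not estimates from any study.
    """
    rng = np.random.default_rng(seed)
    # 0 = never-taker, 1 = complier, 2 = always-taker (no defiers)
    types = rng.choice(3, size=n, p=[0.5, 0.3, 0.2])
    z = rng.integers(0, 2, size=n)  # randomized encouragement
    # Always-takers take treatment, compliers follow z, never-takers abstain
    d = np.where(types == 2, 1, np.where(types == 1, z, 0))
    # Assumed effects: compliers gain 1.0, everyone else would gain 3.0
    effect = np.array([3.0, 1.0, 3.0])[types]
    y = rng.normal(0.0, 1.0, size=n) + d * effect
    return z, d, y, types, effect

def wald_estimate(z, d, y):
    """Ratio of the intention-to-treat effect on y to that on take-up d."""
    itt_y = y[z == 1].mean() - y[z == 0].mean()
    itt_d = d[z == 1].mean() - d[z == 0].mean()
    return itt_y / itt_d

z, d, y, types, effect = simulate_encouragement()
late_hat = wald_estimate(z, d, y)
true_late = effect[types == 1].mean()  # average effect among compliers
true_ate = effect.mean()               # average effect in the full population

print("Wald estimate:", round(late_hat, 2))
print("True complier effect:", round(true_late, 2))
print("True ATE:", round(true_ate, 2))
```

In this setup the Wald ratio tracks the complier effect rather than the population average, which is exactly the gap a policymaker needs to be told about.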

A new working paper by Aronow and Carnegie makes an interesting observation and suggests a new method to recover the ATE (HT to a tweet that linked to this blog post by Marc F. Bellemare). The observation the authors, who are PhD candidates in Political Science, make is that of the 34 empirical articles they reviewed that used IV estimation in the two top field journals, only two mentioned that the causal effect being estimated is the LATE. That seems unacceptable in any field and the only reason I can think of (or hope for) is that the authors themselves are clear about the estimand in question and are equally sure that their largely academic audience is as well. But that assumption is much more suspect when it comes to policymakers or other nonacademic audiences. It’s the duty of the researcher to be as clear as possible up front about the question that is being answered, the population for which it is being answered, and all the relevant caveats that might affect the sensitivity of the findings.

Aronow and Carnegie go on to propose an inverse compliance score weighting scheme to recover the ATE, which will be familiar to those who use inverse probability weights to derive population estimates from samples that are not simple random samples. They develop a maximum likelihood estimator that uses the inverse probabilities of complying with the treatment in the entire sample to reweight units and arrive at the average treatment effect. Their estimator works even in the presence of two-sided non-compliance (i.e. they assume, with the rest of the literature, no defiers, but can deal with always-takers and never-takers).
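The reweighting idea can be sketched in a few lines. This is a simplified illustration, not the authors' estimator: instead of their maximum likelihood step, it assumes a single binary covariate x and estimates the compliance score within each stratum as the difference in take-up between encouraged and non-encouraged units. It also assumes that x captures all systematic treatment effect heterogeneity. All names and numbers are hypothetical.

```python
import numpy as np

def weighted_wald(z, d, y, w):
    """Wald ratio in which every arm mean is weighted by w."""
    def weighted_diff(v):
        return (np.average(v[z == 1], weights=w[z == 1])
                - np.average(v[z == 0], weights=w[z == 0]))
    return weighted_diff(y) / weighted_diff(d)

def compliance_score_by_stratum(z, d, x):
    """Estimated share of compliers within each value of x.

    A stand-in for a model-based compliance score: take-up among the
    encouraged minus take-up among the non-encouraged, stratum by stratum.
    """
    score = np.empty(len(x), dtype=float)
    for value in np.unique(x):
        s = x == value
        score[s] = d[s & (z == 1)].mean() - d[s & (z == 0)].mean()
    return score

# Simulated data (illustrative assumptions only)
rng = np.random.default_rng(1)
n = 200_000
x = rng.integers(0, 2, size=n)           # pretreatment covariate
z = rng.integers(0, 2, size=n)           # randomized instrument
u = rng.random(n)
p_complier = np.where(x == 0, 0.6, 0.2)  # compliance varies with x
always = u < 0.1                         # 10% always-takers in each stratum
complier = (u >= 0.1) & (u < 0.1 + p_complier)
d = np.where(always, 1, np.where(complier, z, 0))  # two-sided non-compliance
effect = np.where(x == 0, 1.0, 3.0)      # effect depends only on x; ATE is about 2.0
y = 0.5 * x + rng.normal(size=n) + d * effect

score = compliance_score_by_stratum(z, d, x)
plain_wald = weighted_wald(z, d, y, np.ones(n))  # unweighted: targets the LATE
icsw = weighted_wald(z, d, y, 1.0 / score)       # inverse compliance score weights

print("Unweighted Wald:", round(plain_wald, 2))
print("Reweighted Wald:", round(icsw, 2))
```

Units from strata with few compliers receive large weights, so the reweighted complier population resembles the full sample. In this simulation that moves the estimate from the complier effect toward the population average, but only because the effect was constructed to vary with x alone.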

After reading the paper quickly, I am not sure how different this estimator is from a matching estimator. In each case, pretreatment indicators are being used to calculate a probability of being treated. Matching estimators then find matches for each treated unit, i.e. one or more counterfactual control units. Here, each unit gets reweighted by the inverse of its propensity (or compliance) score. Perhaps they have to make slightly weaker assumptions than those needed by matching methods, but otherwise I could not clearly see the advantage of using their new estimator compared to existing ones to recover the ATE. Perhaps the authors (and/or others) would be kind enough to comment here.



Berk Ozler

Lead Economist, Development Research Group, World Bank

Join the Conversation

Peter Aronow
September 13, 2012

Thank you for the helpful discussion of our paper. Allison and I are very glad that the paper is helping to draw attention to the distinction between the LATE and ATE.

We agree: like matching (or inverse-probability-of-treatment-weighting) estimators, the identifying assumptions here are, indeed, ultimately nonexperimental. But there is an important caveat here: the identifying assumptions for obtaining the ATE are much weaker with an instrument than without -- and the consequences of assumption failure are typically less dire.

In short, if a valid instrument exists, the researcher need only specify a set of covariates that predict all sources of systematic treatment effect heterogeneity to point identify the ATE. Without a valid instrument (e.g., in matching), the researcher needs covariates that predict _both_ selection bias and treatment effect heterogeneity. We believe that, in terms of producing plausible estimates of the ATE, the latter is a considerably higher bar to reach.

We suffer no illusions about our approach being a catch-all solution. Our paper proposes a modest approach: reweight the population so that compliers -- who are a latent group -- look more like the full population. Importantly, even if our reweighting assumptions fail, you'll still get a reweighted causal effect: one that's valid for a population that looks a whole lot more like the population that you care about. With matching estimators, no such guarantee exists: when the assumptions fail, you may not have estimated an average causal effect for _any_ population.

Using a standard matching estimator when a valid instrument exists is essentially throwing out a good deal of important information on selection bias -- see, e.g., Balke & Pearl 1997 or multiple papers by Manski for partial identification results. In fact, it is possible for matching estimators to asymptotically produce estimates of the ATE that are not logically possible, given the validity of the instrument. In contrast, our approach leverages the existence of an instrument to overcome the selection bias problem, and then places the focus squarely on treatment effect heterogeneity. Our approach is not the only way to implement the identification result, but we believe that it is a fairly sensible approach that clarifies the nature of the problem.

Finally, we'd like to contextualize the paper a bit: when we first posted this paper in 2010, there was little discussion of these points in political science, but the discussion has advanced since -- particularly as researchers increasingly recognize the power of a potential outcomes model for characterizing treatment effect heterogeneity. A growing body of work, either developed contemporaneously or since, has focused on the challenges of extrapolating causal effects to different target populations (e.g., Angrist & Fernandez-Val 2010, and recent work presented by Sekhon, Hartman & Grieve).

Again, we thank you for your thoughtful remarks and for advancing the discussion on the myriad problems faced in estimating average causal effects -- even with an instrumental variable. We welcome this discussion and hope that it provokes continued debate.


September 13, 2012

Thanks for flagging this in your blog. To add to the confusion, most studies identify the ATT, not the ATE; in practice the ATE is mostly a conceptual parameter. Most of the time it is not estimable unless you have a randomised experiment and full compliance. That is usually not the case: some subjects will refuse, and hence most studies actually estimate not the ATE but the ATT, that is, the effect on the subset who actually participate rather than on all who are eligible. Of course, this assumes there is no unobserved heterogeneity in returns; otherwise what is being estimated is, as you say, the LATE, but studies simply assume that there is no unobserved heterogeneity in returns.