With the increasing use of randomized and natural experiments in economics to identify causal program effects, it is easy for the layperson to get confused about the population for which a parameter is being estimated. Just this morning, giving a presentation to a non-technical crowd, I could not help but go over the distinction between the average treatment effect (ATE) and the local average treatment effect (LATE). The questions these two estimands address are related yet quite different, in a way that matters as much to policymakers as it does to academics.

In a nutshell, the local average treatment effect, which is obtained via instrumental variables (IV) estimation, is the effect of a program for people who were exogenously induced to take up the treatment. For example, in this paper with David McKenzie, where we encouraged potential readers to check out Development Impact in order to assess the effects of reading our blog, the LATE is the effect of reading our blog for those marginal readers who read it only because of the encouragement and would not have read it otherwise.
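To make "marginal readers" concrete, here is a minimal simulation sketch in Python. Everything in it is hypothetical and not taken from the paper: an encouragement `Z` is randomized, each person is a never-taker, a complier, or an always-taker in made-up shares, and the effect of reading differs by type. The Wald ratio (the reduced form divided by the first stage, which is what IV with a single binary instrument computes) lands on the compliers' effect rather than on the population average.

```python
import random
import statistics

def simulate(n=200_000, seed=1):
    """Simulate a hypothetical encouragement design with effects that differ by type."""
    rng = random.Random(seed)
    types = ["never", "complier", "always"]
    shares = [0.5, 0.3, 0.2]                                  # hypothetical type shares
    effect = {"never": 0.5, "complier": 1.0, "always": 4.0}  # hypothetical effects of reading
    rows = []
    for _ in range(n):
        t = rng.choices(types, weights=shares)[0]
        z = rng.random() < 0.5                          # randomized encouragement
        d = t == "always" or (t == "complier" and z)   # did the person read the blog?
        y = rng.gauss(0.0, 1.0) + (effect[t] if d else 0.0)
        rows.append((z, d, y))
    ate = sum(s * effect[t] for t, s in zip(types, shares))  # population average effect
    return rows, ate, effect["complier"]

def wald_late(rows):
    """Wald/IV estimator: (E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0])."""
    def mean_where(z_value, k):
        return statistics.fmean(r[k] for r in rows if r[0] == z_value)
    reduced_form = mean_where(True, 2) - mean_where(False, 2)
    first_stage = mean_where(True, 1) - mean_where(False, 1)
    return reduced_form / first_stage
```

With these made-up numbers, `wald_late(rows)` should come out near the compliers' effect of 1.0, while the population ATE returned by `simulate` is 1.35. The always-takers' large effect never enters the IV estimate because their reading does not respond to the encouragement.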

But, as we discuss in the paper, the effect of a program on those marginal readers can be quite different from its effect on the average reader if, for example, the marginal readers are less interested in the topics covered in the blog or read blogs less intensively. If you're interested in the effect on the average reader, however, you'll have to turn to non-experimental methods. In our paper, for example, we used a matching estimator.
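As a rough illustration of the non-experimental route, here is a minimal one-to-one nearest-neighbour matching sketch for the ATE on simulated data. It is not the estimator from our paper: the single covariate `x` (think of it as interest in the blog's topics), the selection rule, and the effect sizes are all hypothetical, and the sketch only works because it builds in selection on observables, i.e. that reading depends on nothing else that also affects outcomes.

```python
import bisect
import math
import random
import statistics

def simulate_observational(n=20_000, seed=2):
    """Hypothetical observational data: more interested people read more and gain more."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()                          # hypothetical covariate, e.g. interest
        p = 1 / (1 + math.exp(-(4 * x - 2)))      # take-up rises with x (selection on x only)
        d = rng.random() < p
        y0 = x + rng.gauss(0.0, 0.5)              # outcome without reading
        y = y0 + ((1 + 2 * x) if d else 0.0)      # effect of reading rises with x
        data.append((x, d, y))
    return data  # true ATE = E[1 + 2x] = 2

def nearest_outcome(sorted_x, ys, x):
    """Outcome of the unit whose covariate is closest to x (sorted_x must be sorted)."""
    i = bisect.bisect_left(sorted_x, x)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_x)]
    best = min(candidates, key=lambda j: abs(sorted_x[j] - x))
    return ys[best]

def matching_ate(data):
    """Impute each unit's missing potential outcome from its nearest match (with replacement)."""
    treated = sorted((x, y) for x, d, y in data if d)
    control = sorted((x, y) for x, d, y in data if not d)
    tx, ty = [x for x, _ in treated], [y for _, y in treated]
    cx, cy = [x for x, _ in control], [y for _, y in control]
    diffs = []
    for x, d, y in data:
        if d:
            diffs.append(y - nearest_outcome(cx, cy, x))   # observed Y1 minus matched Y0
        else:
            diffs.append(nearest_outcome(tx, ty, x) - y)   # matched Y1 minus observed Y0
    return statistics.fmean(diffs)
```

Because every unit, treated or not, gets a matched counterpart, the average of the differences targets the ATE rather than the effect on the treated. On this simulated data the naive difference in means overstates the true ATE of 2, since more interested people both read more and benefit more, while the matching estimate should land close to 2.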

A new working paper by Aronow and Carnegie makes an interesting observation and suggests a new method to recover the ATE (h/t a tweet that linked to this blog post by Marc F. Bellemare). The authors, who are PhD candidates in political science, observe that of the 34 empirical articles they reviewed that used IV estimation in the two top field journals, only two mentioned that the causal effect being estimated is the LATE. That seems unacceptable in any field, and the only reason I can think of (or hope for) is that the authors of those articles are clear about the estimand in question and are equally sure that their largely academic audience is as well. But that assumption is much more suspect when it comes to policymakers or other nonacademic audiences. It is the duty of the researcher to be as clear as possible up front about the question being answered, the population for which it is being answered, and all the relevant caveats to which the findings might be sensitive.

Aronow and Carnegie go on to propose an inverse compliance score weighting scheme to recover the ATE, which will be familiar to those who use inverse probability weights to derive population estimates from samples that are not simple random samples. They develop a maximum likelihood estimator that uses the inverse of each unit's probability of complying with the treatment, estimated over the entire sample, to reweight units and arrive at the average treatment effect. Their estimator works even in the presence of two-sided noncompliance: like the rest of the literature, they assume there are no defiers, but they can deal with always-takers as well as never-takers.
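The intuition behind inverse compliance score weighting can be sketched with a binary covariate. This is a simplified illustration under stated assumptions, not Aronow and Carnegie's maximum likelihood estimator: the compliance score P(complier | x) is estimated simply as the first stage within each stratum of `x` (valid when there are no defiers), and the simulation builds in the assumption this kind of reweighting needs, namely that within a stratum treatment effects do not depend on compliance type. The data-generating process and all numbers are hypothetical. Weighting each unit by 1/P(complier | x) lets the compliers stand in for everyone with the same `x`, so the weighted Wald ratio moves from the LATE toward the ATE.

```python
import random
import statistics

def simulate_iv(n=100_000, seed=3):
    """Hypothetical IV data with two-sided noncompliance; effects vary only with x."""
    rng = random.Random(seed)
    p_complier = {0: 0.2, 1: 0.8}   # hypothetical P(complier | x)
    p_always = {0: 0.1, 1: 0.1}     # hypothetical P(always-taker | x); the rest never take up
    effect = {0: 1.0, 1: 3.0}       # same effect for every type within x; true ATE = 2
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        u = rng.random()
        complier = u < p_complier[x]
        always_taker = p_complier[x] <= u < p_complier[x] + p_always[x]
        z = rng.random() < 0.5                  # randomized instrument
        d = always_taker or (complier and z)    # no defiers by construction
        y = 0.5 * x + rng.gauss(0.0, 1.0) + (effect[x] if d else 0.0)
        rows.append((x, z, d, y))
    return rows

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def compliance_scores(rows):
    """P(complier | x) estimated as E[D | Z=1, x] - E[D | Z=0, x] (assumes no defiers)."""
    scores = {}
    for stratum in {r[0] for r in rows}:
        d1 = [d for x, z, d, y in rows if x == stratum and z]
        d0 = [d for x, z, d, y in rows if x == stratum and not z]
        scores[stratum] = statistics.fmean(d1) - statistics.fmean(d0)
    return scores

def weighted_wald(rows, weights):
    """Weighted reduced form over weighted first stage; unit weights give the usual LATE."""
    arm1 = [(d, y, w) for (x, z, d, y), w in zip(rows, weights) if z]
    arm0 = [(d, y, w) for (x, z, d, y), w in zip(rows, weights) if not z]
    def wmean(arm, k):
        return weighted_mean([r[k] for r in arm], [r[2] for r in arm])
    reduced_form = wmean(arm1, 1) - wmean(arm0, 1)
    first_stage = wmean(arm1, 0) - wmean(arm0, 0)
    return reduced_form / first_stage
```

Usage: with `rows = simulate_iv()`, `weighted_wald(rows, [1.0] * len(rows))` gives the ordinary LATE, and `weighted_wald(rows, [1.0 / pc[x] for x, z, d, y in rows])` with `pc = compliance_scores(rows)` gives the reweighted estimate. Here compliers are concentrated among units with `x = 1`, who also have the larger effect, so the unweighted ratio should be near 2.6 while the reweighted one should be near the true ATE of 2. The weights grow large where compliance scores are small, a practical cost shared with inverse probability weighting.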

After reading the paper quickly, I am not sure how different this estimator is from a matching estimator. In each case, pretreatment characteristics are used to calculate a probability of being treated. Matching estimators then find matches for each treated unit, i.e., one or more counterfactual control units. Here, instead, each unit is reweighted by the inverse of its propensity (or compliance) score. The authors may need slightly weaker assumptions than those required by matching methods, but otherwise I could not clearly see the advantage of using their new estimator over existing ones to recover the ATE. Perhaps the authors (and/or others) would be kind enough to comment here.
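For comparison with matching, here is a minimal sketch of the weighting side using an ordinary propensity score rather than a compliance score. The propensity score is estimated as the treated share within equal-width bins of a single hypothetical covariate, clipped away from 0 and 1, and the normalized (Hajek) inverse-probability-weighted difference in means estimates the ATE. As with matching, it relies on selection on observables; the data, bin count, and clipping bounds are illustrative assumptions.

```python
import math
import random

def simulate_observational(n=20_000, seed=4):
    """Hypothetical observational data: take-up and effects both rise with x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        p = 1 / (1 + math.exp(-(4 * x - 2)))      # true propensity score
        d = rng.random() < p
        y = x + rng.gauss(0.0, 0.5) + ((1 + 2 * x) if d else 0.0)
        data.append((x, d, y))
    return data  # true ATE = E[1 + 2x] = 2

def binned_propensity(data, bins=50, clip=0.01):
    """e(x) = P(D=1 | x) estimated as the treated share in equal-width bins of x."""
    bin_of = [min(int(x * bins), bins - 1) for x, d, y in data]
    treated = [0] * bins
    total = [0] * bins
    for b, (x, d, y) in zip(bin_of, data):
        treated[b] += d
        total[b] += 1
    share = [t / cnt if cnt else 0.5 for t, cnt in zip(treated, total)]
    return [min(max(share[b], clip), 1 - clip) for b in bin_of]

def ipw_ate(data, e):
    """Normalized (Hajek) inverse-probability-weighted difference in means."""
    w1 = [d / p for (x, d, y), p in zip(data, e)]            # treated: 1 / e(x)
    w0 = [(1 - d) / (1 - p) for (x, d, y), p in zip(data, e)]  # controls: 1 / (1 - e(x))
    mu1 = sum(w * y for w, (x, d, y) in zip(w1, data)) / sum(w1)
    mu0 = sum(w * y for w, (x, d, y) in zip(w0, data)) / sum(w0)
    return mu1 - mu0
```

When no clipping occurs, this weighting estimator with binned propensity scores is numerically the same as stratifying on the bins and averaging the within-bin differences in means by bin size, which illustrates how closely weighting and matching-type estimators are related when both rest on the same pretreatment information.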
