The primary goal of an impact evaluation study is to estimate the causal effect of a program, policy, or intervention. Randomized assignment of treatment enables the researcher to draw causal inference in a relatively assumption-free manner. If randomization is not feasible, there are more assumption-driven methods, termed quasi-experimental, such as regression discontinuity or propensity score matching. For many of our readers this summary is nothing new. But fortunately, in our “community of practice” new statistical tools are developed at a rapid rate. Here is another exciting tool, a methodological extension of matching, termed:
The covariate balancing propensity score
Now a matching estimator is considered by many to be the least preferred quasi-experimental IE method because of the strong identifying assumptions it requires, especially in settings where participants choose whether to participate. I share this view. However, the machinery that facilitates matching – the propensity score – is elegant and can be useful in a variety of settings. The propensity score makes matching a practical exercise: it reduces the likely insurmountable problem of matching on many dimensions to a straightforward match on a single dimension (first demonstrated in the seminal 1983 Rosenbaum and Rubin paper). Many neat applications of the propensity score have been worked out, including the propensity-weighted regression estimator by Hirano, Imbens, and Ridder that yields an estimate of the average treatment effect.
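To make the dimension-reduction point concrete, here is a minimal sketch of 1-nearest-neighbor propensity score matching on synthetic data. All variable names, coefficient values, and the data-generating process are illustrative assumptions, not taken from any of the papers discussed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))                      # three observed covariates

# Selection into treatment depends on observables (illustrative coefficients).
logits = 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2]
T = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Outcome with a true treatment effect of 2.0 plus covariate effects.
Y = 2.0 * T + X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=n)

# Step 1: estimate the propensity score with a logit, fit by Newton's method.
Xd = np.column_stack([np.ones(n), X])            # add an intercept
beta = np.zeros(Xd.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-Xd @ beta))
    grad = Xd.T @ (T - p)                        # score of the log-likelihood
    hess = (Xd * (p * (1 - p))[:, None]).T @ Xd  # negative Hessian
    beta += np.linalg.solve(hess, grad)
pscore = 1 / (1 + np.exp(-Xd @ beta))

# Step 2: for each treated unit, match the control with the closest score.
# Matching on three covariates collapses to matching on this one number.
treated, controls = np.where(T == 1)[0], np.where(T == 0)[0]
matches = controls[np.argmin(np.abs(pscore[treated][:, None]
                                    - pscore[controls][None, :]), axis=1)]
att = np.mean(Y[treated] - Y[matches])
print(f"1-NN matching estimate of the ATT: {att:.2f} (true effect is 2.0)")
```

With the propensity model correctly specified, the matched estimate should land near the true effect of 2.0; in real applications that specification is exactly what is in doubt, as discussed next.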
So propensity score estimates are used widely. There is a catch, though – the propensity score must be estimated, and there is no theoretical guidance on how best to do this. Practitioners usually estimate a logit or probit to predict treatment assignment and then check the covariate balance given by the resulting propensity score. If the researcher isn’t satisfied with the balance, she will likely re-estimate with a somewhat different specification. The drawback of this approach, however, is that different specifications of the propensity score can result in very different estimates of the treatment effect (for one example, the 2005 paper by Jeffrey Smith and Petra Todd revisits a seminal labor training experiment first discussed by Robert Lalonde in 1986, and finds the matching estimate of impact highly sensitive to specification).
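The balance check in that workflow is usually a table of standardized mean differences (SMDs). A sketch of that diagnostic on synthetic data, assuming an illustrative data-generating process: fit a logit for treatment, then compare each covariate's SMD before and after inverse-propensity weighting.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
T = (rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] - 0.7 * X[:, 1])))).astype(int)

# Fit the logit by Newton's method (in practice: glm in R, statsmodels, etc.).
Xd = np.column_stack([np.ones(n), X])
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-Xd @ beta))
    beta += np.linalg.solve((Xd * (p * (1 - p))[:, None]).T @ Xd, Xd.T @ (T - p))
p = 1 / (1 + np.exp(-Xd @ beta))

def smd(x, t, w):
    """Weighted standardized mean difference for one covariate."""
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    return (m1 - m0) / x.std()

raw = np.ones(n)                               # unweighted comparison
ipw = np.where(T == 1, 1 / p, 1 / (1 - p))     # inverse-propensity weights
for j in range(X.shape[1]):
    print(f"covariate {j}: raw SMD {smd(X[:, j], T, raw):+.3f}, "
          f"weighted SMD {smd(X[:, j], T, ipw):+.3f}")
```

When the weighted SMDs are still large, the practitioner typically tries another specification (adding squares, interactions, and so on) – the very specification search the CBPS is designed to replace.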
This back-and-forth specification search underscores the dual purpose of the propensity score:
1. It is meant to predict treatment assignment among the study subjects, i.e. it estimates the likelihood of treatment as a function of observable information.
2. It is meant to balance covariates, so that two study subjects with the same propensity score are closely similar in observed dimensions.
So in our everyday practice, we look for a specification that by design maximizes (1) and by hope satisfies (2).
A new paper by Kosuke Imai and Marc Ratkovic introduces some useful structure to propensity score estimation by formally combining the dual purposes of the propensity score in one estimation framework, and appropriately enough calls this new approach the covariate balancing propensity score (CBPS). With CBPS, a single estimation determines both the treatment assignment mechanism and the covariate balancing weights. (A side note: this is not the only method to automate covariate balancing, but other methods, such as Hainmueller (2012), do not explicitly link the covariate balancing weights with the propensity score.)
The CBPS estimation details are described in the linked paper but, in brief, the authors stipulate the covariate balancing conditions as well as the first-order conditions from the propensity score likelihood function. This creates a system of equations that can be estimated jointly by the generalized method of moments (GMM), since the number of moment conditions exceeds the number of parameters to be estimated. (It’s important to note that the balancing condition here generalizes to higher moments of the covariate distribution, not only the first, so it can accommodate balance in the variances of the covariates as well as their means.)
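A sketch of the core idea on synthetic data: rather than maximizing the logit likelihood, choose the propensity score coefficients so that the inverse-propensity-weighted covariate means balance. The paper's full (over-identified) estimator stacks these balance conditions with the likelihood score and fits both by GMM; the simplified, just-identified version below solves the balance conditions alone by Newton's method. All names and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=(n, 2))
T = (rng.random(n) < 1 / (1 + np.exp(-(0.6 * X[:, 0] + 0.4 * X[:, 1])))).astype(int)
Xd = np.column_stack([np.ones(n), X])   # include the intercept in the balance set

def moments(beta):
    """Balance conditions E[(T/pi - (1-T)/(1-pi)) * x] = 0 and their Jacobian."""
    pi = 1 / (1 + np.exp(-Xd @ beta))
    w = T / pi - (1 - T) / (1 - pi)     # signed inverse-propensity weights
    g = Xd.T @ w / n                    # one moment per covariate (plus intercept)
    # Jacobian of g with respect to beta, from d(pi)/d(beta) = pi*(1-pi)*x
    c = T * (1 - pi) / pi + (1 - T) * pi / (1 - pi)
    J = -(Xd * c[:, None]).T @ Xd / n
    return g, J

beta = np.zeros(3)
for _ in range(50):                     # Newton iterations on the moment system
    g, J = moments(beta)
    beta -= np.linalg.solve(J, g)

g, _ = moments(beta)
print("balance conditions at the solution:", np.round(g, 8))
```

At the solution the weighted covariate means are balanced by construction, whether or not the logit is the best-fitting predictor of treatment – which is exactly the trade-off the next paragraph describes.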
In standard propensity score estimation, the empirical fit of the likelihood function is maximized so that it does the best possible job of predicting treatment status, but covariate balance is not explicitly addressed. In essence, the CBPS framework works by trading off some of this accuracy of prediction (the “likelihood”) to ensure a better balance of covariates.
So how does the CBPS perform in relation to standard matching vis-à-vis estimates of causal impact? Imai and Ratkovic work through two empirical examples where the CBPS does a substantially better job at minimizing bias and the root mean squared error (RMSE) – a summary of bias and variance – than the standard propensity score. For example, in revisiting the Lalonde labor experiment, the CBPS estimate comes much closer to the experimental estimate of impact (an $886 gain in annual earnings from training) than standard matching does. With a 1-to-N matching estimator, the standard propensity score understates the true (experimental) income gain by $805, while CBPS understates it by only $93.
CBPS can be extended to non-binary treatments, longitudinal data, and other common cases. And I am sure that further investigation of this method, and of the conditions where it is most applicable, awaits. There is one caveat: the method assumes that no propensity score falls at either extreme of zero or one. The authors do not discuss the implications if this assumption is violated (and it speaks to the need to apply this method carefully, perhaps on a trimmed data set) but, as practitioners know, propensity scores fall at either extreme at a disturbingly high frequency.
CBPS appears to be an interesting and promising new extension of familiar propensity score matching methods. If you review the paper and wish to implement CBPS, the authors have generously made an R package for CBPS available.