Over the summer I’ve been slowly working my way through the new book Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction by Guido Imbens and Don Rubin. It is an introduction in the sense that it is 600 pages and still doesn’t have room for difference-in-differences, regression discontinuity, synthetic controls, power calculations, dealing with attrition, dealing with multiple time periods, treatment spillovers, or many other topics in causal inference (they promise a volume 2). But it is not an introduction in the sense that it is graduate level, and I imagine it would be very confusing if you had no previous exposure to causal inference. So I thought I’d share some thoughts on this book for our readers.
What it is/does
Andrew Gelman has some discussion on his blog, along with his reviewer comments on five sample chapters from the book. One of his descriptions is very apt: “The book is conceptual, more of a ‘how does it work’ than a ‘how to’.” It carefully walks the reader through the analysis of randomized experiments and matching using the potential outcomes framework, in all its gory detail. Two features I very much like are the two types of examples it uses to illustrate much of the math and mechanics. First, it frequently works through calculations using only 4 to 6 observations, so you can see clearly what is observed, what is assumed, and how the counterfactual is established in each case. Second, once it gets to the methods, each chapter starts with a discussion of a particular dataset from the social or medical sciences (e.g. a teacher incentive experiment, cholesterol drug data, job-training data, Children’s Television Workshop experiment data). It then goes through the mechanics of a particular method and finishes the chapter by applying that method to the dataset, so you can see how it works in practice.
What did I learn/what did I like?
Imbens and Rubin are of course well-known developers of much of the theoretical literature widely used in causal analysis, and clear masters of the subject matter. While this is not at all light reading, it is undoubtedly good for you, and I am sure most of our readers will understand the subject matter better after reading it and will find things that can help them in impact evaluation. Here are a few things that stood out for me:
1) Multiple methods of inference for randomized experiments: the authors cover four approaches to analyzing data from a randomized experiment, namely Fisher exact p-values, Neyman’s repeated sampling approach, the standard regression approach, and a model-based Bayesian approach. The Bayesian approach (chapter 8) was the most novel to me; I’ve thought it might be useful in a number of impact evaluations but have never seen it done in practice. In Section 8.10 they illustrate it with an example (the familiar Lalonde job-training data) using a diffuse prior, showing that this yields a posterior mean for the treatment effect similar to what the other approaches produce. For me, the potential attraction of such an approach lies in using a non-diffuse prior. For example, imagine being asked to evaluate a program for which there is already a variety of evidence from similar countries, but where the sample size in the country you want to evaluate is a little small. If you use the existing evidence to inform a non-diffuse prior distribution, you can ask how much the evidence from the new program causes you to update this prior, which may be informative even if there is too little power to get a precise treatment effect estimate using a standard approach.
2) Chapter 13 on how to choose which variables to include in a propensity score: the authors provide a stepwise procedure for selecting the covariates, and in particular the higher-order terms, to include: i) select some basic covariates on theoretical/substantive grounds; ii) add further linear terms one at a time, using likelihood ratio tests to decide whether adding another covariate is worthwhile and, if so, choosing the one with the highest LR statistic; and iii) follow the same likelihood-ratio-based procedure to decide which quadratic and interaction terms to include.
3) Chapter 14 on how to assess balance between the treatment and control groups using normalized differences instead of the familiar t-stats: they suggest that normalized differences can serve as a guide to when simple covariate adjustment methods like regression are likely to be reliable and when they are not, and they have nice illustrations showing how this plays out in practice for different datasets.

4) Their preferred specification for regression-based estimation with covariate adjustment (p. 247):
$$Y_i = a + b\,\mathrm{Treat}_i + (X_i - \bar{X})\,c + \mathrm{Treat}_i\,(X_i - \bar{X})\,d + e_i$$
i.e. including the covariates as deviations from their sample average, and also including the interaction of these deviations with treatment, so that b gives the average effect of the treatment. This differs from the more usual approach of simply controlling linearly for X.
5) The limits on the precision gains from adding covariates in a randomized experiment (Section 7.8), and a note that in small samples adding too many covariates can actually reduce precision.
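To give a feel for the difference between Fisher’s and Neyman’s approaches mentioned above, here is a rough Python sketch (not code from the book). It computes an exact Fisher p-value for the sharp null that treatment has no effect on any unit, by enumerating every assignment a completely randomized design with the same number of treated units could have produced, and a Neyman-style difference in means with its conservative standard error. The six-unit dataset is made up, and the absolute difference in means is just one possible choice of test statistic.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def split_by_arm(y, treated):
    """Return (treated outcomes, control outcomes) for a set of treated indices."""
    t = [y[i] for i in range(len(y)) if i in treated]
    c = [y[i] for i in range(len(y)) if i not in treated]
    return t, c


def fisher_exact_p(y, w):
    """Exact p-value for Fisher's sharp null that Y_i(1) = Y_i(0) for every unit.

    Under the sharp null all potential outcomes are known, so the test
    statistic (here the absolute difference in means) can be recomputed for
    every assignment a completely randomized design with the same number of
    treated units could have produced.
    """
    n = len(y)
    observed_set = {i for i in range(n) if w[i] == 1}
    t, c = split_by_arm(y, observed_set)
    observed = abs(mean(t) - mean(c))
    count = total = 0
    for assignment in combinations(range(n), len(observed_set)):
        t, c = split_by_arm(y, set(assignment))
        total += 1
        # Small tolerance so ties with the observed statistic are counted.
        if abs(mean(t) - mean(c)) >= observed - 1e-12:
            count += 1
    return count / total


def neyman_estimate(y, w):
    """Difference in means with Neyman's conservative standard error."""
    t = [yi for yi, wi in zip(y, w) if wi == 1]
    c = [yi for yi, wi in zip(y, w) if wi == 0]
    tau_hat = mean(t) - mean(c)
    se = sqrt(variance(t) / len(t) + variance(c) / len(c))
    return tau_hat, se


# Six hypothetical units: outcomes y and treatment indicators w.
y = [3, 5, 1, 8, 6, 2]
w = [1, 1, 0, 1, 0, 0]
print(fisher_exact_p(y, w))  # 8 of the 20 possible assignments: 0.4
print(neyman_estimate(y, w))
```

Full enumeration is only feasible for tiny samples; with realistic sample sizes one would instead draw a large random sample of assignments and compute the share at least as extreme as the observed one.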
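To make the non-diffuse prior idea concrete, here is a deliberately simplified sketch. It is not the model-based imputation approach of chapter 8; it simply assumes that both the prior for the average treatment effect and the new study’s estimate are normal, so the posterior is a precision-weighted average of the two. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Combine a normal prior for the treatment effect with a normal
    approximation to the new study's likelihood (estimate, standard error).

    Returns the posterior mean and standard deviation; weights are precisions.
    """
    prior_prec = 1 / prior_sd ** 2
    data_prec = 1 / se ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
    return post_mean, post_prec ** -0.5


# Hypothetical numbers: earlier studies suggest an effect around 0.10 (sd 0.05);
# the new, small study estimates 0.30 with standard error 0.15.
post_mean, post_sd = normal_posterior(0.10, 0.05, 0.30, 0.15)
print(round(post_mean, 3), round(post_sd, 4))  # 0.12 0.0474
print("shift from prior mean:", round(post_mean - 0.10, 3))
print("P(effect > 0):", round(1 - NormalDist(post_mean, post_sd).cdf(0), 3))

# A very diffuse prior essentially reproduces the study's own estimate.
print(round(normal_posterior(0.0, 1e6, 0.30, 0.15)[0], 3))  # 0.3
```

The shift from the prior mean is one simple way to express "how much the new evidence changes the prior", even when the new study on its own is too imprecise to say much.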
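The three-step covariate selection procedure from chapter 13 can be sketched as greedy forward selection on likelihood ratio statistics. In this sketch, model fitting is hidden behind a hypothetical `loglik` function that should fit the propensity score model (e.g. a logit) with the given terms and return its maximized log-likelihood; the stub below just adds fixed amounts per term so the selection logic is easy to follow. The default cutoffs `c_lin=1.0` and `c_qua=2.71` are assumptions here, so check chapter 13 for the thresholds the authors recommend.

```python
from itertools import combinations_with_replacement
from typing import Callable, List, Sequence, Tuple

# A term is a tuple of covariate names: ("educ",) is linear,
# ("educ", "educ") a quadratic and ("age", "educ") an interaction.
Term = Tuple[str, ...]
LogLik = Callable[[Tuple[Term, ...]], float]


def forward_select(base: List[Term], candidates: Sequence[Term],
                   loglik: LogLik, threshold: float) -> List[Term]:
    """Greedily add the candidate with the largest likelihood ratio statistic
    until no remaining candidate's statistic reaches `threshold`."""
    selected = list(base)
    remaining = [c for c in candidates if c not in selected]
    current = loglik(tuple(selected))
    while remaining:
        # LR statistic for adding each remaining term to the current model.
        scored = [(2 * (loglik(tuple(selected + [c])) - current), c)
                  for c in remaining]
        best_lr, best = max(scored)
        if best_lr < threshold:
            break
        selected.append(best)
        remaining.remove(best)
        current = loglik(tuple(selected))
    return selected


def select_propensity_terms(basic: Sequence[str], others: Sequence[str],
                            loglik: LogLik, c_lin: float = 1.0,
                            c_qua: float = 2.71) -> List[Term]:
    # i) basic covariates chosen on substantive grounds are always included.
    model = [(v,) for v in basic]
    # ii) linear terms for the remaining covariates, added one at a time.
    model = forward_select(model, [(v,) for v in others], loglik, c_lin)
    # iii) quadratics and interactions of the covariates now in the model.
    names = [t[0] for t in model]
    second_order = list(combinations_with_replacement(names, 2))
    return forward_select(model, second_order, loglik, c_qua)


# Stub log-likelihood for illustration only: each term adds a fixed amount.
gains = {("age",): 5.0, ("educ",): 2.0, ("earn74",): 0.8, ("married",): 0.2,
         ("age", "age"): 1.0, ("age", "educ"): 2.0}


def stub(terms):
    return -100.0 + sum(gains.get(t, 0.0) for t in terms)


print(select_propensity_terms(["age"], ["educ", "earn74", "married"], stub))
# [('age',), ('educ',), ('earn74',), ('age', 'educ')]
```

With a real logit the LR statistic for a candidate depends on what is already in the model, which is why the sketch recomputes every candidate's statistic after each addition.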
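For the balance point from chapter 14, the sketch below computes a normalized difference, taken here as the difference in covariate means divided by the square root of the average of the two group variances, alongside the usual two-sample t-statistic. The covariate values are hypothetical. In this example the t-statistic of about 1.65 would not raise alarms at conventional levels even though the groups differ by nearly a full standard deviation, and replicating the same data ten times inflates the t-statistic while leaving the normalized difference roughly where it was.

```python
from math import sqrt
from statistics import mean, variance


def normalized_difference(x_t, x_c):
    """Difference in covariate means scaled by the square root of the average
    of the two within-group sample variances; does not grow with sample size."""
    return (mean(x_t) - mean(x_c)) / sqrt((variance(x_t) + variance(x_c)) / 2)


def t_stat(x_t, x_c):
    """Two-sample t-statistic for the same difference; grows with sqrt(n)."""
    return (mean(x_t) - mean(x_c)) / sqrt(variance(x_t) / len(x_t)
                                          + variance(x_c) / len(x_c))


# Hypothetical covariate values, e.g. years of schooling.
treat = [10, 11, 12, 13, 12, 11]
control = [9, 10, 11, 12, 10, 11]
print(round(normalized_difference(treat, control), 3))
print(round(t_stat(treat, control), 3))
# Same imbalance with ten times the data: the t-statistic rises by roughly
# sqrt(10); the normalized difference changes only through the n - 1 divisor.
print(round(normalized_difference(treat * 10, control * 10), 3))
print(round(t_stat(treat * 10, control * 10), 3))
```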
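The preferred specification on p. 247 can be checked with a small sketch for the single-covariate case. With a treatment dummy fully interacted with the centered covariate, OLS gives the same fit as separate regressions in each arm on the centered covariate, so b is the difference between the two arms’ predicted outcomes at the sample mean of X. The function names and data are hypothetical, and the data are constructed so that the treatment effect varies with x.

```python
from statistics import mean


def simple_ols(x, y):
    """Intercept and slope of an OLS regression of y on a single regressor x."""
    xb, yb = mean(x), mean(y)
    slope = (sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
             / sum((xi - xb) ** 2 for xi in x))
    return yb - slope * xb, slope


def interacted_effect(y, treat, x):
    """Coefficient b in Y = a + b*Treat + (X - Xbar)c + Treat*(X - Xbar)d + e.

    With one covariate, the fully interacted regression gives the same fit as
    separate regressions in each arm on the centered covariate, so b is the
    gap between the two arms' predicted outcomes at X = Xbar.
    """
    xbar = mean(x)
    xc = [xi - xbar for xi in x]
    arm_t = [(xi, yi) for xi, yi, ti in zip(xc, y, treat) if ti == 1]
    arm_c = [(xi, yi) for xi, yi, ti in zip(xc, y, treat) if ti == 0]
    a_treat, slope_t = simple_ols(*zip(*arm_t))  # slope_t corresponds to c + d
    a_ctrl, slope_c = simple_ols(*zip(*arm_c))   # slope_c corresponds to c
    return a_treat - a_ctrl


# Hypothetical data where Y(0) = 1 + 2x and Y(1) = 4 + 3x, so the effect
# 3 + x varies with x and its sample average is 3 + mean(x) = 5.5.
x = [0, 1, 2, 3, 4, 5]
treat = [1, 0, 1, 0, 1, 0]
y = [4 + 3 * xi if ti else 1 + 2 * xi for xi, ti in zip(x, treat)]
print(interacted_effect(y, treat, x))  # 5.5
```

Averaging the unit-level effects 3 + x over the sample also gives 5.5, which is the sense in which b recovers the average effect when the arm-specific regressions are correctly specified.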
What would I like to see?
Part of my wishlist is the same as in my review of the Gerber and Green book on field experiments. Gerber and Green’s book is more practical (and Glennerster and Takavarasha’s book, reviewed by Markus, even more so). The top item on my wishlist for both books is Stata code and datasets that let readers work through implementing these methods in practice. This would be particularly useful for the less common approaches used in Imbens and Rubin: code for how to do the randomization inference, how to do the Bayesian estimation, etc. would be very welcome. Then of course there are all the other topics mentioned above, and a bit more on how-to, so perhaps once volumes 2, 3, and 4 are written I will be satisfied?