Published on Development Impact

Book review of Cunningham’s Causal Inference: The Mixtape

David McKenzie

February 03, 2021


The release of Scott Cunningham’s new book Causal Inference: The Mixtape was accompanied by the unusual sight of multiple economists proudly posting photos on Twitter of the book arriving at their houses, as if they had just scored tickets to a sold-out concert. The book has two fantastic features for readers interested in impact evaluation. First, Scott has provided a free online version of the book as a complement to the paper copy. Second, the book provides Stata and R code throughout, which both shortens the distance from theory to practice and can help you learn one language if you already know the other.

He has said that a key reason for writing the book is that “a readable introductory book with programming examples, data, and detailed exposition didn’t exist until this one” and that the book is “written first and foremost for practitioners…so they can use these tools to answer their own research questions”. The closest substitutes are probably Angrist and Pischke’s Mostly Harmless Econometrics and Gertler et al.’s (2016) Impact Evaluation in Practice. Mostly Harmless is already a decade old, does not include integrated code (although see the online archive), and does not cover some more recent topics such as synthetic controls. Impact Evaluation in Practice is also free online, has some basic Stata code, and says more about the design of impact evaluations, but it is pitched at a more basic level and covers fewer recent methodological developments. The Mixtape lies between the two in terms of difficulty.

Scott has noted on Twitter that his goal is for this to be everyone’s second textbook of choice, so it assumes some background in econometrics and then focuses on doing applied analysis, bringing in several recent developments. It is intended as an introductory book, so it does not necessarily try to take the reader to the frontier of current applied practice, but rather to help them understand at least the basics of how different identification methods work.

The book is strongly focused on causal analysis, with very little to say about the design and implementation of causal studies. Thus, issues like power calculations, trade-offs between methods, and data collection are not covered. It also largely assumes your data has been cleaned and outcomes defined for you, so there is no discussion of issues like dealing with attrition; whether to define outcomes in levels, in logs, or using the inverse hyperbolic sine (IHS); or how to aggregate outcomes into an index. It very much comes from a U.S. labor economics perspective. The focus is largely on non-experimental impact evaluation methods, with no separate chapter or detailed treatment of experimental methods.

Since he calls this a mixtape, here is my track-by-track guide to the chapters, in the spirit of an album review in which the critic assigns a letter grade to each track. Just as with a music review, your tastes may vary (and I’m sure there is a joke somewhere about an econometrics critic being biased or mean):

The Playlist

Track 1: Introduction (A-): A solid start to the album, offering some personal detail on how he got interested in causal inference, why he wrote the book, who the intended audience is, and how to use Stata and R to work through the book. There is a nice example of “correlation is not causation” that reverses the usual approach: rather than showing that two ridiculous things are correlated, it illustrates how the lack of correlation can hide a causal relationship because of endogenous responses.
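To make the idea concrete, here is a minimal Python simulation of our own (not the book’s code; the variable names and parameter values are hypothetical). An agent observes a shock and chooses an action that exactly offsets it. The action has a causal effect of 1 on the outcome, yet the two end up essentially uncorrelated; when the action is instead assigned at random, the correlation reappears.

```python
import random


def correlation(x, y):
    """Pearson correlation computed from scratch (standard library only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(respond_to_shock, n=10_000, seed=0):
    """Hypothetical model: outcome = shock + 1.0 * action + small noise.

    The causal effect of the action on the outcome is 1 by construction.
    If respond_to_shock is True, the agent sets action = -shock (an endogenous
    response that cancels the shock); otherwise the action is drawn at random.
    """
    rng = random.Random(seed)
    actions, outcomes = [], []
    for _ in range(n):
        shock = rng.gauss(0, 1)
        action = -shock if respond_to_shock else rng.gauss(0, 1)
        outcome = shock + 1.0 * action + rng.gauss(0, 0.1)
        actions.append(action)
        outcomes.append(outcome)
    return correlation(actions, outcomes)


endogenous = simulate(respond_to_shock=True)   # close to 0 despite a causal effect of 1
randomized = simulate(respond_to_shock=False)  # clearly positive
print(f"corr(action, outcome), endogenous response: {endogenous:.3f}")
print(f"corr(action, outcome), random assignment:   {randomized:.3f}")
```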

Track 2: Probability and Regression Review (C): This track is like one of those Dave Matthews/Phish jams that goes on and on (80 pages), and is probably a tough entry point for someone interested in causal inference. The probability review does contain a nice discussion of the famous Monty Hall problem, using Bayes’ rule to illustrate how we can update beliefs when new information comes along. But the rest is a non-matrix-algebra treatment of linear regression formulas, with some simulations thrown in. My preference would be to discuss linear regression in the context of trying to use it for causal analysis: start with an example of when this could yield causal impacts and the assumptions required, then illustrate (with simulations or real data) the different ways this could fail (e.g. omitted variables, reverse causation, endogenous controls, incorrect functional form), and add a more practical guide to inference. There is some mention of clustering standard errors, but not much guidance on how or when to cluster, and no discussion of issues like multiple testing or examining heterogeneity in impacts that might be well addressed early on.
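For readers who want to check the Monty Hall logic themselves, here is a short Python sketch (our illustration, not the book’s Stata/R code). It applies Bayes’ rule exactly using `fractions.Fraction`, assuming the contestant picks door 1, the host opens door 3, and the host chooses at random when both unpicked doors hide goats; it then confirms the answer by simulation.

```python
import random
from fractions import Fraction


def posterior_after_host_opens_door_3():
    """Bayes' rule: P(car behind door d | contestant picked 1, host opened 3)."""
    prior = {d: Fraction(1, 3) for d in (1, 2, 3)}
    # Likelihood that the host opens door 3, given where the car is (assumed rules):
    # car behind 1 -> host picks door 2 or 3 at random; car behind 2 -> host must
    # open 3; car behind 3 -> host never reveals the car.
    likelihood = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}
    evidence = sum(prior[d] * likelihood[d] for d in prior)
    return {d: prior[d] * likelihood[d] / evidence for d in prior}


def simulate_switch_win_rate(trials=100_000, seed=1):
    """Share of games won by a contestant who always switches doors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randint(1, 3)
        pick = rng.randint(1, 3)
        # Host opens a goat door that the contestant did not pick.
        host = rng.choice([d for d in (1, 2, 3) if d != pick and d != car])
        switched = next(d for d in (1, 2, 3) if d not in (pick, host))
        wins += switched == car
    return wins / trials


print(posterior_after_host_opens_door_3())  # door 1 -> 1/3, door 2 -> 2/3, door 3 -> 0
print(simulate_switch_win_rate())           # close to 2/3
```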

Track 3: Directed Acyclic Graphs (A-): This is his version of an indie track, covering something not found in most econometrics textbooks. It provides a short introduction for economists to directed acyclic graphs (DAGs), and how they can be both a useful communication device for making clear your assumptions about causal pathways and a way of thinking carefully through what is needed to identify a causal effect. In particular, DAGs are used to discuss two issues of importance for causal inference: whether you should control for a variable, and how the sample you use might implicitly condition on something you don’t want it to. Of course, standard econometrics courses talk about the perils of controlling for endogenous variables (“bad controls”) and of sample selection, but the DAG framework can make this visually clear through the idea of collider bias. Two types of such bias are discussed through useful examples: whether you should control for occupation when testing whether women are discriminated against in earnings, and what we can learn about racial bias in police use of force from a sample of data that comes from police stops. One complaint is that, just like The Book of Why, there is very little discussion of how to use DAGs for estimation or inference, and no discussion of treatment heterogeneity. This is one of the limits of DAGs as commonly used: there is no way to illustrate concepts like treatment effect heterogeneity or LATEs, or concerns like whether sequential ignorability holds.
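The occupation example can be reproduced in a few lines. The Python sketch below uses our own hypothetical data-generating process, not the book’s: gender affects both occupation and wages, and unobserved ability affects both as well, so occupation is a collider. With these parameter values the regression without controls recovers the total effect of -3, while “controlling” for occupation (computed via the Frisch-Waugh-Lovell partialling-out result) gives an estimate near +0.6, even though the true direct effect is -1.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, x):
    """Residuals from regressing y on x (with an intercept)."""
    b, mx, my = slope(x, y), mean(x), mean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=50_000, seed=2):
    """Hypothetical DGP (our assumption): female -> occupation <- ability."""
    rng = random.Random(seed)
    female, occupation, wage = [], [], []
    for _ in range(n):
        f = 1.0 if rng.random() < 0.5 else 0.0
        ability = rng.gauss(0, 1)  # unobserved by the econometrician
        occ = 1 + 2 * ability - 2 * f + rng.gauss(0, 1)  # discrimination in sorting
        w = 1 - 1 * f + 1 * occ + 2 * ability + rng.gauss(0, 1)  # direct effect -1
        female.append(f)
        occupation.append(occ)
        wage.append(w)
    return female, occupation, wage


female, occupation, wage = simulate()
total = slope(female, wage)  # total effect: -1 + (-2)(1) = -3
# Frisch-Waugh-Lovell: coefficient on female in a regression of wage on female and occupation
controlled = slope(residualize(female, occupation), residualize(wage, occupation))
print(f"no controls:         {total:.2f}")
print(f"control occupation:  {controlled:.2f}")
```

Because ability is omitted, conditioning on occupation opens the path from gender to wages through ability, and the sign of the estimated “discrimination” flips.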

Track 4: Potential Outcomes Causal Model (B): The type of track that won’t make the greatest hits album, but is still an OK listen. This is a solid introduction to the potential outcomes framework and to defining concepts like the ATE, ATT and ATU. Rebecca Thornton’s job market paper on HIV testing is used as an example of a randomized experiment, followed by a step-by-step introduction to randomization inference (RI). Having the code to show both simulations and an illustration of RI is helpful here (although I would point readers to the Stata command ritest as an even easier approach).
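The logic of randomization inference fits in a few lines. The following Python sketch is our own illustration (the book works in Stata and R). It tests the sharp null of no effect for any unit by recomputing the difference in means under every possible reassignment of treatment, using a hypothetical four-unit dataset small enough to enumerate.

```python
from itertools import combinations


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for i, y in enumerate(outcomes) if i in treated]
    c = [y for i, y in enumerate(outcomes) if i not in treated]
    return sum(t) / len(t) - sum(c) / len(c)


def ri_p_value(outcomes, treated):
    """Exact two-sided randomization-inference p-value under the sharp null.

    Under the sharp null of no effect for anyone, each unit's outcome is fixed
    and only the assignment changes, so we enumerate every assignment with the
    same number of treated units and count how often |statistic| is at least
    as large as the one actually observed.
    """
    observed = abs(diff_in_means(outcomes, set(treated)))
    assignments = list(combinations(range(len(outcomes)), len(treated)))
    extreme = sum(
        abs(diff_in_means(outcomes, set(a))) >= observed - 1e-12 for a in assignments
    )
    return extreme / len(assignments)


# Hypothetical data: units 0 and 1 were treated.
outcomes = [10.0, 12.0, 1.0, 2.0]
print(ri_p_value(outcomes, treated=(0, 1)))  # 2 of the 6 assignments are as extreme
```

With only six possible assignments, the smallest attainable two-sided p-value here is 1/3, because each assignment and its mirror image give the same absolute difference. With realistic sample sizes full enumeration is infeasible, so in practice one draws a large random sample of assignments instead.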

Track 5: Matching and subclassification (B-): The type of album filler track that reminds you of another artist but doesn’t quite live up to the songs around it. It provides a pedagogical introduction to the idea of matching, and walks through examples of exact matching, propensity score matching, inverse-probability weighting, and coarsened exact matching, using the classic LaLonde NSW job-training program data. But it is much less useful as a guide to applied practice. It does not discuss what circumstances make matching more or less plausible (beyond whether or not you have a DAG). For example, there is no discussion of the importance of being able to match individuals in the same labor markets, of using multiple rounds of pre-program data to capture Ashenfelter dips, of making sure treatment and control data are collected in the same way, or of what to do when you have a large number of potential variables to match on. For an applied reader, it would be great to have recent examples where economists have found matching more plausible, as well as help in better understanding the critiques many have of applied practice.

Track 6: Regression discontinuity (A-): A much stronger track following the previous two. It walks through several RDD applications, including close elections and Medicare eligibility at age 65, and discusses a range of methods, building up from the simplest cases to density tests, placebo tests, and fuzzy RD. In contrast to the previous chapters, it talks for the first time a bit about how you might go about finding an RD and what you would need to do to get the data for it. It eventually works its way towards some of the key programmed tools in Cattaneo and co-authors’ RD packages. There is no discussion of spatial RD designs and the different issues that may arise with them, nor, again, of power calculations and thinking through what sample sizes are needed for RD to work.
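To show the basic estimator the chapter builds up from, here is a bare-bones sharp RD sketch in Python on simulated data (a hypothetical running variable with the cutoff at 0 and a true jump of 2). It fits a separate line on each side of the cutoff within a bandwidth and takes the difference of the two intercepts. It uses a uniform kernel and a hand-picked bandwidth of 0.5 (both our assumptions), so it is not a substitute for the data-driven bandwidths and robust bias-corrected inference in Cattaneo and co-authors’ packages.

```python
import random


def fit_line(x, y):
    """OLS intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    return my - b * mx, b


def sharp_rd(running, outcome, cutoff=0.0, bandwidth=0.5):
    """Local linear (uniform kernel) estimate of the jump at the cutoff."""
    # Center the running variable so each intercept is the fitted value at the cutoff.
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left


# Hypothetical simulated data: linear in the running variable, jump of 2 at 0.
rng = random.Random(3)
running = [rng.uniform(-1, 1) for _ in range(20_000)]
outcome = [1 + 0.5 * r + 2.0 * (r >= 0) + rng.gauss(0, 0.5) for r in running]
estimate = sharp_rd(running, outcome)
print(f"estimated jump: {estimate:.2f}")  # the true jump is 2
```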

Track 7: Instrumental Variables (B+): A track that grows on you as it progresses. One useful point he makes is that the reduced form for a good instrument should seem weird: you should be asking yourself why on earth the instrument would be related to the outcome, and only once you learn what the endogenous variable is should the relationship make sense, since otherwise it should be hard to think of why the instrument would affect the outcome. In addition to fairly standard coverage of IV, there is also discussion of two of the more popular and perhaps more credible IV designs where there have been recent (and continuing) advances (and which we have discussed on the blog): judge leniency IV designs and Bartik/shift-share instruments. What’s missing for an applied reader is more discussion of, and a framework for, how to critique an IV. For example, he goes through the classic use of quarter of birth as an instrument for years of schooling when looking at the impacts of schooling on earnings; this would be an occasion to talk through concerns that different types of parents may be more likely to have children at different times of the year. There is no discussion of overidentification tests and how they should be interpreted, nor of sensitivity analysis approaches that allow for some potential violations of the exclusion restriction.
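The reduced-form point is easy to see in the simplest IV estimator, the Wald ratio for a binary instrument: the reduced-form effect of the instrument on the outcome divided by its first-stage effect on the endogenous variable. In the hypothetical Python simulation below (parameter values are our assumptions), unobserved ability raises both schooling and earnings, so OLS overstates the true return of 0.1. The Wald estimate recovers it because the randomly assigned instrument shifts schooling without otherwise affecting earnings.

```python
import random


def mean(v):
    return sum(v) / len(v)


def ols_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def wald(z, x, y):
    """IV estimate with a binary instrument z: reduced form / first stage."""
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    x1 = mean([xi for zi, xi in zip(z, x) if zi == 1])
    x0 = mean([xi for zi, xi in zip(z, x) if zi == 0])
    return (y1 - y0) / (x1 - x0)


# Hypothetical data-generating process (our assumption, not the book's).
rng = random.Random(4)
z, school, earn = [], [], []
for _ in range(100_000):
    zi = 1 if rng.random() < 0.5 else 0  # as-good-as-random instrument
    ability = rng.gauss(0, 1)            # unobserved confounder
    xi = 12 + 0.5 * zi + ability + rng.gauss(0, 1)
    yi = 0.1 * xi + 0.5 * ability + rng.gauss(0, 0.5)  # true return = 0.1
    z.append(zi)
    school.append(xi)
    earn.append(yi)

print(f"OLS:  {ols_slope(school, earn):.3f}")  # biased upward by ability
print(f"Wald: {wald(z, school, earn):.3f}")    # close to 0.1
```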

Track 8: Panel Data (B): A Lil Nas X-length track, this is a short and simple introduction to fixed effects estimation, with an application from Scott’s own work on US sex workers. I remember using the similar Gertler et al. (2005) paper on Mexico to teach fixed effects when I taught undergraduate econometrics at Stanford, and students found it a more interesting and less dry illustration than the usual textbook model.

Track 9: Difference-in-Differences (A): This is my pick for the chart-topping lead single of the album, and if I were to recommend just one chapter of the book to our readers, this would be it. It covers the standard difference-in-differences approach, moves on to triple differences, and focuses a lot on the usefulness of graphs for illustrating results. There are some good examples of placebo tests and a guide to graphically showing the results of event study estimation. The chapter is particularly strong in highlighting the issue of staggered treatment timing and how it affects the interpretation of the DiD coefficients (something we have covered on the blog), and in giving a sense of the range of approaches being proposed to deal with these issues in the recent literature. While it discusses the parallel trends assumption, it does not cover the set of recent papers that have dug much deeper into assessing this assumption and what to do when it doesn’t hold (which we covered in a two-part post). Nor is there any discussion of how combining matching with DiD can increase plausibility.
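As a reminder of the basic computation that the staggered-timing literature generalizes, here is the canonical 2x2 difference-in-differences as a Python sketch on hypothetical simulated data. It computes the change for the treated group minus the change for the comparison group, which equals the treatment effect only if, absent treatment, both groups would have followed parallel trends (true by construction here, but an assumption in real data).

```python
import random
from statistics import mean


def did_2x2(rows):
    """rows: (treated_group, post_period, outcome) tuples.

    Returns (treated post - treated pre) - (control post - control pre).
    """
    def cell(g, t):
        return mean(y for gi, ti, y in rows if gi == g and ti == t)

    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


# Hypothetical panel: 1,000 treated and 1,000 comparison units, two periods.
rng = random.Random(5)
rows = []
for unit in range(2_000):
    treated = 1 if unit < 1_000 else 0
    unit_effect = 3.0 * treated + rng.gauss(0, 1)  # groups differ in levels
    for post in (0, 1):
        trend = 0.8 * post                 # common time shock (parallel trends)
        effect = 1.5 * treated * post      # true treatment effect
        rows.append((treated, post, unit_effect + trend + effect + rng.gauss(0, 1)))

post_gap = mean(y for g, t, y in rows if g == 1 and t == 1) - mean(
    y for g, t, y in rows if g == 0 and t == 1
)
print(f"post-period gap: {post_gap:.2f}")      # mixes the level difference and the effect
print(f"DiD estimate:    {did_2x2(rows):.2f}")  # the true effect is 1.5
```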

Track 10: Synthetic control (A-): The album finishes with a short track that provides an introduction to the synthetic control method and an illustration of how to do it in practice. This topic has not been covered in comparable existing books, so the chapter will be a nice supplement for people relying primarily on other books. There is some discussion of recent work on the concerns over choosing which covariates to use and which units form the donor pool, but not much discussion of how to be a critical reader, reviewer, or user of such work.

What’s not covered? The book is already 540 pages plus references, so complaints about what is not covered may just be considered sequel fodder. If so, here are some key areas of causal inference that are not covered but are becoming increasingly important in applied work: a) the use of machine learning for causal inference, so there is no discussion of methods like post-double-selection lasso to choose controls, or of machine learning methods to uncover treatment effect heterogeneity; b) moving beyond average treatment effects to estimate quantile treatment effects and other approaches to treatment heterogeneity; c) aggregation of causal evidence and meta-analysis; d) partial identification methods and bounding; and e) bunching estimators. So there is plenty of scope for at least some B-sides/bonus tracks in future editions.

I also realize that I am not the modal reader or target audience for this book, and so, just like the grumpy rock critic who gets sent to a Post Malone concert (my benchmark to make me feel better whenever I get a negative referee report), I am likely missing what is valuable to other readers. So please share in the comments if you disagree with my relative rankings of the chapters, or share your thoughts and suggestions for practitioners on what was surprising or new to you in reading it.


Authors

David McKenzie

Lead Economist, Development Research Group, World Bank
