
Gerber and Green’s new textbook on Field Experiments – should you read it, and what should they add for version 2.0?

By David McKenzie

Alan Gerber and Don Green, political scientists at Yale and Columbia respectively, and authors of a large number of voting experiments, have a new textbook out titled Field Experiments: Design, Analysis, and Interpretation. This is noteworthy because, despite the massive growth in field experiments, until now there hasn’t been an accessible and modern textbook for social scientists looking to work in, or better understand, this area. The new book is very good, and I definitely recommend that anyone working in this area read at least the key chapters. It discusses a number of things very well and gives lots of practical advice, but it also neglects a couple of important areas.

The book starts with a discussion of the advantages of random assignment for causal inference, and of how to estimate average treatment effects. It takes a randomization inference-based approach to obtaining confidence intervals and p-values, arguing that these methods are transparent, can be applied to any sample size and a whole range of outcomes, and can easily be used with simple randomization as well as with block (stratified) and clustered randomization. A nice feature of the book is the number of step-by-step empirical examples that illustrate points clearly, usually motivated by real-world studies. It then covers a range of different complications – using covariates, incomplete compliance, attrition, spillovers, and treatment heterogeneity; discusses issues in interpreting mediating mechanisms and updating priors based on multiple studies; before concluding with several very practical chapters – covering case studies of several experiments to illustrate design choices, a chapter on how to write a research proposal and journal article, an appendix on human subjects, and a set of suggested field experiments for class projects. Add exercises after each chapter and you have 492 pages of how to do experiments.
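To make the randomization inference idea concrete, here is a minimal sketch in Python (the book's own code is in R). It is not taken from the book: the function names and the data are made up for illustration, and it assumes simple complete randomization and the sharp null hypothesis of no effect for any unit. It re-shuffles the treatment labels many times and asks how often the re-shuffled difference in means is at least as large as the observed one.

```python
import random


def diff_in_means(outcomes, assignment):
    """Difference in mean outcomes between treated (1) and control (0) units."""
    treated = [y for y, d in zip(outcomes, assignment) if d == 1]
    control = [y for y, d in zip(outcomes, assignment) if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def randomization_p_value(outcomes, assignment, n_draws=10000, seed=0):
    """Two-sided p-value under the sharp null of no effect for any unit.

    Assumes simple complete randomization: each draw shuffles the labels,
    keeping the number of treated units fixed.
    """
    rng = random.Random(seed)
    observed = diff_in_means(outcomes, assignment)
    shuffled = list(assignment)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)
        # Small tolerance so ties with the observed value count as extreme.
        if abs(diff_in_means(outcomes, shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_draws


# Hypothetical data: 4 treated and 4 control units.
outcomes = [3, 5, 4, 6, 1, 2, 2, 1]
assignment = [1, 1, 1, 1, 0, 0, 0, 0]
observed, p_value = randomization_p_value(outcomes, assignment)
print(observed, p_value)
```

For block or clustered designs the same logic applies, but the re-shuffling would have to mimic the actual assignment procedure: shuffle within blocks, or shuffle whole clusters rather than individuals.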

What did I like?

·         Chapter 7 on attrition is particularly good – it gives very clean and easy to understand examples of how to calculate Manski extreme bounds and Lee (trim) bounds; does a nice worked example and discussion of how to decide whether to spend a finite budget on trying to survey everyone in your follow-up survey but with somewhat high attrition versus doing a more intensive sample of a subset and reweighting; and gives a good illustration of the dangers of dropping strata with high attrition rates and deciding only to do analysis on strata with relatively low attrition.

·         Randomization inference is explained very clearly.

·         A number of technical points that I don’t commonly have to deal with explained in a nice manner. E.g.

o   Placebo designs to identify who compliers are: (p. 162) The example is a get-out-the-vote campaign, where you give a message to people who can be found at home and answer the door. The problem is that power is low when few people agree to speak to you. They suggest one solution is to do a placebo treatment: instead of the voting message, you follow the same protocols to get people to open the door and talk to you, but then give a recycling message. This identifies the people who would be compliers with the voting treatment, so the analysis can be done on the complier subsample alone, with more power. Of course this is much easier to do in information-type treatments than in cases where compliance means attending a training session, taking out a loan, or participating in other such treatments.

o   Clustered randomization with small samples: (p. 83) they note and illustrate how a small sample bias can arise in clustered experiments if the size of a cluster varies with potential outcomes and difference in means is used to estimate the treatment effect. Solutions are to block on cluster size, get more clusters, or use a different estimator.

o   Discussing the challenges of identifying mediation/mechanisms: In chapter 10 they explain why it is so hard to look inside the “black box”, and push for implicit mediation analysis – where you try a range of different treatments, each of which targets a different mechanism – rather than trying to do encouragement designs on the mechanisms themselves. Berk won’t be happy though – despite referring to his QJE paper as an example of this, Gerber and Green actually give the result from the earlier working paper version, which, as Berk has noted, is different from the published version [the referencing is a bit sloppy: Berk’s paper is referenced as appearing in the QJE in 2009, when it was published in 2011; I came across at least 5 papers referenced in the text that aren’t in the references at all].

·         A very good checklist of how to write up your results in Chapter 13, along with a piece of advice I should pay more attention to: “write as you go”.
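As an illustration of the attrition bounds mentioned under Chapter 7 above, here is a minimal Python sketch. It is not the book's code: the function names and data are hypothetical, missing outcomes are marked with None, and the Lee bounds are simplified by rounding the number of observations to trim to a whole number. The Manski bounds assume only that the outcome lies between known limits; the Lee bounds additionally assume that treatment affects attrition in one direction only (monotonicity), and they bound the effect for units that would respond under either assignment.

```python
def mean(values):
    return sum(values) / len(values)


def manski_bounds(treated, control, y_min, y_max):
    """Extreme-value bounds: fill missing outcomes with the worst/best case."""
    def filled_mean(group, fill):
        return mean([fill if y is None else y for y in group])

    lower = filled_mean(treated, y_min) - filled_mean(control, y_max)
    upper = filled_mean(treated, y_max) - filled_mean(control, y_min)
    return lower, upper


def lee_bounds(treated, control):
    """Trimming bounds: trim the arm with the higher response rate until the
    response rates match, dropping its highest or lowest observed outcomes."""
    obs_t = sorted(y for y in treated if y is not None)
    obs_c = sorted(y for y in control if y is not None)
    rate_t = len(obs_t) / len(treated)
    rate_c = len(obs_c) / len(control)
    if rate_t >= rate_c:
        # Simplification: round the trimmed share to a whole number of units.
        k = round(len(obs_t) * (rate_t - rate_c) / rate_t)
        lower = mean(obs_t[:len(obs_t) - k]) - mean(obs_c)  # drop highest treated
        upper = mean(obs_t[k:]) - mean(obs_c)               # drop lowest treated
    else:
        k = round(len(obs_c) * (rate_c - rate_t) / rate_c)
        lower = mean(obs_t) - mean(obs_c[k:])               # drop lowest control
        upper = mean(obs_t) - mean(obs_c[:len(obs_c) - k])  # drop highest control
    return lower, upper


# Hypothetical data on a 0-10 scale; one control unit was lost to attrition.
treated = [6, 7, 8, 9, 10]
control = [4, 5, 6, 7, None]
manski = manski_bounds(treated, control, y_min=0, y_max=10)
lee = lee_bounds(treated, control)
print(manski, lee)
```

In this made-up example the Lee bounds sit inside the Manski bounds, which is the usual pattern: the extra monotonicity assumption buys a narrower interval.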

What’s missing/What would I like to see in a version 2.0?

·         Power calculations: The book is surprisingly unhelpful on power calculations, relegating their discussion to a single-page appendix of one chapter. The randomization toolkit of Duflo, Glennerster and Kremer is much better on this (and is strangely not referred to). Linking more explicitly how low take-up rates lead to low power is something I think is key.

·         T as a choice? Not surprisingly I would have liked to see discussion of how we don’t always have to just do a single follow-up survey, and of the trade-offs involved in deciding how many rounds of data to collect.

·         Randomization inference with incomplete compliance: the book focuses on randomization inference instead of regression for most of the discussion, but then when it comes to estimating impacts with incomplete compliance, doesn’t go through the Greevy et al. approach to estimate treatment effects in this case.

·         Stata code: the book comes with R code to implement a lot of the examples discussed in the book. R is great in that it is open source, and seems to be used a lot by some political scientists and statisticians. But uptake of some of the ideas of this book among economists is likely to be lower due to not having this in Stata, which seems to be the standard tool for development economists at least.

·         Quantile treatment effects? We never get the authors’ take on whether they think these are useful or not. See my blog post on this here.

·         More practical stuff on how to work with NGOs/Govts, etc. One of the key questions graduate students often have is how to get started working with a partner. Indeed one of our first posts featured advice by Dean Karlan and Dean Yang on this issue. I think discussion of practical things like planning several alternative evaluation designs in case one falls through, ways to piggyback other experiments and measurement exercises on a larger evaluation, etc. would be very useful for students. Again the toolkit of Duflo et al. has a little on this, but a well-written chapter would be nice.
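The link between take-up and power raised in the power calculations point above can be sketched with a standard back-of-the-envelope formula. This is an illustration in Python rather than anything from the book or the toolkit, and it rests on assumptions: two equal-sized arms, a normal approximation, a common outcome standard deviation, nobody in the control group receiving the treatment, and no effect on those who do not take it up. Under those assumptions the intention-to-treat effect is the take-up rate times the effect on those who take up, so the required sample size grows with one over the take-up rate squared. The function name `n_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_on_takers, sd, take_up, alpha=0.05, power=0.8):
    """Approximate sample size per arm to detect the intention-to-treat effect.

    Assumes the ITT effect equals take_up * effect_on_takers, equal arm sizes,
    a common standard deviation, and a two-sided test of size alpha.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    itt_effect = take_up * effect_on_takers
    return ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / itt_effect ** 2)


# Hypothetical effect of 0.2 standard deviations among those who take up.
for take_up in (1.0, 0.5, 0.25):
    print(take_up, n_per_arm(effect_on_takers=0.2, sd=1.0, take_up=take_up))
```

Halving take-up roughly quadruples the sample needed for the same power, which is why low take-up is so costly.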

So there you go, my suggestions to add another 100 pages to this book! These points aside, I found the book very well written and informative, and it will certainly be something I will be glad to have as a reference. Definitely one of the clearest and most up to date expositions we have at the moment of many important issues.

Reminder: If you didn’t give your estimate yet of the effects of our impact evaluations in Jordan, please do so now, by reading the short description here and taking 2 minutes to give your estimate of what the impact would be.


Submitted by Trent on
R is the language of Social Sciences including the eCon-artists. Get on it ( Trent