I recently had a chance to read Rachel Glennerster and Kudzai Takavarasha's (hereafter G&T) new book, Running Randomized Evaluations. It has a lot to offer a range of readers.
First of all, who's it for? They seem to have written it for both policymakers and impact evaluation practitioners. And they've succeeded. The language is really readable and the concepts are clearly explained, including some fairly subtle points (to the point where I had writer's envy). OK, they do have a couple of equations (who can precisely explain power calculations without them?) but they complement these with intuitive explanations of what is going on. And they complement the book with a living website with lectures, papers, and exercises.
The book runs the gamut from motivation through methods and on to analysis and what kind of policy lessons you can draw (the latter including a list of common pitfalls likely to be useful for the skeptical policymaker and journal referee alike). It's rich throughout with examples, some hypothetical but most drawn from actual evaluations. Given the J-PAL history of the authors (and the title), the focus is on randomization, but they do run through alternative methods and give them a fairly balanced treatment (e.g. "randomized evaluations are not the only valid way to conduct impact evaluations"). What they don't do is discuss how to carry out those other methods technically or explore their implementation (that's for another book I would like to see). Summarizing the entire book would do it a disservice, so what I'll do here is focus on some of the neat things I got from it.
Let me start with how they discuss impact evaluation methods. When they lay out the different methods, they do a nice survey of the assumptions, required data, and comparison group for each one. This makes a nice exposition, whether you want to understand what is going on or to teach this material (if you want more equation-oriented treatments, I often turn to Martin Ravallion's short paper or longer handbook chapter).
Another interesting thing that comes through in a number of the chapters is the evaluator as an insider, really inside the program, rather than outside or separate. Early on, G&T go to some lengths to discuss how evaluators can work with program designers and implementers to think hard about how to make the program better. This includes working through the theory of change (aka the logframe, results framework) not only to get a handle on what the program is doing, but (later in the book) to improve measurement and power. It also includes piloting the intervention and loose qualitative work (e.g. to understand needs of potential beneficiaries, potential mechanisms by which the program has impacts, and the like). They also stress at the beginning (and in a number of other places) the importance of monitoring what is going on in the program. Among other things, this engagement and this focus on a significant amount of startup work will help evaluators avoid evaluating interventions for which there might not be real demand.
G&T also provide some material that is very useful for the frequently asked questions in any evaluation training. Some of the ones which grabbed me were: 1) why random assignment is not random sampling (you think it's obvious, but sometimes it is not) and 2) a nice clear exposition of which questions call for an impact evaluation and which call for other tools (e.g. process evaluations, needs assessments), which helps put impact evaluation in context.
The discussion on ethics is also quite useful. Not only does it contain a summary of the Belmont Report that is better than any course I have taken, but they also tackle potential cross-cultural differences in ethics. And they're particularly good at weaving ethics into other discussions throughout the book.
And the lessons extend to the perhaps-not-so-obvious mundane. Take the discussion on how to actually implement randomization (easy in theory, not so much in practice). They walk you through the various steps. One example is the discussion of the “randomization device”. Sure, you can use Stata, but if you are going for a public lottery they have concrete advice: "Avoid putting a large number of objects in one big sack because they may not shake and mix as well. Avoid cards that may stick together; use balls instead. Make sure that whatever you use is smooth and that someone putting a hand in the container cannot by touch know what number he is picking." This brought me back to a very enjoyable afternoon with the statisticians running the Burkina Faso national lottery as they schooled me in the various ways to rapidly randomize a large number of people.
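For the computer-based route, here is a minimal sketch of what a reproducible assignment might look like. It is written in Python rather than Stata, and it is my illustration, not a procedure from the book: the function name `assign_treatment`, the village labels, and the seed value are all made up for the example.

```python
import random

def assign_treatment(unit_ids, share_treated=0.5, seed=20240101):
    """Randomly assign units to 'treatment' or 'control'.

    Hypothetical helper for illustration; the seed value is arbitrary.
    """
    rng = random.Random(seed)       # fixed seed so the draw can be rerun and audited
    ordered = sorted(unit_ids)      # sort first so the result does not depend on input order
    shuffled = ordered[:]
    rng.shuffle(shuffled)
    n_treated = round(len(shuffled) * share_treated)
    treated = set(shuffled[:n_treated])
    return {u: ("treatment" if u in treated else "control") for u in ordered}

# Twenty made-up villages, half of them assigned to treatment.
villages = [f"village_{i:02d}" for i in range(1, 21)]
assignment = assign_treatment(villages)
n_treated = sum(1 for arm in assignment.values() if arm == "treatment")
```

Sorting the list and fixing the seed before shuffling means anyone with the same list can reproduce the same draw, which is roughly the software equivalent of making sure nobody can feel the numbers on the balls.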
Another point that resonates (especially for first timers) comes under "reality check of units of randomization." Villages may not be discrete, classes may merge, people might actually not live in the village. What looks like a good unit to randomize on paper may not make sense out in the field.
And something less obvious to a lot of us: the monotonicity assumption when we use encouragement or incentives to get people to participate in a program. That is, we are often assuming that whatever we do to bring more people into a program either raises each person's likelihood of participating or leaves it unchanged. We don't often think of the possibility that it may actually discourage some folks from participating -- and this can mess up our results.
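To see how badly this can go, here is a small back-of-the-envelope sketch. It is my own illustration with made-up numbers, not an example from the book. It assumes the usual encouragement-design estimate (the difference in outcomes between encouraged and non-encouraged groups divided by the difference in participation rates) and a population where some people ("defiers") are pushed out of the program by the encouragement.

```python
def wald_estimate(share_compliers, share_defiers, effect_compliers, effect_defiers):
    """Encouragement-design estimate: outcome difference / participation difference.

    Compliers join because of the encouragement, so their outcomes rise by
    effect_compliers. Defiers drop out because of it, so their outcomes fall
    by effect_defiers. Always-takers and never-takers do not change behavior
    and so contribute nothing to either difference.
    """
    outcome_difference = (share_compliers * effect_compliers
                          - share_defiers * effect_defiers)
    participation_difference = share_compliers - share_defiers
    return outcome_difference / participation_difference

# Monotonicity holds (no defiers): the estimate recovers the compliers' effect.
no_defiers = wald_estimate(0.3, 0.0, effect_compliers=1.0, effect_defiers=4.0)

# Monotonicity fails: 10% of people are discouraged, and the program would
# have helped them a lot. All numbers here are invented for illustration.
with_defiers = wald_estimate(0.3, 0.1, effect_compliers=1.0, effect_defiers=4.0)
```

In the first case the estimate equals the compliers' effect of 1.0. In the second it comes out at about -0.5, a negative number even though the program helps every single person who takes it up.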
G&T also provide a lot of insight into how to do better measurement. This isn't meant to be a how-to-do-a-survey guide (there are entire books on that), but they cover a range of topics that you won't find in books dedicated only to surveys. For example, there is a careful discussion of how to pick the best respondent for the variable you are trying to measure. And they give a neat catalog of non-survey instruments ranging from the "ride along" enumerator to biomarkers and games.
The power calculation discussion is similarly useful. After getting the basics out of the way, they tackle issues specific to evaluations. This discussion moves from clustering to trickier topics such as cases where you might not want the treatment and control samples to be equal in size.
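For readers who like to see the moving parts, here is a rough sketch of a textbook minimum detectable effect calculation. It uses the standard normal-approximation formula with a design effect of 1 + (m - 1) * ICC for equal-sized clusters. I am not claiming this is the exact formula or notation G&T use, and the function name, parameter names, and numbers are mine.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(n_total, share_treated=0.5, sd=1.0,
                              cluster_size=1, icc=0.0,
                              alpha=0.05, power=0.8):
    """Smallest true effect detectable with the given power (two-sided test).

    Normal approximation; assumes equal-sized clusters and a common outcome
    standard deviation in both arms. Illustrative only.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    design_effect = 1 + (cluster_size - 1) * icc   # inflation from clustering
    variance = (sd ** 2) * design_effect / (
        share_treated * (1 - share_treated) * n_total)
    return multiplier * sqrt(variance)

# 1,000 individuals, outcome measured in standard deviation units.
individual = minimum_detectable_effect(1000)
clustered = minimum_detectable_effect(1000, cluster_size=20, icc=0.05)
unequal = minimum_detectable_effect(1000, share_treated=0.3)
```

With these inputs the individually randomized design can detect an effect of roughly 0.18 standard deviations, while randomizing the same 1,000 people in clusters of 20 pushes that to roughly 0.25. For a fixed total sample, a 30/70 split does slightly worse than 50/50 (roughly 0.19). An unequal split can still be the better choice when you are constrained by budget rather than by sample size, for example if treated units cost more than control units.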
G&T also have a nice discussion of pre-analysis plans, including a thoughtful discussion of when to lodge (or update) the pre-analysis plan so that you can deal with newfound effects (something I touched on in my post last month) while guarding against the dangers of data mining.
For the evaluation practitioner, I think the one missing part of the book is more discussion and examples of what can go wrong. They actually get at this indirectly (e.g. the pitfalls discussed in the policy chapter, being really precise about how to actually randomly select people) but the gory details of what went wrong in evaluations aren’t there. But then, that's probably another book -- at least it is in my case.
Finally, for those awkward cocktails at impact evaluation trainings or even for a gathering of impact evaluation nerds, the book also has a smattering of impact evaluation trivia. Date of the first use of control and treatment groups? 1747. How did the Hawthorne effect get its name? Why was John Henry a bad example of a counterfactual? For those (and more!), read the book.