In a New York Times column last Friday, David Brooks discussed a book by Jim Manzi and extolled the idea of randomized field trials as a way for the US to make better policy.
While it’s nice to welcome Citizen Brooks into the fold, there are a couple of points in his article worth exploring a bit.
First, he opens with some shock that no one has evaluated the stimulus and decries the conflicting results of models: “The problem is that no model can capture enough of the world’s complexity to yield definitive conclusions or make non-obvious predictions. A lot depends on what assumptions you build into them.”
Indeed, but as posts like Martin’s have made clear, we need a range of tools at our disposal, because evaluation isn’t going to be right (or done) for everything. And discussions about the assumptions, parameters, and structures that underlie these models are a lot more informative than statements such as “this policy is bad for the economy.” By focusing discussion on what is changing and why this might be the case, model-based simulations have a lot to teach us. In the end, Brooks does concede that the stimulus couldn’t have been (impact) evaluated. But having policy be informed by careful research is the point. Let me come back to that in a minute.
Second, Brooks criticizes the US for doing only a “smattering” of evaluation. Surely the US could do more. But this doesn’t mean nothing gets done. Take, for example, the Department of Education’s Institute of Education Sciences, which not only directly supports impact evaluations for education policy in the US but also puts out accessible reviews of evaluations done by others. And heck, it looks like they have plans to evaluate part of the stimulus spending. (And, as an added bonus, their website contains a range of interesting-looking materials on methods.)
So this raises the question of why we don’t see more evaluation in general. Brooks cites the political logic, and this is a point that has been nicely developed by Lant Pritchett in his paper (gated) on why it pays to be ignorant. As Lant puts it:
Who really wants to know? While serendipity plays some role in knowledge, most increases in knowledge about the impact of public programs or policies are the result of deliberate research. If a program can already generate sufficient support to be adequately funded then knowledge is a danger. No advocate would want to engage in research that potentially undermines support for his/her program. Endless, but less than compelling, controversy is preferred to knowing for sure the answer is ‘‘no.’’
But even when we do know something, the voice of evidence often gets lost. One example of this comes from an evaluation (done by folks outside of the government) of the Oregon health insurance experiment, which David blogged about last year. The New York Times even wrote an editorial about the findings at the time, but oddly, Brooks doesn’t mention it, nor is it often mentioned in the current discussion about “Obamacare.” So there seems to be a gap between the evidence (even when it exists) and the dialogue.
Perhaps one way to improve both the lack of evidence and the lack of discussion about rigorous results is (as Brooks and Manzi suggest) to have a government agency tasked with policy evaluation. A lot of the experience of countries trying to do this is actually documented at the World Bank’s Evaluation Capacity Development website. The site has a discussion of the US experience by Katharine Mark and John Pfeiffer. While their paper does make it clear that there is still a coordination issue in the US, it provides some hope:
The 2011 and 2012 budgets noted that the President has made it very clear that policy decisions should be driven by evidence—evidence about what works and what does not and evidence that identifies the greatest needs and challenges. As a result, they described how the administration is using a new evaluation-based funding approach for several grant-based initiatives (U.S. OMB 2010, 2011, pp. 84-85). Under this three-tiered approach more money is first proposed for programs and practices that generate results backed up by strong evidence; for a second group of programs with some supportive evidence, funding is proposed on condition that the programs will be rigorously evaluated. If the programs are unsuccessful, their funds will be directed elsewhere. Finally, the approach encourages agencies to innovate and to test ideas with strong potential as indicated by preliminary research findings or reasonable hypotheses. The administration assumes that as more programs come forward with strong supporting evidence, they will be motivated toward further performance improvement. The administration also intends to work with agencies to make information on program impact evaluations available online.
This isn’t enough of course, but it might be a start. And should the US decide to move to a more coordinated approach, the experience of the US’s neighbors to the south is instructive – Mexico, Chile and Colombia all have such agencies. While these agencies support impact evaluation, they also use other tools to do evaluative research (as well as broader monitoring) that informs the policy debate. And these agencies and their respective governments have had to deal with a range of political stumbling blocks along the way. For example, Manuel Fernando Castro’s discussion of Colombia’s experience touches upon political economy issues that gibe nicely with Lant’s paper and offers some insight into how they might be dealt with. So clearly there is some knowledge out there. But will we learn?
My thanks to all of my colleagues who work on improving national M&E systems for their help in putting this together. Of course, they bear no responsibility for my mistakes (conceptual and factual).