This is the first in our series of posts by PhD students on the job market this year.
In theory, rigorous estimates of impact can improve the efficiency of public spending by helping to funnel resources into programs that work. But despite the growth of evaluation in academia, government, and organizations, we know little about whether this happens in practice. Understanding this baseline relationship between evidence and spending matters, not only for optimal public spending, but also for how we can design research for policy impact.
Case in Point: Conditional Cash Transfers in Latin America and the Caribbean
Conditional Cash Transfers (CCTs) are a common policy tool for poverty alleviation in low- and middle-income countries. Particularly in Latin America and the Caribbean (LAC), CCTs are both widespread and heavily evaluated. Between 2000 and 2015, almost all countries in the region had at least five program evaluations of the impact of their CCTs on poverty-related outcomes (Figure 1). These evaluations were often embedded in government, with over half conducted in collaboration with policymakers.
Figure 1: Number of program evaluations of CCTs, 2000-2015
For this reason, CCTs are often heralded as a success for evidence-based policy (e.g. Duflo and Banerjee, 2012). However, the relationship between evaluation findings and actual spending has not yet been rigorously studied.
In my job market paper, I study the relationship between evaluation findings and changes in policy spending on CCTs in Latin America and the Caribbean between 2000 and 2015. I build a unique dataset of 128 program evaluations of CCTs mapped to corresponding program spending. In addition to treatment effects, the dataset covers rich information on the characteristics of each evaluation and on relationships between study authors and policymakers.
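To make the structure of such a dataset concrete, here is a minimal sketch of how evaluation-level records could be linked to annual program spending. The variable names, country codes, and numbers are purely illustrative assumptions on my part and are not taken from the actual dataset.

```python
import pandas as pd

# Illustrative evaluation-level records: one row per program evaluation
evaluations = pd.DataFrame({
    "country": ["MEX", "MEX", "BRA"],
    "pub_year": [2004, 2008, 2010],
    "effect_size": [0.15, 0.08, 0.11],          # standardized treatment effect
    "significant": [True, False, True],          # reported significance at 5%
    "policymaker_collab": [True, True, False],   # conducted with policymakers
})

# Illustrative CCT spending panel: one row per country-year
spending = pd.DataFrame({
    "country": ["MEX", "MEX", "BRA"],
    "year": [2005, 2009, 2011],
    "cct_spending_usd_m": [3200.0, 3900.0, 5400.0],
})

# Link each evaluation to CCT spending in the year after publication
spending = spending.assign(pub_year=spending["year"] - 1)
merged = evaluations.merge(spending, on=["country", "pub_year"])
print(merged)
```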
So, what is the relationship between evaluations and policy spending? To study this, we need to be very systematic in defining patterns in the data that may be consistent with evidence-use. I explore three types of relationships between evaluations and spending:
1. Patterns of immediate evidence-use, by considering the relationship between individual evaluation outcomes and CCT spending;
2. Patterns of gradual evidence-use, by considering the relationship between cumulative evidence and CCT spending;
3. Patterns of selective evidence-use, by considering the same relationship among subsets of evaluations associated with higher policy relevance or lower political constraints.
What do I find? Across immediate and gradual patterns of evidence-use, I find a precise zero relationship between evaluation outcomes and spending on that program. The only exception is when evaluation results are timely and made available in periods with low political constraints.
Immediate evidence-use: Individual evaluations and policy spending
One approach to evidence-use would be to learn from each individual evaluation of a program. In this case, policymakers would adjust their spending in response to the findings of each program evaluation.
To study this relationship in the data, I need a measure of what policymakers may take away from each individual evaluation. This is not straightforward: each study provides rich information on the impact of a CCT program, including reported treatment effects, such as the effect size or statistical significance, but also softer information, such as the tone or language used to describe the results. To study patterns of immediate evidence-use, I therefore consider the relationship between individual evaluations and policy spending across various ways of summarizing the findings of each evaluation.
I find a robust and precise zero association between individual evaluation outcomes and subsequent program spending. The zero relationship holds regardless of how I summarize the findings of each individual evaluation: there is no association between spending and the maximum or mean effect size, or the maximum or mean statistical significance, of evaluation results. Using sentiment analysis on the abstract text, I also find that more positively framed evaluation results do not correspond with larger increases in spending.
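For readers curious how such summary measures can be computed in practice, the sketch below builds the maximum and mean of effect sizes and significance, plus a simple sentiment score, for one hypothetical evaluation. The use of NLTK's VADER lexicon, and all variable names and values, are my own assumptions for illustration rather than the paper's actual procedure.

```python
import numpy as np
from nltk.sentiment import SentimentIntensityAnalyzer  # requires nltk.download("vader_lexicon")

# Hypothetical reported treatment effects (standardized) and p-values from one evaluation
effects = np.array([0.12, 0.05, 0.20])
p_values = np.array([0.01, 0.15, 0.03])

summary = {
    "max_effect": effects.max(),
    "mean_effect": effects.mean(),
    "any_significant": bool((p_values < 0.05).max()),   # any result significant at 5%
    "share_significant": (p_values < 0.05).mean(),      # share of significant results
}

# A simple sentiment score for the abstract text (illustrative abstract)
abstract = "The program substantially improved school enrollment and reduced child labor."
summary["abstract_sentiment"] = SentimentIntensityAnalyzer().polarity_scores(abstract)["compound"]

print(summary)
```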
The estimated relationship between evaluation outcomes and spending is statistically insignificant and small in magnitude. Compared with an evaluation outcome of zero, a positive and significant evaluation would be associated with a $1.65m increase in spending, which explains less than 1% of the average annual change in CCT spending.
How do these program evaluations compare with the existing literature? Program evaluations are not published in a vacuum; they contribute to a rich body of evidence. Policymakers may therefore respond not to the absolute size of evaluation outcomes, but to how surprising they are relative to the existing evidence base. However, I find that more surprising findings do not correspond with larger changes in program spending: evaluation results that are more positive relative to the existing evidence do not correspond with larger increases in spending, and results that are more negative relative to the existing evidence do not correspond with larger decreases in spending.
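As a purely illustrative way of operationalizing "surprise" (not necessarily the paper's definition), one can compare each new estimate with the average of earlier estimates for the same country. All names and numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical evaluation results for one country, ordered by publication year
df = pd.DataFrame({
    "pub_year": [2002, 2005, 2008, 2012],
    "effect_size": [0.10, 0.18, 0.05, 0.22],
}).sort_values("pub_year")

# Prior evidence: expanding mean of all earlier effect sizes (NaN for the first study)
df["prior_mean"] = df["effect_size"].expanding().mean().shift(1)

# Surprise: how far the new estimate deviates from the accumulated evidence
df["surprise"] = df["effect_size"] - df["prior_mean"]
print(df)
```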
Gradual evidence-use: Cumulative evidence and policy spending
An alternative view of the world is that evidence accumulates slowly over time. Studies of external validity suggest that there may be limited scope for learning from individual studies (Allcott, 2015; Rosenzweig and Udry, 2019). If that is the case, evidence-based policy spending would happen slowly as knowledge accumulates.
I explore this possibility by looking at the relationship between the cumulative evidence and CCT spending in each country. Using a Bayesian hierarchical model, I estimate the aggregate impact of CCTs based on each country’s evidence base. Bayesian hierarchical models are increasingly used in economics (e.g. Meager, 2019). The model allows me to jointly estimate the average treatment effect and sources of heterogeneity across studies.
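For readers unfamiliar with the approach, below is a minimal sketch of the standard partial-pooling setup used in Bayesian meta-analysis (in the spirit of Meager, 2019), written in PyMC. The choice of tool, the priors, and the numbers are my own illustrative assumptions, not the paper's exact specification.

```python
import numpy as np
import pymc as pm

# Illustrative reported treatment effects and standard errors from the studies
# in one country's evidence base (not real numbers)
effects = np.array([0.12, 0.05, 0.20, 0.08])
ses = np.array([0.04, 0.03, 0.06, 0.05])

with pm.Model() as cct_meta_model:
    # Hyper-parameters: country-level mean effect and cross-study heterogeneity
    mu = pm.Normal("mu", mu=0.0, sigma=1.0)
    tau = pm.HalfNormal("tau", sigma=0.5)

    # Each study's true effect is partially pooled towards the country-level mean
    theta = pm.Normal("theta", mu=mu, sigma=tau, shape=len(effects))

    # Reported estimates are noisy measurements of the true effects
    pm.Normal("obs", mu=theta, sigma=ses, observed=effects)

    # The posterior for mu summarizes the aggregate evidence; tau captures heterogeneity
    trace = pm.sample(2000, tune=1000, target_accept=0.95)
```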
My findings suggest that program evaluations of CCTs are informative. In most countries, there is a considerable amount of pooling across studies, indicating a reasonable level of external validity. Nonetheless, stronger aggregate evidence on the impact of CCTs is not associated with higher CCT spending across countries.
While my primary focus is on the intensive margin of spending, it is also interesting to consider whether program evaluations affect the extensive margin — when CCTs begin and end. I find that the strength of the existing evidence on CCTs does not predict when a country starts a new program. Likewise, the fact that negative evaluation outcomes do not correspond with decreases in spending suggests that evaluation outcomes are not predictive of when programs end.
Rather, my findings suggest that once programs are in place, spending on that program is unrelated to what the studies find. This suggests that either policymakers do not change their spending in response to evaluation outcomes, or there is a complex relationship that directly offsets any changes made. As policymakers in this setting are highly trained and often directly or indirectly involved in the evaluations, this result seems unlikely to be driven by a lack of policy awareness. Instead, it is suggestive of the presence of constraints.
Selective evidence-use: Do features of evidence matter?
Even if policymakers learn from the evidence, there may be no changes to policy spending if policymakers face constraints to evidence-use. In this case, we would expect higher responsiveness in spending to subsets of evidence that are better aligned with policy decisions and associated with lower constraints to evidence-use.
I consider differential responses to three dimensions of evidence characteristics, often associated with higher policy relevance:
· Credibility – studies that are more internally valid and give more plausible estimates of impact, e.g. RCTs or program evaluations published in top academic journals
· Generalizability – studies that are more externally valid to broader populations of interest
· Actionability – studies that are more timely and embedded in policymaker decisions
Figure 2: Relationship between mean treatment effect & CCT spending on the evaluated program
I find that credibility and generalizability are unrelated to spending. The only dimension of evidence that predicts spending is actionability. Program evaluations that are timely – i.e. available sooner than the mean lag of four years after the effect year – are significantly predictive of spending. This relationship is driven by evaluation results that can be attributed to the political party in power.
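As a stylized illustration of how such a differential response can be examined, the sketch below interacts the evaluation outcome with a "timely" indicator in a simple regression. The variable names and data are placeholders of my own; the paper's actual specification may differ.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical evaluation-level data: effect size, publication lag (years after the
# effect year), and the change in program spending after the evaluation is released
df = pd.DataFrame({
    "effect_size":    [0.10, 0.22, 0.05, 0.18, 0.02, 0.15],
    "lag_years":      [2, 3, 6, 1, 7, 5],
    "d_spending_usd": [40.0, 120.0, -5.0, 90.0, 10.0, 0.0],
})

# Timely = released within four years of the effect year (the mean lag in the text)
df["timely"] = (df["lag_years"] < 4).astype(int)

# Does the spending response to evaluation results differ for timely evaluations?
model = smf.ols("d_spending_usd ~ effect_size * timely", data=df).fit()
print(model.summary())
```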
Summary
My results show a robust and precise zero association between evaluation findings and policy spending. The only exception is when evaluation results are timely and political constraints are low. This suggests that the timeliness of evaluation may be an overlooked channel for increasing the impact of evidence.
More broadly, these findings matter because one of the main motivations of applied research is to influence policy. There is now a large volume of high-quality evaluations, and a growing number of institutions are dedicated to generating causal evidence. But without rigorously studying how evaluations affect policy, we are left without a clear path to designing optimal evidence for policy impact.
Michelle Rao is a PhD candidate at the London School of Economics