Consider a program that increases vaccination rates. It can reach 1,000 people. You are allocating a budget. Holding all else constant, how much more would you allocate if the program could reach 100,000 people?
How folks respond to this is the subject of a fascinating new paper by Mattie Toma and Elizabeth Bell. They start with US government policymakers, targeting higher-level folks who make these kinds of decisions in real life, with a lab-in-the-field experiment. Policymakers are asked for the maximum they would allocate from their budgets to programs in their realm of expertise (respondents come from the Departments of Education, Justice, and Health and Human Services, the General Services Administration, and USAID, as well as a bunch of other agencies).
Toma and Bell randomly vary the program choices the policymakers are presented with along three dimensions: scope (how many people will be impacted), outcome (whether it’s an immediate outcome like clicking on something or a more downstream one like being vaccinated), and persistence (how long the effects will last). Each participant has to rate six programs with variation across these dimensions (more on this in a minute). Putting the variation in program attributes together with the maximum budget allocations gives us a sense of how policymakers respond to variations in impact.
Bottom line: not much. The overall sensitivity (elasticity) to impact is .33 – so if program impacts increase by 100 percent, policymakers are willing to put up only 33 percent more. Interestingly, they are more responsive to changes in persistence (.59) than to scope (.24) or outcome (.23).
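To put that .33 in concrete terms, go back to the opening example of 1,000 versus 100,000 people. Treating the elasticity as constant over the whole range – a back-of-the-envelope reading, not the paper’s exact specification – the implied budget ratio is

\[ \frac{\text{budget at } 100{,}000}{\text{budget at } 1{,}000} \approx \left(\frac{100{,}000}{1{,}000}\right)^{0.33} = 100^{0.33} \approx 4.6 \]

so a program reaching 100 times as many people attracts only about 4.6 times the budget.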
But Toma and Bell aren’t done. They try two decision aids to increase sensitivity. The first consists of side-by-side comparisons – two options of the same program on the same screen, with the options varying as described above. The second helps with the math through an “impact calculator,” which gives the annual cost per person impacted.
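The paper’s exact formula isn’t spelled out here, but to get a feel for what a calculator like this shows, think of something along the lines of

\[ \text{annual cost per person impacted} \approx \frac{\text{program cost}}{\text{people reached} \times \text{years the effect lasts}} \]

With made-up numbers, a $1 million program reaching 1,000 people for five years works out to roughly $200 per person per year, while the same cost spread over 100,000 people works out to about $2 – the kind of contrast that makes scope hard to ignore.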
And these aids make a big difference. The side-by-side comparison boosts sensitivity by .26 (79% relative to the control mean) and the impact calculator adds .20 (60%). Intriguingly, these work by lowering the assessments of lower-impact programs. Also interesting: remember that participants are rating multiple programs; getting exposed to one of these decision aids doesn’t change subsequent decisions made without an aid. So the aids seem to provide in-the-moment help rather than a lesson about decision making that carries over once the aid is gone.
Toma and Bell also delve into what is correlated with sensitivity. Familiarity with the program area is strongly correlated, which makes sense. Self-reported experience with evaluation is not correlated with sensitivity. Respondents’ confidence that they gave the best possible assessment shows a small but significant correlation. As Toma and Bell put it, “intuitively, this suggests that respondents are aware of the difficulty of mapping the information they are receiving onto program assessments.” Indeed, when they ask about certainty after the decision aids, they find that folks are markedly more likely to say they are certain of their assessments.
Toma and Bell then take this experiment to a random sample of folks (in the US) using an online platform. The “public” appears to be markedly less sensitive to impact than the policymakers, coming in at .21 (versus the .33 we saw for policymakers). The decision aids again increase sensitivity, with the side-by-side comparison boosting it by .25 and the impact calculator adding .17. But Toma and Bell also give the public the two aids in combination – and this gets an increase of .39, close to the sum of the individual effects (.25 + .17 = .42) – so the aids seem to be additive rather than substitutes.
When Toma and Bell delve into the correlates of sensitivity for this sample, we see different factors matter (not least because they can ask the public different questions). Confidence doesn’t matter. Numeracy is strongly positively correlated. And identifying as politically conservative is significantly negatively correlated with sensitivity. Interestingly, the coefficient on leaning conservative is no longer significant when controlling for numeracy (and confidence). Finally, Toma and Bell ask folks: “by what factor do you think people’s assessments of the value of a program change when its impact increases by a factor of 10?” The median and modal answer is 10. So people think valuations should scale one-for-one with impact even though their own allocations don’t – the insensitivity doesn’t appear to be about preferences.
Toma and Bell do a good job of discussing the limitations of their study. First, there is recruitment. The most common way policymakers were recruited was through being identified by evaluation officers or evidence folks in their agency (followed by folks known to the Office of Evaluation Sciences and those recommended by other participants). Moreover, Toma and Bell note that many folks were told they were recruited because they were involved in evidence and evaluation communities. This points to the potential for experimenter demand effects that would push measured sensitivity upward.
Toma and Bell also point out the obvious – policymakers generally (we hope) make more considered decisions in real life than they do in the experiment. (In case you were wondering, the speed of decisions isn’t correlated with sensitivity within the experiment.) As such, the experiment is capturing more of a first impression. However, it’s also important to keep in mind that the experiment abstracts from other things that may matter a lot for real decisions, like political considerations facing the policymakers.
Taken together, these are interesting results. At the very least, they suggest that those of us who produce evidence in the hope that it will make a difference for policy should think carefully about how we frame and present our findings.