
Evidence-based or interpretation-based?

Berk Ozler

When people say “evidence-based policymaking” or talk about the “credibility revolution,” they are surely trying to say that (a) we have (or are trying hard to get) better evidence on the impacts of various approaches to solving problems, and (b) we should use that evidence to make better decisions about policy and program design. However, the debate about the Haushofer and Shapiro (2018) paper on the three-year effects of GiveDirectly cash transfers in Kenya taught me that how people interpret the evidence is as important as the underlying evidence itself. GiveDirectly’s blog post (which I discussed here; GiveDirectly has since posted an update here) and Justin Sandefur’s recent post on the CGD blog are two good examples.

Allow me to make a couple of caveats before moving forward. First, I am not equating Dr. Sandefur’s blog with GiveDirectly’s original one from February – I think the former is better. But they both exemplify how you can interpret evidence to arrive at different readings of the same RCT. Second, I think there is a lot to like in Sandefur’s blog. It’s hard to disagree with the main messages in the title, versions of which I have been saying myself. His summaries of the related literature, the policy space, and poverty traps are all great. It is a more rounded post than mine, for sure, although I had no intention of interpreting ALL the evidence on cash transfers out there: just the findings (really the estimands) in the two GiveDirectly evaluations – the nine-month and three-year follow-ups.

There is one place, however, in Sandefur’s analysis that I would like to quibble with: his discussion of the plausibility of negative spillover effects. Sandefur introduced his blog post to the Twitterverse by saying that he wanted to produce an accessible summary of the recent debate on the GiveDirectly RCT in Kenya. I suspect this means that, in the process, he needed to simplify a little and avoid the “excruciating and persuasive” detail into which I went in my earlier post. But there is a reason I write the way I do – it’s because those details, which may seem excruciating, matter. Without them, it’s easy to become confused. Sure, I lose a lot of readers who stop because they get lost in the terminology, but that is the tradeoff I make to avoid having people take away a message that is not supported by the evidence.

What am I talking about here? The most important issue to which Sandefur’s post could have devoted more space has to do with how he calculates the relative magnitudes of treatment vs. spillover effects in treated villages:
 

"In order to explain away the gap in consumption between cash recipients and their neighbors, the negative side-effects in Kenya would have to be enormous. While half the sample got treated, in the broader population, only 9 percent of the 100 households per village was treated. If negative spillovers are real, there’s little reason to think they’re limited to the research sample. Extending the results to the broader population would imply giving $400 to nine poor households in a village raised their consumption by just $17, while reducing the consumption of the other 91 households by about $30 each. That’s $2,530 in harm and just $153 in benefit per month."
 

Here, we need some technical detail. The sample in question contains only the eligible population in the study villages, i.e., those with thatched roofs whom GiveDirectly targeted. Therefore, any evidence on spillover effects presented in Haushofer and Shapiro (2018) comes from a sample of households who look exactly like those who were randomly chosen as beneficiaries. So, if GiveDirectly treated, on average, about 9% of the village population, then we also have some evidence on what happened to the other 9% who were randomized out in treatment villages (assuming a 50/50 randomization into treatment and spillover groups). We have NO data (repeat: NO DATA) on the remaining 82%, who were ineligible for the transfers and are, almost certainly, different from the eligible population in both observable and unobservable characteristics.

What Sandefur does is assume that the spillovers to this 9% extend to the other 82%. He writes: “If negative spillovers are real, there’s little reason to think they’re limited to the research sample.” With that, if an eligible non-beneficiary consumes $30 less than the pure control population, he assumes that so do the other 82%: facts not in evidence. In fact, not even facts – just a strong assumption about a population we literally know nothing about (from the study at hand). And with it, we have harm to the village that is an order of magnitude larger than the benefit…
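To see exactly where the assumption enters, here is a minimal back-of-the-envelope sketch of the calculation. All figures are rounded point estimates quoted in the debate (the quoted “$2,530” presumably uses the unrounded per-household spillover estimate; with the rounded $30 figure you get roughly $2,700), and the 100-household village with a 9/9/82 split is illustrative:

```python
# Back-of-the-envelope version of the extrapolation step (illustrative figures).
village_households = 100
treated = 9                # eligible households randomized into treatment
eligible_control = 9       # eligible households randomized out (the spillover sample)
ineligible = village_households - treated - eligible_control  # 82: NO data on these

treatment_effect = 17      # monthly consumption gain per treated household ($)
spillover_loss = 30        # estimated monthly consumption loss per eligible non-recipient ($)

# What the study actually measures: spillovers on the 9 eligible non-recipients.
measured_harm = eligible_control * spillover_loss            # $270/month

# Sandefur's assumption: the same loss extends to the 82 ineligible households.
extrapolated_harm = (eligible_control + ineligible) * spillover_loss  # ~$2,700/month
total_benefit = treated * treatment_effect                   # $153/month
```

The order-of-magnitude gap between harm and benefit appears only after the extrapolation line; the measured harm is of the same order as the benefit.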

Add to this some colorful language…
 

“Unless cash recipients literally spent the money on gasoline to set fire to their neighbors’ farms, the scope of negative spillovers required to explain the Kenya results seems implausible.”

 
…and some more on Twitter, and now all you can remember is “wildly implausible.”
 
Look, I concede that a certain amount of poetic license can be used in blogs and on Twitter (but not in academic papers) to get a point across. But we’re so far past the available evidence at this point that I don’t think this is helpful to the reader.
 
Setting aside dissemination methods, let’s discuss for a moment the substance of the claim that the spillover effects are too large to be believable. First, the spillover effect is not exactly $30, but rather somewhere between $8 and $52, as indicated by the rather wide confidence interval reported in HS (2018).
 
Second, why should these spillover effects be considered “wildly implausible”? For example, Bandiera et al. (2017), which, unlike Haushofer and Shapiro (2018), is equipped to describe spillovers and general equilibrium (GE) effects for the entire village (because they sampled from the ultra-, near-, and non-poor populations in each study village), finds substantive GE effects on agricultural and domestic workers’ wages. And that intervention transfers less in assets ($560 vs. $704) to a lower percentage of the population (the ultra-poor comprise only 6% of the population in the Bangladesh study areas). If lower transfers to fewer people can cause village-level effects on wages, why can’t the GiveDirectly transfers have at least influenced the 9% of villagers who were randomized out? Muralidharan, Niehaus, and Sukhtankar (2017) report similar GE effects on low-skilled wages from improvements in the performance of the National Rural Employment Guarantee Scheme in Andhra Pradesh, India, which is not a particularly high-intensity intervention either.
 
[Note: A little more detail here on the GiveDirectly transfer amounts vs. the cost of “targeting the ultra-poor” (TUP) programs. Sandefur refers to GiveDirectly’s transfers as being around $400, but the average transfer was actually $704, which is a weighted average of the small $404 transfers given to about three-quarters of the treated sample and the large $1,520 transfers given to the remaining beneficiaries. Second, he calls the TUP program, which has shown sustained effects over 4-7 years in Bangladesh (albeit small in absolute terms), expensive. Yes, it is more expensive because of the intensive training and support provided to beneficiaries for two years in addition to the asset transfers, but not by as much as you might think: the comparison is $704 in transfers plus about $108 in admin costs for GiveDirectly vs. $1,120 for TUP.]
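The arithmetic in the note above can be checked in a few lines. This is a sketch using only the figures quoted there; the share of large transfers is backed out from the weighted average rather than reported directly, so treat it as an implied, not an official, number:

```python
# Back out the transfer mix implied by the $704 weighted average.
small, large = 404, 1520          # small and large transfer sizes ($)
avg = 704                         # reported average transfer ($)

# avg = (1 - p) * small + p * large  =>  p = (avg - small) / (large - small)
p_large = (avg - small) / (large - small)   # ~0.27, i.e. roughly three-quarters got the small transfer

# Cost per beneficiary household, using the figures quoted in the note ($).
givedirectly_cost = 704 + 108     # transfer plus approximate admin costs
tup_cost = 1120                   # TUP: assets plus two years of training/support
cost_gap = tup_cost - givedirectly_cost
```

The implied large-transfer share of roughly 27% is consistent with “about three-quarters” of the sample receiving the small transfer, and the cost gap between the two programs comes out to about $300 per household.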
 
Now, ask yourself the following: “What is the one question that people who have been discussing the plausibility of negative spillover effects are not asking?” It is whether the +$47 impact implied by the within-village comparison ($235 - $188) is itself plausible, following a $704 UCT given three years ago. Let’s make some comparisons based on what we did learn from other studies:
  • The TUP program in Bangladesh mentioned above, which not only gave almost as much in assets and cash stipends as GiveDirectly, but also intensive skills training (by a livestock specialist) and other training (by BRAC officers), generated approximately $10-$13.5 PPP in consumption expenditures per household (using Table IV in Bandiera et al. 2017 and multiplying the US$ PPP impact per adult equivalent by three or four). A gain of $47 per month implied by within-village impacts in Kenya is 3.5 to 5 times higher than that in Bangladesh.
  • In Sri Lanka, de Mel, McKenzie, and Woodruff (2012) found increases in earnings of $8-$12 per month among urban microenterprises five years after they were given grants of $100 or $200, with the effects being limited to male-owned businesses only.
  • Roughly the same story in Blattman, Fiala, and Martinez (2014) four years after self-selected unemployed youth were given conditional group grants of about $382 per person to start businesses in skilled trades: earnings gains of about $10/month (I cannot tell non-durable consumption, which is more comparable to the Kenya and Bangladesh studies, because it is reported as a standardized index).
These are some of the most successful interventions for which we have evidence, and their monthly impacts at the household level come nowhere near $47 – in fact, none of them even lies within the confidence interval around $47, which is approximately ± $20. If we’re genuinely curious about the plausibility of negative spillover effects, shouldn’t we also be asking whether it is reasonable to think that UCTs in Kenya had a much larger impact than these programs?
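The comparison above can be sketched numerically. This uses the approximate figures quoted in this post (the comparison-study ranges are rounded monthly household impacts, and the ± $20 interval is the approximate one stated above, not the exact one from the paper):

```python
# Implied within-village impact of the Kenya UCT (monthly consumption, $).
treatment_mean, spillover_mean = 235, 188
within_village_impact = treatment_mean - spillover_mean    # $47

# Approximate monthly household impacts from the comparison studies ($).
tup_bangladesh = (10, 13.5)   # Bandiera et al. (2017), per-adult-equivalent x 3-4
sri_lanka = (8, 12)           # de Mel, McKenzie, and Woodruff (2012)
uganda = 10                   # Blattman, Fiala, and Martinez (2014)

# The Kenya estimate relative to the TUP range: roughly 3.5-5x.
ratio_low = within_village_impact / tup_bangladesh[1]
ratio_high = within_village_impact / tup_bangladesh[0]

# Approximate confidence interval around $47: none of the comparison
# estimates falls inside it.
ci = (within_village_impact - 20, within_village_impact + 20)   # (27, 67)
best_comparison = max(tup_bangladesh[1], sri_lanka[1], uganda)  # 13.5
```

Even the best-performing comparison estimate ($13.5/month) sits well below the lower bound of the interval around $47.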
 
I don’t like this game, which allows anyone who does not like a result for some reason to say, “Oh, I don’t believe it.” It’s better to stick to the evidence. This is certainly not to say that we cannot debate the plausibility of research findings and try to poke holes in the evidence – after all, that’s a big part of our job description as academics (to be skeptical and to question vigorously). But when the evidence survives that onslaught, I don’t get to say something like: “Unless GiveDirectly sprinkled all those households with magical gold dust, I don’t buy these treatment effects.” As Dr. Sandefur also concludes at the end of his post, we either have to show what’s wrong with a study or accept, begrudgingly perhaps, the findings. And I’d personally prefer to resist the temptation to speculate about their plausibility whenever possible…


 
 
