This is a guest post by Johannes Haushofer and Jeremy Shapiro
[Update: 11:00 AM on 4/23/2018. Upon the request of the guest bloggers, this post has been updated to include GiveDirectly's updated blog post, published on their website on 4/20/2018, within the text of their post rather than within Özler’s response that follows.]
We’re glad our paper evaluating the long-term impacts of cash transfers has been discussed by GiveDirectly (the source of the transfers) itself and Berk Özler at the World Bank, among others (GiveDirectly has since updated their take on our paper). Given the different perspectives put forth, we wanted to share a few clarifications and our view of the big picture implications.
First, some background: our study was a cluster-level randomized trial. We had a treatment group (receiving cash transfers), a spillover group (neighbors of the treatment group), and a pure control group (living in entirely separate villages). In our study of the short-term impacts of cash transfers, the primary comparison is between the treatment and spillover groups; we explain why below. In both the short- and long-term studies, we show all possible comparisons (treatment to spillover, treatment to pure control, and spillover to pure control) for the interested reader.
To clear up several points in Berk Özler’s discussion of our long-term study:
1. The reason for a partial baseline
The second concern is valid: we did not identify individuals who would be part of the pure control group at baseline (only the villages). They were selected later using the same criterion. This creates the possibility that people fulfilling this criterion at different times might not be comparable.
2. Why the partial baseline does not strongly bias the results
In the short-term study, we show that this possible bias from applying the selection criterion late is not very important. We do this by finding out, based on new data collection, how many people we missed in the pure control villages because we applied the eligibility criterion at a different time. That number is five households for the entire sample, which is small relative to our sample of about 1,500 households. To be conservative, we additionally correct for the omission of these five households through bounding techniques. Not surprisingly, the results when applying the bounding techniques are not very different from the uncorrected results.
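The paragraph above does not say which bounding technique was used. One simple option, shown here only as an illustration, is a worst-case (Manski-style) bound. It imputes every missing control household at the extremes of a bounded outcome and then recomputes the treatment-control difference. The sketch assumes a binary outcome, 500 hypothetical households per arm, and simulated data. The function name `worst_case_bounds` is ours, and the sketch is not the authors' actual procedure.

```python
import random
import statistics


def worst_case_bounds(treated, control, n_missing, y_min, y_max):
    """Worst-case (Manski-style) bounds on a treatment-control difference in
    means when n_missing control households were never sampled."""
    mean_t = statistics.mean(treated)
    # Missing households all at the top of the support: smallest possible effect.
    lower = mean_t - statistics.mean(control + [y_max] * n_missing)
    # Missing households all at the bottom of the support: largest possible effect.
    upper = mean_t - statistics.mean(control + [y_min] * n_missing)
    return lower, upper


# Hypothetical simulated data: a binary outcome (e.g., "household is food secure").
random.seed(0)
treated = [1 if random.random() < 0.60 else 0 for _ in range(500)]
control = [1 if random.random() < 0.50 else 0 for _ in range(500)]

naive = statistics.mean(treated) - statistics.mean(control)
lower, upper = worst_case_bounds(treated, control, n_missing=5, y_min=0, y_max=1)
print(f"unadjusted: {naive:.3f}  bounds: [{lower:.3f}, {upper:.3f}]")
```

With a binary outcome, the bounds are exactly n_missing / (n_control + n_missing) wide, which is about one percentage point here. That is why a handful of missed households can move the estimate only slightly.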
3. The within vs. across-village debate
Thus, the across-village comparisons in our first paper are not in strong doubt. Because spillovers are mostly small, this also means that the within-village treatment effects can be considered reasonable estimates of the treatment effect, with the benefit of higher statistical power.
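The reasoning step here can be written as an identity. The labels are ours, not the paper's: Y is the outcome, and T, S, and C denote the treatment, spillover, and pure control groups. With that notation, the within-village contrast equals the across-village ITT minus the spillover effect:

```latex
\begin{aligned}
\text{ITT (across villages)} &= \mathbb{E}[Y \mid T] - \mathbb{E}[Y \mid C] \\
\text{Spillover} &= \mathbb{E}[Y \mid S] - \mathbb{E}[Y \mid C] \\
\text{Within-village contrast} &= \mathbb{E}[Y \mid T] - \mathbb{E}[Y \mid S]
  = \text{ITT} - \text{Spillover}
\end{aligned}
```

The within-village contrast therefore recovers the ITT only when the spillover term is close to zero. A negative spillover makes it overstate the ITT by the size of that spillover.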
The interpretation that we tried to "get away" with the within-village comparison in the first study to "prime" the reader for a particular interpretation of the long-term study is unreasonable. Not only was the short-term study long complete before the long-term study was written, but as Berk acknowledges, we presented both within- and across-village comparisons in the first study. The greater emphasis on the within-village results for reasons of statistical power was our best scientific judgment, and we still believe it is correct -- Berk also appears to have no concerns with this. The possible interpretation that we were interested in misleading readers is not justified.
4. The Haushofer, Reisinger, & Shapiro paper.
In 2015 we wrote another paper using the data from our short-term study. In that study, we used a different approach, comparing villages in which a larger share of households received transfers to villages in which few households received transfers. We used this different methodology because we were interested in how changes in village-level inequality might impact residents, and in the analysis discovered some evidence for negative spillovers (for specific forms of psychological well-being). Berk rightly notes that the spillover estimates are different from those in the short-term impact paper, which is a consequence of this different methodology. We find some signs of negative spillovers, which were reinforced in our long-term study.
We agree with Berk’s interpretation of the results. By this we mean that a) the long-term study shows some evidence of negative spillovers, even if it is not conclusive, and b) GiveDirectly’s blog post is selective and does not provide a balanced interpretation of the results presented in our paper. To those points of agreement, we would add:
1. Attrition casts some doubt on the spillover results
2. Our study does not exist in a vacuum
It is essential that our results be considered alongside the (at least) 165 other robust studies of cash transfers. Policy decisions should not be based on the particular design or econometric specification of one study, or a blog post by a single charity.
Taking a step back, what do our results imply?
- Even if one accepts the least rosy interpretation, the results do not say that cash transfers aren’t “good.” Too much solid evidence shows positive impacts on consumption, nutrition, happiness, etc., with no robust increase in alcohol or tobacco consumption and no decline in labor force participation in low-income contexts.
- Our suggestive findings of spillovers raise important practical and ethical questions about operating with limited resources: do spillovers really “stop” at the village level? Is it OK to increase inequality by helping some and not others? What if you hurt others, but help some even more? Though abstract, these are the sorts of questions every aid organization must explicitly or implicitly address; our results reinforce that and perhaps provide some insight.
- Perhaps most importantly, the suggestion that cash transfers have limited long-term impacts invites a debate on the role of cash transfers: are they a “development” intervention, with the aim to boost people out of poverty with a single cash injection? Or an effective tool that must be used continuously to right the indignities of global inequality?
Berk Özler responds:
First of all, I thank the authors for reaching out to Development Impact to post a response. Not only did they explain various points about their work and give their interpretation of the evidence, but they were also generous in voluntarily revising their original submission in response to a few comments from me. The end result is what I consider to be a good bookend to the debate that ensued over the past few weeks: as you can see, our posts have a lot of agreement – on methods and interpretation…
I would like to clarify only one thing and that is regarding the following paragraph that may be interpreted to be about (bad) intentions:
“The interpretation that we tried to "get away" with the within-village comparison in the first study to "prime" the reader for a particular interpretation of the long-term study is unreasonable. Not only was the short-term study long complete before the long-term study was written, but as Berk acknowledges, we presented both within- and across-village comparisons in the first study. The greater emphasis on the within-village results for reasons of statistical power was our best scientific judgment, and we still believe it is correct -- Berk also appears to have no concerns with this. The possible interpretation that we were interested in misleading readers is not justified.”
I did say that they “got away” with estimating ITT using within-village comparisons. I also said that the result of this was that future readers were “primed” to draw inference using the same definition. But, I did not say that Drs. Haushofer and Shapiro “did” the former to cause the latter. As they state above, that would have required them to time travel to 2018, see the three-year results, go back to 2015 and decide which estimand to use so that the results could be interpreted in the best light in the future. That would not only be a ridiculous suggestion, but it would be assigning malicious intent to the authors. In this blog, we think it best for the discussions to stick to the facts in evidence and not speculate about people’s unknown motivations for writing what they wrote. I am optimistic that few, if any, of our readers got the impression of a suggestion that the authors were trying to mislead readers, but if anyone has, they should dispense with it now.
We have a minor disagreement on the chosen method in HS (2016). The authors acknowledge that they have a cluster-RCT. It is not in dispute that the standard way to define ITT in a cluster-RCT is across villages. That’s why the within-village controls are called the “spillover group.” The authors are correct that within-village comparisons have at least as much statistical power as across-village ones, but at the cost of potential bias due to interference across individuals within villages. In this sense, it is similar to the OLS vs. IV tradeoff. This tradeoff is actually apparent in Appendix Table 38, where we can see the statistical significance (and even sign) of two indices change when switching from within-village to across-village estimates – not because of a drastic power loss, but rather due to sizeable changes in the coefficient estimates, which indicates some spillovers. The authors rule out spillover effects of greater than 0.22 SD in their short-term effects paper and deem that to be small, but reasonable and informed people could disagree. I recently had an editor ask me to cite the fact that we were not powered to detect 0.10 SD effects on our primary outcome as a “weakness of the study” in the study limitations section at the end of the paper.
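The tradeoff described above can be illustrated with a small Monte Carlo simulation. The model is deliberately simplified and is not the study's specification. It assumes additive village-level shocks, equal-sized treatment and spillover groups in each treatment village, a fixed spillover effect on non-recipients, and no attrition. It also uses simple differences in means instead of regressions. All parameter values (`TAU`, `SIGMA`, `VILLAGE_SD`, and the sample sizes) are hypothetical.

```python
import random
import statistics

# Hypothetical parameters, not taken from either paper.
TAU = 0.30          # true effect on recipients (ITT), in SD units
SIGMA = -0.10       # spillover effect on non-recipients in treatment villages
VILLAGE_SD = 0.5    # SD of village-level shocks (source of intra-cluster correlation)
N_VILLAGES = 40     # villages per arm
HH_PER_GROUP = 10   # households per group per village


def one_trial(rng):
    """Simulate one experiment; return (within-village, across-village) estimates."""
    t, s, c = [], [], []
    for _ in range(N_VILLAGES):
        # Treatment village: recipients and their neighbors share one village shock.
        v = rng.gauss(0, VILLAGE_SD)
        t += [v + TAU + rng.gauss(0, 1) for _ in range(HH_PER_GROUP)]
        s += [v + SIGMA + rng.gauss(0, 1) for _ in range(HH_PER_GROUP)]
        # Pure control village with its own, independent village shock.
        u = rng.gauss(0, VILLAGE_SD)
        c += [u + rng.gauss(0, 1) for _ in range(HH_PER_GROUP)]
    mean_t = statistics.fmean(t)
    return mean_t - statistics.fmean(s), mean_t - statistics.fmean(c)


rng = random.Random(42)
draws = [one_trial(rng) for _ in range(500)]
within = [w for w, _ in draws]
across = [a for _, a in draws]
for name, est in (("within-village", within), ("across-village", across)):
    print(f"{name}: mean {statistics.fmean(est):.3f}, SD {statistics.stdev(est):.3f}")
```

Treatment and spillover households share the same village shock, so that shock cancels in the within-village contrast and its sampling SD is smaller. However, that contrast is centered on `TAU - SIGMA` rather than `TAU`. The across-village contrast is centered on `TAU` but carries the village-level noise, which is the OLS-vs-IV style tradeoff in miniature.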
But, the authors are absolutely right that I do not think this is a big deal for the HS (2016) paper. The within- and across-village findings that matter are quite similar to each other; the choice ends up being innocuous. That I personally would have based the main discussion on across-village estimates and retired the within-village ones into the appendix does NOT mean that the authors’ choice is incorrect: It’s their best scientific judgment and they’re standing by it.
This choice, however, did not end up being as innocuous for how some interpreted the longer-term effects presented in HS (2018), even though the authors could not have easily foreseen this back in 2016. Had it not been for this choice, there would have been no need to include the following two sentences in the abstract of HS (2018):
“Using a randomized controlled trial, we find that transfer recipients have higher levels of asset holdings, consumption, food security and psychological well-being relative to non-recipients in the same village. The effects are similar in magnitude to those observed in a previous study nine months after the beginning of the program.”
The within-village comparison yields a completely different estimand in the presence of spillovers. And, later in the abstract, the authors state: “We do find some spillover effects.” So, presenting the within- and across-village estimates as legitimate alternatives to each other is what can cause the selective reading of the program impacts, as happened in the original GiveDirectly blog post from February. These two estimands are not legitimate alternatives to each other in the three-year impacts paper: when they are not equal to each other, only one of them is the ITT effect, but the place to adjudicate that is not the abstract.
So, when I say the audience has been "primed" to think of the within-village estimates as the program impact, this is not an accusation against the authors. They could not have foreseen the future, and they cannot control what other people write or say about their work. But, a benign methodological choice in 2016 led to an abstract in 2018 that presents comparisons of within-village estimates as ITT over time, which may have led some to claim – incorrectly – that the effects in the longer-run are large and sustained. Perhaps, the abstracts of future versions of HS (2018) can be edited to make the main takeaways clearer.