Pardon the pun. But, psychological wellbeing has been in the news recently: do cash transfer programs have negative spillover effects on those who live near beneficiaries but do not receive transfers themselves?
The Economist covered a new paper by Haushofer, Reisinger, and Shapiro, dated October 28, 2015, with lightning speed on October 29 on its Free Exchange blog (both links here were updated on October 31, after the authors quickly revised the paper to reference an earlier RCT with similar findings). David then linked to the Economist article in our weekly links, adding a link to a short post by Anke Hoeffler at the CSAE Blog, who argued that studies like this one, which randomly gave large sums of money to some eligible people and not to others, were not ethical. David did not explicitly agree or disagree with Hoeffler’s position, but Jason Kerwin did disagree on his blog Ceteris Non Paribus – stating that this study was ethical to have conducted and that he’d like to see more of them.
What are the findings that caused this minor commotion? The authors take advantage of the variation induced by GiveDirectly’s household-level randomization within treatment villages to examine the direct and spillover effects of increasing transfers to households on psychological wellbeing, as well as on consumption and asset holdings (and some other things). As expected, people who receive transfers are better off, and the higher the transfer, the better the outcome. The spillover effects on those who were eligible but did not receive transfers in treated villages are generally negative, but do not reach statistical significance when analyzed as an index. However, life satisfaction among the spillover group declined significantly the more others in their community received. Similar results were found for assets and consumption. Most of these effects disappeared over time.
The striking thing about the findings is that the size of the spillover effects, regardless of statistical significance, is larger than that of the direct treatment effects. This means that if we had a properly sampled pure control group in untreated villages, and we compared all eligible households in treated villages with those in untreated villages, we would likely see a null effect.
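To make the arithmetic behind that null explicit, here is a minimal sketch. It assumes that both the direct and the spillover effects are measured against a pure control group, and that the village-level effect is simply the average over eligible households weighted by the share treated. The function name and the effect sizes are hypothetical and are not taken from the paper.

```python
def village_average_effect(share_treated, direct_effect, spillover_effect):
    """Average effect across all eligible households in a treated village,
    relative to a pure control group in untreated villages (assumed setup)."""
    if not 0.0 <= share_treated <= 1.0:
        raise ValueError("share_treated must be between 0 and 1")
    return share_treated * direct_effect + (1.0 - share_treated) * spillover_effect


# Hypothetical effect sizes in standard-deviation units (NOT from the paper):
# a positive direct effect and a slightly larger negative spillover.
net = village_average_effect(share_treated=0.5,
                             direct_effect=0.20,
                             spillover_effect=-0.25)
print(f"village-level average effect: {net:+.3f}")
```

With half of the eligible households treated, a spillover at least as large in magnitude as the direct effect pushes the village-level average to zero or below, which is the sense in which a comparison with untreated villages would likely show a null.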
The findings in this paper are believable, in the sense of causal identification, because the variation in treatment amounts at the village level comes from a combination of randomization of eligible households into treatment and control and of treatment households into small and large transfers. This, combined with the chance indivisibility of the number of households by two and the fact that the share of large transfers was not blocked by village, produces enough variation to identify the effects of increased transfers to villages on the welfare of those who did not receive any transfers.
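A small simulation may help show where this village-level variation comes from. It is only a sketch under my own assumptions: the transfer sizes, village sizes, and the 50 percent chance of a large transfer are hypothetical placeholders, not the study's parameters.

```python
import random
from statistics import mean, pstdev

SMALL, LARGE = 300, 1100  # hypothetical transfer sizes, not the study's amounts


def village_mean_transfer(n_eligible, rng, p_large=0.5):
    """Mean transfer per eligible household in one treated village."""
    # Half of the eligible households are treated; with an odd count, chance
    # decides whether the extra household lands in treatment or control.
    n_treated = n_eligible // 2
    if n_eligible % 2 == 1 and rng.random() < 0.5:
        n_treated += 1
    # Large vs. small is drawn household by household (not blocked by
    # village), so the share of large transfers differs across villages.
    transfers = [LARGE if rng.random() < p_large else SMALL
                 for _ in range(n_treated)]
    return sum(transfers) / n_eligible


rng = random.Random(2015)
village_sizes = [rng.randint(8, 20) for _ in range(60)]  # hypothetical sizes
means = [village_mean_transfer(n, rng) for n in village_sizes]
print(f"mean across villages: {mean(means):.1f}")
print(f"std. dev. across villages: {pstdev(means):.1f}")
```

The spread in `means` arises purely from the two layers of randomization, so it is unrelated to village characteristics by construction, and that is what lets it identify the effect of village-level transfers on those who received nothing.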
I also find these findings believable, because we have an RCT that found similar effects among adolescent girls (see the article published in 2013 in the Journal of Human Resources here). There, by randomly assigning clusters to cash transfers or no cash transfers and then offering transfers only to a subset of the eligible girls in treatment clusters, we found strong evidence of increased psychological distress among the spillover group, which was large enough to negate the beneficial effects of the transfers on the treated at the village level. As in the Kenya study, all (positive direct and negative spillover) effects dissipated quickly after the cash transfers stopped. In that paper, we used the GHQ-12, which is a screening instrument to detect individuals who are likely to have mental disorders and, hence, provides a measure of the common mental health problems of anxiety, depression, and social withdrawal. We did not assess life satisfaction, but there were no spillover effects – negative or positive – in many other domains, including schooling, marriage and fertility, and sexually transmitted infections, leading us to conclude that welfare judgments were not clear cut because the beneficial effects of cash transfers (at the village level) on a large number of outcomes had to be weighed against the large negative effects on the psychological wellbeing of a sizeable subgroup.
Now, I want to discuss the issue of whether this study design is ethical. Jason Kerwin made a lot of good points in his blog, with many of which I found myself agreeing. However, in arguing that this study was an ethical one to do, he invoked two principles, with both of which I have quibbles: additionality and uncertainty.
By “additionality,” Jason means that the study was designed to find the effects of large, lump-sum unconditional cash transfers on a bunch of outcomes. So, this study is just taking advantage of that RCT to answer a different question. But, Haushofer et al. did not have to randomize out any eligible households in treatment villages – they could just have had treatment and control clusters and possibly avoided a large chunk of the negative spillovers that arise from people seeing their neighbors treated. By “uncertainty,” he is invoking clinical equipoise, i.e. that there is genuine uncertainty about the direct and indirect treatment effects of cash transfers on mental health and (components of) subjective welfare. He says that, in the absence of ex ante evidence (of which there was at least some), it’s reasonable to think that some will be happy for their beneficiary friend and some jealous: I agree – in our study, we found improved psychological wellbeing among siblings (possibly partly due to a household income effect) but large deterioration among eligible girls in neighboring households. But, imagine the following totally hypothetical scenario…
Suppose that a university department has hired two people in the same year that are more or less clones of each other. Let’s call them Jason Kerwin and Jayson Irwin (if you don’t like these names, replace them with Berk Özler and Ben Olzer). Three years into their seven-year tenure clock, the university informs Jayson that he has been promoted to associate professor with tenure. Jason is told, if he is lucky, that he is still on the same path to promotion as before. In fact, the university promotes another assistant professor directly to full professor. How does Jason feel now? Perhaps he is happy for his colleagues that joined the department with him, but I find it more plausible that he is confused and is asking what he has done wrong. A year later, if he has not left the university, he might tell enumerators to buzz off when they show up to ask for permission to swab inside his mouth.
Suppose instead that the University forms a committee for an accelerated tenure program and imposes a formula to create a score for every assistant professor: 10 points for a top-5 publication, 5 points for any other journal article, 4 points for teaching and 2 points for service. Jason is now told that he just missed out on the cutoff whereas Jayson was right above it. Jason might still not be able to avoid feeling hurt, but he might also understand: the process was transparent even if he did not agree with the exact formula and the Dean did not count his forthcoming paper. It could have helped if Jason’s department was running the program itself (as were all other departments) and the people making the final decisions were a committee of randomly selected senior faculty members. Jason himself would have had input into the formula and the challenge procedures, and may accept the outcome much more readily.
Obviously, I am being facetious here, but my hypothetical academic interventions have very real-life counterparts for transfer programs. Replace the random promotion to associate (full) professor with small (large) cash transfers in an RCT where beneficiaries are randomly selected from the pool of eligible households. Continue to replace the university- and department-level targeting for accelerated tenure with government antipoverty programs that utilize centrally operated (proxy-) means tests and decentralized participatory wealth ranking schemes to determine program eligibility. So, the real question is whether the findings of negative spillovers from Malawi and Kenya are due to the fact that subjects were randomly excluded from the program, or do those who barely miss out on eligibility for important government programs also suffer similarly?
It’s obviously hard to say how many studies like these are enough before we have a pretty good idea that an intervention is likely to have some negative consequences for a large enough subgroup – enough to override the value of the knowledge generated. But, if you ask me, given what we know now, I’d personally be loath to randomize treatment assignment across eligible individuals within localities. I have argued as much recently in the context of prospective study designs for evaluating transfer schemes. This is despite the fact that I don’t know how useful a concept life satisfaction is or whether temporary psychological distress has lasting consequences. For me, there is a sufficient combination of theory and evidence to question such individually randomized study designs. If I were a reviewer for a proposal with such a design, I’d ask tough questions of the researchers to ascertain whether figuring out the answer to their question is really worth this risk. Even if so, I’d ask whether we could not answer it in some other way – perhaps equally valid internally but a bit more expensive or difficult. It seems that even GiveDirectly agrees: Haushofer et al. suggest that GiveDirectly has now moved to a model in which all eligible households in a village will receive transfers rather than a subset. That’s good…
This brings me to real government programs that are targeted to small subsets of poor households and whether these effects are there as well. I think the way to study this question is using the traditional partial population experiment: if, say, as in PROGRESA/Oportunidades, there are treatment and control villages and within each village, everyone has a program score that determines eligibility, then I can examine effects on people just above the cutoff by comparing them to the same group in control communities. This has been done for economic outcomes, but it’s rarer to see it for mental health or subjective wellbeing. If we find that people barely missing out on large government programs do suffer negative externalities (that are not compensated by positive spillovers), this might be a signal for governments that there are perhaps small design tweaks that can avoid these unwanted effects. For example, instead of beneficiaries below a certain cutoff getting very generous benefits while those above get nothing, the transfer schedule could be smoother across poor households.
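The partial population comparison described above can be sketched on simulated data. Everything here is an assumption for illustration: the data-generating process, the cutoff at zero, the bandwidth, and the size of the "true" spillover are hypothetical, and a real analysis would at least cluster standard errors at the village level.

```python
import random
from statistics import mean


def simulate_households(n_villages=200, per_village=50, cutoff=0.0,
                        true_spillover=-0.3, seed=0):
    """Simulated (treated_village, score, wellbeing) rows; all values hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_villages):
        treated_village = rng.random() < 0.5   # village-level randomization
        for _ in range(per_village):
            score = rng.gauss(0.0, 1.0)        # program score; below cutoff = eligible
            eligible = score < cutoff
            wellbeing = rng.gauss(0.0, 1.0)    # outcome noise
            if treated_village and not eligible:
                wellbeing += true_spillover    # assumed effect on the excluded
            rows.append((treated_village, score, wellbeing))
    return rows


def spillover_estimate(rows, cutoff=0.0, bandwidth=0.5):
    """Mean wellbeing of households just above the cutoff in treatment
    villages minus the same group's mean in control villages."""
    near = [r for r in rows if cutoff <= r[1] < cutoff + bandwidth]
    in_treated = [w for t, _, w in near if t]
    in_control = [w for t, _, w in near if not t]
    return mean(in_treated) - mean(in_control)


estimate = spillover_estimate(simulate_households())
print(f"estimated spillover on near-miss households: {estimate:+.2f}")
```

Because the comparison group is the same slice of the score distribution in control villages, nobody has to be randomly excluded within a village: the exclusion comes from the program's own eligibility rule.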
There is also current interest in figuring out general equilibrium effects – by varying the intensity of treatment across clusters. Again, in such cases, within-cluster randomization is not absolutely necessary: you can simply make your eligibility criterion more generous, where only the extreme poor are eligible in some areas, the poor in others, the poor and the vulnerable in yet other areas. This may not answer exactly the same question as an individually randomized trial, but it might be a very good compromise.
There are real reasons why cash transfer programs might have negative spillovers for non-beneficiaries in treatment areas. For example, in some national transfer programs, some villages will be so poor that almost everyone (but not everyone) qualifies as a beneficiary. In such cases, the subjective welfare (poverty) of the few excluded households may really suffer. Similarly, when a large percentage of a village is being treated with cash transfers, prices of certain foodstuffs may increase (see this paper by Cunha, De Giorgi, and Jayachandran 2015), hurting the ineligible – not just their mental health or life satisfaction, but also their pocketbooks and possibly their physical health. Many other programs may have displacement effects – someone trained may take away a job from someone else. The point is not that spillovers are unimportant – they very much are important. The point is what kinds of studies we design to study them. Each research question will have to be assessed in its own right to decide whether within-cluster randomization is appropriate or not – sometimes it will be and other times it won’t.
That’s why I think Anke Hoeffler was right to raise the question of whether study designs that create large and random inequalities between eligible households in the same small rural communities are ethical. The studies from Malawi and Kenya discussed here raise the bar for any future studies that propose similar designs.
------------------------------------------
Three additional thoughts that emanate from the Haushofer et al. (2015) paper discussed above and Jason Kerwin’s blog post:
- I am not sure what this latest paper implies with respect to the original evaluation of the effects of GiveDirectly transfers by Haushofer and Shapiro (2013). That paper used the within-village controls as the control group to estimate direct treatment effects, arguing that there was “little evidence of negative spillovers, as discussed below; this includes psychological well-being, i.e. untreated households in treatment villages did not experience a decrease in psychological well-being.” [page 10, footnote 5] The 2015 paper now finds negative spillovers on not only life satisfaction but also on assets and consumption. So, does that not mean that the 2013 findings were biased upwards – i.e. overestimates of the true program effects?
- Jason Kerwin pays some attention to the specification in Haushofer et al. (2015) that examines the effect of the Gini index on the individual outcomes of interest – controlling for the individual’s own transfer receipt and the mean level of transfers at the village level – and argues that it is not inequality that causes stress: “If one’s neighbors are richer, it does not matter if the money is spread evenly among them or concentrated at the top.” I don’t understand this argument; the inequality-health relationship is meaningful for a mean-preserving change in inequality. But, there is no such change in the GiveDirectly experiment: one instrument – cash transfers, small or large – increases both the mean and inequality. They go hand in hand: it’s not like we have manipulations in which the mean goes up but inequality does not or vice versa. So, the effect of inequality conditional on the mean is really confusing – the increase in the mean in this experiment is an increase in inequality. Conditional on the mean, where is the variation in the change in inequality coming from? From baseline heterogeneity, it seems. I don’t think this specification is useful or its interpretation correct. [Furthermore, the Gini is not a good index of inequality if one is interested in changes in concentrations at the top of the income distribution.]
- The author of the Economist article states: “As expected, those who received transfers reported greater satisfaction with their lot after the money arrived. Cortisol levels and the incidence of depression fell too.” This is wrong: no overall treatment effects are found on cortisol levels in this paper or in the 2013 paper.
* Update: Paul Niehaus, a co-founder of GiveDirectly, commented below to clarify their practices with respect to cluster vs. household randomization.