Some thoughts on the Give Directly Impact Evaluation


On Friday, the first evaluation of Give Directly was released and covered in the Economist and NPR, among others. As discussed in a previous blogpost, Give Directly makes unconditional cash transfers to households in Kenya, targeted on the basis of whether or not they live in a thatch-roofed house. The first findings look at the short-term impact of these grants, either when households are still receiving them, or within a few months of having received them.

The basic findings are encouraging from the point of view of charitable donations – if you want to feel assured that money given to Give Directly is immediately having a positive effect on the lives of poor people, this evaluation has plenty for you: households upgrade their thatch roofs to metal, build up livestock, consume more food, increase small business income, and are happier and less stressed. And for those whose views about how the poor spend money are shaped by stereotypes of the homeless in the U.S., there isn’t any significant increase in spending on alcohol, tobacco or gambling. So there is much more evidence of positive effects than for the vast majority of charities out there.

Of course for many people, the fact that giving people a bunch of money today (US$300 for most recipients, which is about 2 months of household non-durable consumption) leads to households being better off today is not really surprising, and so their questions turn to three things: a) are these effects long-lasting? b) does this have negative effects on those neighbors who don’t receive the grants? and c) when all this money goes into small villages, doesn’t it just push up prices? What does the evaluation have to say about this? On a), obviously we need to wait for the long-term follow-up studies to really know, but there is a bit either way here – households are building some business assets, but there aren’t any significant effects in the health and education investment domains (they also have a little on short-term dynamics). On b) and c), the study is designed to measure these spillovers, randomizing first at the village level to have some pure control villages, and then within villages as to who is treated. They don’t find any evidence of negative spillovers on neighbors, and find no significant changes in village prices, wages or crime.
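To make the two-stage design concrete, here is a minimal sketch of how such an assignment could be drawn. The village counts, household counts, 50/50 shares and the function name are all hypothetical choices for illustration, not the study's actual numbers or procedure.

```python
import random


def two_stage_assignment(villages, village_share=0.5, household_share=0.5, seed=0):
    """Assign households to arms in two stages.

    villages: dict mapping a village name to a list of household ids.
    The shares and the seed are illustrative assumptions, not study values.
    """
    rng = random.Random(seed)

    # Stage 1: randomize at the village level. Unselected villages
    # become pure control villages where nobody receives a transfer.
    names = sorted(villages)
    rng.shuffle(names)
    n_treated_villages = round(len(names) * village_share)
    treated_villages = set(names[:n_treated_villages])

    assignment = {}
    for name in names:
        households = list(villages[name])
        if name not in treated_villages:
            for household in households:
                assignment[household] = "pure_control"
            continue

        # Stage 2: within treated villages, randomize which households
        # receive transfers; the rest are untreated neighbors.
        rng.shuffle(households)
        n_treated = round(len(households) * household_share)
        for i, household in enumerate(households):
            assignment[household] = "treatment" if i < n_treated else "spillover_control"
    return assignment


# Hypothetical example: 6 villages with 10 households each.
villages = {f"village_{v}": [f"v{v}_h{h}" for h in range(10)] for v in range(6)}
assignment = two_stage_assignment(villages)
counts = {arm: list(assignment.values()).count(arm) for arm in set(assignment.values())}
print(counts)
```

With a design like this, comparing untreated neighbors in treated villages against households in pure control villages gives an estimate of spillovers, while comparing treated households against pure controls gives the treatment effect.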

Things of interest (and possible concern) to impact evaluators
The evaluation is carried out by Johannes Haushofer and Jeremy Shapiro, and does a lot of things really well:
 
  • They are careful to note that Jeremy is a co-founder and former director of Give Directly – such disclosure is important when people who have a vested interest in the results are involved in the evaluation.
  • The evaluation design and pre-analysis plan are pre-registered at the AEA registry. This includes procedures to deal with testing impacts on multiple outcomes.
  • Rather than just asking whether the program is working or not, they use the evaluation to try and test different ways of doing it, so that the evaluation can help guide making it better. To this end they i) randomize whether the money is given to the husband vs the wife; ii) randomize whether individuals get the money as a single lump-sum payment vs as a stream of monthly payments for 9 months; and iii) test sensitivity to the level of the transfer by giving one group approximately $300 (small transfer) and another $1100 (large transfer). This leads to lots of different treatment groups as shown below:

[Figure: diagram of treatment groups]

So what might we be concerned about?
     
  • Self-reporting: This is an issue for every study I know of which has consumption as one of its main outcomes. The survey team that did the interviewing was separate from the intervention organization, but it is likely the two were linked in many minds. The concern is of course that after money has just dropped out of the sky onto some households, the treated households may feel like they should over-report well-being and consumption and under-report uses they think might not be approved of, while the control group might have incentives to under-report consumption in the hopes of receiving transfers in the future. I feel like this concern is larger when surveys take place close in time to the grants (as here), and when the grant is given as a charity transfer/gift rather than as a prize or other windfall. One possible solution is to increase the number of objective measures – they take saliva samples here to measure cortisol, a stress indicator, as well as take anthropometric measures, but don’t see much impact on these despite reported food intakes increasing. In future follow-ups perhaps they can physically verify certain asset holdings (certainly the roof, maybe also cattle) to make these less susceptible to reporting bias.
  • The other (complementary) approach is to give reasons to think this may not be so important. Johannes notes a couple of these in an email to me: i) you might think people would over-report health and education spending since they would think donors like this, but the lack of any impact here suggests this isn’t dramatic (of course it is also consistent with the program having a negative impact on health and education spending and over-reporting taking this to a nil effect, but this seems less likely); ii) the large transfers were a surprise to households and came later than everything else being announced, but villages which got more surprise large transfers don’t have differential reporting.
     
  • Power: The downside of trying to test so many different variants of the program is that the power to distinguish between them can be somewhat low. As seen above, they only have 500 treatment households, but effectively have 3 types of treatment (so 8 combinations). A first concern is whether there are interactions between the treatments, something that is not tested. Second, even assuming linearity, the power can be somewhat low, especially for outcomes like expenditure, which can be noisy. For example, the main treatment effect on non-durable expenditure is US$36, with a standard error of $6. The female recipient effect is -$2, with a standard error of $10. So they can’t rule out that the effect for a female head is less than half the size of that for a male head, nor that it is 50% higher. Similarly wide confidence intervals are found for business revenues and non-land assets – as a result, the study is not as informative as one would like about design choices, because it is perhaps trying to do too much.
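The arithmetic behind the power point can be checked with a short sketch. It assumes a normal approximation with a 95% interval (z = 1.96), treats the US$36 main effect as a fixed baseline for male recipients, and ignores the main effect's own standard error. These are simplifying assumptions for illustration, not the paper's method, and the variable names are made up here.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval (z = 1.96 gives roughly 95%)."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# Figures quoted in the text (US$, non-durable expenditure).
main_effect = 36.0   # main treatment effect, used here as the male baseline (assumption)
female_diff = -2.0   # additional effect when the recipient is female
female_se = 10.0     # standard error of that additional effect

low, high = confidence_interval(female_diff, female_se)

# Implied female effect as a share of the baseline effect at each end of the interval.
ratio_low = (main_effect + low) / main_effect
ratio_high = (main_effect + high) / main_effect

print(f"95% CI for the female recipient difference: [{low:.1f}, {high:.1f}]")
print(f"Female effect relative to baseline: {ratio_low:.2f}x to {ratio_high:.2f}x")
```

Under these assumptions the interval runs from about -$21.6 to $17.6, so the female effect could be anywhere from about 0.40 to 1.49 times the baseline, which is why neither "less than half" nor "50% higher" can be ruled out.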
Bottom line though is that this is a very well-designed study, and we would love love love to see more charitable programs (and government programs!) evaluated to the same standard of rigor. I look forward to seeing the longer term follow-up results.
 
  • Update: Since writing this on Friday, I see Chris Blattman makes some similar points about self-reporting.
     
Authors

David McKenzie

Lead Economist, Development Research Group, World Bank

Drew
October 31, 2013

The three R's have it: Registration, Reporting and Regulation.
Registration: Registering a pre-analysis plan after data collection has concluded is hardly good practice. In this instance the data was collected between May 2011 and January 2013, while the study was registered on June 28, 2013. Yes, the AEA Registry did not exist at the time, but ClinicalTrials.gov has been in existence for over a decade, and the lead author was funded by the NIH, which requires this reporting.
Looking at an earlier RCT prepared in June 2011, the evaluation questions included benchmarking UCTs against CCTs as well as in-kind transfers plus health insurance. However, the pre-analysis plan prepared in 2013 discusses none of this. Conversely, violence against female household heads, community disquiet, etc. do not find any mention in the earlier version. This is not unusual. Hence the need for ex ante registration prior to data collection, to allow the viewer to see the evolution of the study design over the course of the implementation.
How did any IRB approve of an assessment undertaken by a co-PI/author who is also one of the four initial founders of this start-up? Mr. Shapiro should have recused himself at the outset.
Finally, if Give Directly is operating as a money remittance provider (MRP) in Kenya, it needs to follow the regulatory guidelines laid down by the Central Bank of Kenya. An important component is Know Your Customer (KYC), which requires detailed record keeping and internal controls that would be inspected by the agency tasked with oversight before a license is issued.
On the other hand, if the organization is a social work agency, then it would need to state what legal responsibility, if any, it will bear for harm caused to a recipient through its cash transfer. Their blog on the case of a young woman, Christine, http://www.givedirectly.org/blog_post.php?id=6234197500075333847, is a reminder of this real possibility.

Jenny Aker
October 28, 2013

David,
Thanks to you (and Chris B) for blogging about this. I agree that the evaluators have designed a nice experiment, and done a great job thinking through some of these important questions related to UCTs. I have three additional thoughts about this:
1. How should we think about these spillover effects? You mention that they don't find evidence of negative spillovers. I might have read this too quickly, but while it's true that a majority of the variables don't show negative spillover effects, I do think that the spillover effects merit further attention by the authors. There are several cases of negative and statistically significant spillover effects - on the value of particular assets, social expenditures, livestock revenues. Now whether these effects are due to multiple hypothesis testing or true negative spillovers is unclear, but I think that they merit further attention -- especially as several of them are related to control households' asset values, which suggest that perhaps there are some general equilibrium effects that the village price analysis isn't capturing. (In addition, for some of them, the magnitude is quite large - ie, as large as the treatment effect, unless I'm missing something). This is particularly important in an intervention such as this one, where eligible households weren't chosen for the treatment (and the lottery wasn't done publicly, although it was announced).
2. Second, are village prices the right level to be thinking about inflationary effects of cash transfers? I know that many impact evaluations often focus on these, but in many rural areas of sub-Saharan Africa, most purchases occur outside of the village at weekly agricultural markets. While village-level prices are important, and might capture one aspect of inflationary effects of cash transfers, it seems as if a relevant level of impact is on prices at the market level (which might or might not be integrated with village prices).
3. My third point is related to all impact evaluations, but is something I think is important to keep in mind. More often than not, our impact evaluations don't address the primary issue: Was this the right intervention for the problem in the first place? Yes, UCTs have worked in this context, but compared to what? (Getting back to the NPR piece, would Heifer International's intervention have done better, worse or the same in addressing poverty here?) Obviously that wasn't the objective or focus of this impact evaluation, but I am often surprised by how often we assume that we've chosen the right intervention for the problem (and any evidence of impact is proof of this). Finding that cash transfers had an impact isn't proof that cash transfers were the right intervention in the first place. Obviously this is a difficult issue to resolve, but I would love for us all to have more discussion on how we identify and analyze the problem and design interventions in the first place, even before we get to testing whether (and why) they worked.

Alberto
October 30, 2013

Targeting on one observable? Hi David. Interesting post and debate, as usual. I don't seem to see any discussion on the targeting criteria (roof material), which seems problematic to me - but maybe I haven't looked at the documentation closely enough. Isn't the criterion providing a perverse incentive? It seems some people have changed their roof material as a result of receiving the cash, but does that mean they automatically graduate from the program in subsequent rounds? That must be providing some signal to households on what not to do with their money, in a way that may result in perverse outcomes.

A. Salomon
October 29, 2013

In regards to the treatment diagram, is this indicating no baseline survey for pure controls? If so, why was this group left out of the baseline?
I'm wondering how they plan to determine whether there are spillover effects if they don't have a difference estimator for the pure control.