
Lessons from a cash benchmarking evaluation: Authors' version

This is a guest post by Craig McIntosh and Andrew Zeitlin.

We are grateful to have this chance to speak about our experiences with USAID's pilot of benchmarking its traditional development assistance using unconditional cash transfers. Along with the companion benchmarking study that is still in the field (that one comparing a youth workforce readiness program to cash), we have spent the past two and a half years working to design these head-to-head studies, and are glad to have a chance to reflect on the process. These are complex studies with many stakeholders and many collective agreements over communications, and our report to USAID, released yesterday, reflects that. Here, we convey our personal impressions as researchers involved in the studies.

The Gikuriro cash benchmarking evaluation

The report released this week presents the core results of the first of these benchmarking studies, which compared a relatively standard multi-sectoral USAID child nutrition program against cash transfers. The USAID program, called Gikuriro, provided WASH and behavior change trainings around sanitation and nutrition, delivered productive inputs through farmer field schools, and helped beneficiaries set up local savings and lending groups. We 'benchmarked' this program against cash by conducting an ex-ante costing of both programs, and then randomizing the cash transfer amounts in a range around the expected cost of Gikuriro. Using regression adjustment on transfer amounts, we can then estimate the impact of GiveDirectly at the exact cost of Gikuriro revealed by the costing at the end of the study. And because cash need not be at its most cost effective at these values, our study also includes a larger transfer arm, costing $532 per household, the value of which was chosen by GiveDirectly.

To begin with the results: the main takeaway is that neither intervention (when evaluated at the low Gikuriro cost of $141 per household) improved child outcomes. The bundled intervention was relatively inexpensive compared to similar programs elsewhere, and at this low overall level of spending, neither program had a detectable impact on child outcomes. There are interesting differential effects of the programs: the heavy savings focus of Gikuriro did significantly boost savings, but individuals receiving cost-equivalent cash transfers instead used the additional resources to pay down debt. Cash transfers led to meaningful improvements in productive and consumption assets, while Gikuriro improved knowledge about health practices at the village level. So, while neither program was effective in the short term, each has impacts visible at the time of the endline that could be hoped to drive improved outcomes over the longer term.

The intervention that did move child outcomes was a substantially more expensive cash transfer, costing more than four times as much money and actually delivering $532 per household. This transfer led to across-the-board increases in consumption, assets, and housing value, and, more excitingly, translated those consumption impacts into improvements in dietary diversity, modest and marginally significant improvements in anthropometric outcomes (~0.1 SD), and a decrease in child mortality. So we have evidence here that large cash transfers can move child health outcomes, but the study does not speak to what Gikuriro would have achieved if it had spent this larger amount of money.

So, we think the conclusion from our results should not be as simple as ‘cash won’. The study is explicitly not built to directly compare large cash transfers to Gikuriro (which is not a strict benchmarking comparison, as we explain below). Such adversarial framing does a disservice to both the openness of participating organizations to learn, and to the state of knowledge. It took courage and a commitment to transparency for the implementers involved in this study to participate, and we are deeply grateful to CRS, GiveDirectly, USAID, and IPA for sticking with the execution of the study.

Instead, our takeaway is that different means of spending program resources generate fundamentally different types of benefits, and with a more nuanced understanding of these differences we can design programs that are better tailored to deliver specific impacts. For the primary outcomes, impacts per dollar spent are highest in the large transfer arm, but that is only an interesting comparison to Gikuriro to the extent that Gikuriro is optimally sized for cost effectiveness -- and while this was presumably practitioners' ex ante belief, results from our study suggest that belief may be worth revisiting. As we understand more about the outcomes that cash transfers can and cannot deliver, we improve the ability of USAID and other implementers to identify a) the means of delivering resources, and b) the concentration of resources across individuals that is most cost effective. Head-to-head studies are invaluable in allowing us to make these kinds of comparisons.

Broader lessons

Reflecting on the experience and results of our evaluation, we think there are five broad lessons worth sharing.

1. What we mean when we talk about 'benchmarking'

As first conceived of by Chris Blattman and Paul Niehaus, the basic idea of benchmarking development interventions with cash was to provide what might act as a low-cost 'index fund' for investments in development. The technology for distributing cash to poor households in developing countries has undergone a sea change, and the availability of mobile money channels has revolutionized the efficiency with which cash transfers can be distributed. Much as the advent of low-cost mutual index funds exerted healthy pressure on portfolio managers to justify their fees, low-overhead cash transfers can pose fundamental questions about the value added by aid programs that deliver goods and services in kind.

To operationalize this idea requires careful thinking about the outcomes to study, the population to target, and the value of transfers to use. By contrast with mutual funds, typical development programs target, and value, a wide range of outcomes. A 'strict' definition of benchmarking would study relative impacts on these outcomes holding the beneficiary population constant across arms -- meaning cash would have to follow the targeting rule of in-kind programming -- and holding resources spent per beneficiary constant as well. This isn't necessarily the way that one would design a cash transfer program to maximize cost effectiveness; in our study, cash appears to be more cost effective at larger resource investments. And it's possible that, for a given, broader beneficiary population, cash could do better by targeting differently. More flexible notions of benchmarking may allow departures in these dimensions, but stronger normative judgments will then be needed to weigh, e.g., the merits of providing greater assistance to a smaller number of people.

2. Cash benchmarking pushes donors to justify paternalism in development programs.

Given the many outcomes valued by development programs, in general, no one approach will dominate on all dimensions. When this is the case, benchmarking studies can then serve two purposes: to highlight the tradeoffs across profiles of outcomes between alternative programs, and -- we hope -- to encourage practitioners to think carefully about the tradeoffs that beneficiaries themselves would prefer to make across these outcomes.

As Das, Do and Özler pointed out in 2005, in the absence of external market imperfections, intra-household bargaining concerns, or behavioral inconsistencies, the outcomes moved by cash transfers are by definition those that maximize welfare impacts. Under these (idealized) circumstances, cash is useful as a benchmark not just because it is inexpensive to deliver, but because it reveals the ways that households themselves choose to change their behavior when an absolute income constraint is relaxed.

Markets are not perfect in the real world, and there is direct evidence of such imperfections in some settings (e.g., Cunha, De Giorgi, and Jayachandran 2018); reasonable arguments can certainly be made about how in-kind programs address these imperfections. Comparing the profile of impacts delivered by cash to the profile of impacts delivered by in-kind programs requires donors to be explicit about the extent to which they either have different preferences than beneficiaries, or to which they believe beneficiaries are constrained by market imperfections in the expression of their own preferences.

3. Getting the costs right is challenging, and important.

Our 'strict' definition of benchmarking requires holding program costs constant across arms, in dollar-at-the-top terms. This contrasts with, e.g., the extant literature on cash vs kind in food support, which has mostly focused on how household responses to transfers that have the same nominal value to the recipient depend on the modality of the transfer. It therefore puts the costing question at the center of the experimental design, whereas many have lamented that the reporting of costs in impact evaluations is often an afterthought and uneven in quality.

We took two steps to try to get this right. In doing so, our guiding principles were that a) the costing should be as symmetric as possible across implementers, and b) we should be costing only those components of the programs whose benefits we are able to measure.

First, we costed both programs in both ex ante and ex post terms -- an exercise that would not have been possible without enormous cooperation from both implementers, and superb help from Liz Brown, a costing expert working neutrally on behalf of USAID and the research program. In the case of Gikuriro, a central challenge was to isolate costs associated with those program elements that were assignable at the village level; these are the program elements whose causal effects we identify. We then had to attribute a share of administrative and overhead costs to these program elements. Since we estimate intention-to-treat effects, we had to work out the effects of non-compliance on cost per eligible household (some costs are averted by non-compliance, some are not). Because Gikuriro was a national-scale program, to make the comparison with GiveDirectly fair we synthetically scaled up GiveDirectly's cost structure to a level of implementation equivalent to Gikuriro's full target population. Fixed costs decline with program size, so a level playing field requires us to evaluate both programs at equivalent scales.
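The scale adjustment above boils down to simple arithmetic: per-household cost is fixed costs spread over the number of households served, plus a variable cost per household. The sketch below uses invented numbers (not the study's actual figures) to illustrate why comparing a pilot-scale cost structure to a national-scale one would be unfair.

```python
# Toy illustration of why scale matters for cost comparisons.
# All figures below are invented for illustration, not the study's numbers.

def cost_per_household(fixed_costs: float, variable_cost: float,
                       n_households: int) -> float:
    """Average cost per household when fixed costs are spread over N households."""
    return fixed_costs / n_households + variable_cost

# The same cost structure looks expensive at pilot scale...
pilot = cost_per_household(fixed_costs=500_000, variable_cost=100.0,
                           n_households=2_000)
# ...and much cheaper once fixed costs are spread over a national program.
at_scale = cost_per_household(fixed_costs=500_000, variable_cost=100.0,
                              n_households=100_000)
print(pilot, at_scale)  # 350.0 105.0
```

Holding the cost structure fixed and varying only the scale, as above, is the spirit of the synthetic scale-up: it puts both implementers on an equal footing before comparing per-household costs.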

Second, we addressed the uncertainty in projected costs by randomizing transfer values in a range around the projected costs of the in-kind transfer. In our pre-specified analysis, we combine this experimental variation in cost per beneficiary with a regression adjustment in order to make comparisons at cost-equivalent transfer values. Some assumptions are required here, and alternative approaches are certainly possible. But we think more work to develop study designs addressing these issues would be valuable.
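A minimal simulation can make the logic of this design concrete. The sketch below assumes a simple setup (all numbers are made up): transfer amounts are randomized in a range around the eventual benchmark cost, and the outcome is regressed on a treatment indicator plus the treatment indicator interacted with the amount, centered at the benchmark. Under a linearity assumption, the coefficient on the treatment indicator is then the estimated impact of a cash transfer costing exactly the benchmark amount.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5_000
benchmark_cost = 141.0           # illustrative ex-post cost per household (USD)
treated = rng.integers(0, 2, n)  # 1 = assigned a cash transfer
# Transfer amounts randomized in a range around the projected cost (illustrative).
amount = np.where(treated == 1, rng.uniform(100.0, 200.0, n), 0.0)

# Simulated outcome: impact rises linearly with the transfer amount
# (slope and noise are invented for illustration).
outcome = 0.002 * amount + rng.normal(0.0, 1.0, n)

# Regression: outcome ~ 1 + treated + treated * (amount - benchmark_cost).
# Centering the amount at the benchmark means the coefficient on `treated`
# is the estimated impact at exactly the benchmark cost.
X = np.column_stack([
    np.ones(n),
    treated,
    treated * (amount - benchmark_cost),
])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
impact_at_benchmark = beta[1]
print(f"estimated impact at ${benchmark_cost:.0f}: {impact_at_benchmark:.3f}")
```

In this simulation the true impact at $141 is 0.002 × 141 ≈ 0.28, and the regression recovers an estimate close to that. The same centering trick works for any benchmark value inside the randomized range, which is what allows the comparison point to be fixed only after the ex-post costing is complete.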

4. The time profile of delivery, and impacts, may differ by modality.

Different types of intervention, by their nature, roll out at different paces. In the case of our study, implementers in all arms of the study contacted study-eligible beneficiaries to enroll them nearly immediately after baseline. Monthly cash transfers started to flow shortly thereafter. But Gikuriro's in-kind services took longer to launch. Whether one thinks of this as a feature or a bug of this study is a judgment call. Given that beneficiaries were selected on the basis of having children at risk of acute malnutrition in a critical window of their development, we think there is a strong case that the differential speed of program benefits is a relevant part of the estimand.

These alternative modalities also may differ in the time profile of their impacts. In our study, pressure to implement the program universally in the study districts, as well as ethical concerns for control households containing malnourished children who were not getting access to the programs, meant that the entire group of study villages received the Gikuriro program immediately after the 13-month endline. To address a concern for long-term effects, we included a number of outcomes that we, and the other parties involved, thought would respond quickly and embed any long-term impacts of the in-kind transfers (see Athey, Chetty, Imbens, and Kang 2016 on the use of 'surrogates' for long-term impact). In this particular case, the fact that we don't see movement in knowledge, diet, or anemia is discouraging about the prospect of longer-term impacts from Gikuriro's investment in study households. On the other hand, there is reason to believe that human capital gains reflected in reductions in stunting may have persistent effects.

5. Cash and in-kind programs may have very different effects on the broader population.

An outstanding issue for cash benchmarking as a framework is how to think about programs' effects on people outside their narrow target populations. Typical development programs place little weight on these other individuals, and a strictly defined benchmarking exercise might focus exclusively on consequences for its target population.

Recent studies have highlighted the potential for external effects of cash-transfer programs. In our own work the point estimates on village-level impacts are consistent with negative spillovers of the large transfer on some outcomes (they are also consistent with Gikuriro’s village-level health and nutrition trainings having improved health knowledge in the overall population). Cash may look less good as one thinks of welfare impacts on a more broadly defined population. Donors weighing cash-vs-kind decisions will need to decide how much weight to put on non-targeted populations, and to consider the accumulated evidence on external consequences.

Closing thoughts: Where and when to undertake cash benchmarking to maximize learning

On the implementation side, we can certainly attest that these benchmarking studies are a challenge! It is complicated enough to design a single RCT with a well-defined eligible group within which we may hope to find impacts. Working with two implementers, each with different implementation timing and strategies as well as different targeting criteria and compliance rates, compounds the difficulty considerably.

Because of these challenges, and because resources are scarce, we do not emerge from this pair of head-to-head studies feeling that it makes sense to impose benchmarking as a blanket way of evaluating development programs. Rather, if a series of such comparative-impact studies can be conducted and their results generalized, we would hope that the impacts of cash will prove consistent enough across the developing world -- or that we can learn enough about patterns of differential impact across contexts -- that it will be possible to make relatively precise projections of the impacts of cash for many settings.

USAID has a number of studies ongoing in Malawi, Liberia, and the DRC to try to nail down whether there are such consistent, cross-country generalizations that can be made about the effects of cash transfers. Ideally these strategic investments in knowledge about the comparative cost effectiveness of cash transfers in a range of contexts will contribute to better programmatic decisions. We hope that the results of our own study will be used in a manner that encourages more organizations to engage in such comparative cost effectiveness research.
