Published on Development Impact

Poverty Reduction: Sorting Through the Hype

After seeing PowerPoint slides of the preliminary findings over the course of more than a year, it’s nice to be able to report that the six-country study that is evaluating the “ultra-poor graduation” approach (originally associated with BRAC) is finally out. The findings look impressive and there will be the inevitable short news cycle and excitement about the findings, so a discussion of the strengths and weaknesses of the study (for the purpose of sustainable poverty reduction in developing countries) and some takeaways for policymakers designing social protection/promotion and productive inclusion programs might be helpful.
You should read the paper, which, as an article in Science, is not that long, but here is a very brief summary. The program in the study targets the ultra-poor (I’ll refer to it as TUP from here on) and works with the selected households from the communities intensely for about 24 months. They work with the households to identify an income-generating activity, provide the main productive asset as a transfer (grant), give monthly cash as stipends to protect the productive assets, pay frequent follow-up visits to the beneficiaries, and help with financial inclusion, etc. The key is that while it is a one-shot intervention that stops, it does actually last somewhat intensely for a couple of years – this is important. The programs are generally more expensive than microcredit programs because they give grants rather than loans and spend a fair amount on additional services, but the idea is to have a sustained impact after the program stops. Furthermore, they are targeted at individual households rather than groups of them – as is the case in some other programs.
The study focuses on 10 important domains these programs can have effects on, including consumption, income, asset accumulation, employment, financial inclusion, health, political involvement, and women’s empowerment. It looks at effects immediately at the end of the two-year program, i.e. just after all program support has ceased, and examines them again one year later. Table 3 summarizes the overall findings: there are significant improvements on eight of the 10 domains at the end of the program, and seven a year later. In standardized terms, the effects are not overwhelming, ranging between 0.05 and 0.25 standard deviations, so these households are still poor, but less so than before.
What are the strengths and the weaknesses of the study? Let’s start with what’s good:
  • Multiple settings: The evidence from BRAC’s implementation of the same type of program in Bangladesh has been around for a while (see Bandiera et al. 2013). This study takes the same concept to six other countries, works with local NGOs in each case, and provides a proof of concept that it can work elsewhere.
  • Measuring multiple outcomes: The authors cover many of the relevant measures for welfare improvements and necessary conditions for sustained poverty reduction. They not only measure income, employment, assets, financial inclusion, and consumption, but also health, female empowerment, and social inclusion.
  • The timeline makes sense: The intervention is specifically about permanently changing something for the current generation so that these households are not (or at least less) dependent on similar programs in the future. So, the intervention has a beginning and an end: it’s an opportunity rope rather than a safety net as a dear colleague likes to call it. The evaluation similarly recognizes that examining impacts immediately at the end of the program does not make a lot of sense on its own: what we care about is sustained impacts on welfare – in the absence of this support. Given that we have so few programs that clearly show sustained effects on poverty reduction, this is really important.
  • Nice discussions of implications and limitations: The authors discuss a number of interesting things about their study. For example, they are frank about the fact that this is a package and we don’t quite yet know which parts of the package are more important than others: that would require more experimentation. They also have a nice discussion comparing their findings with those of GiveDirectly, which is the exact opposite approach to TUP if you will: take all the money that TUP is spending on assets, training, follow-ups, etc. and give it to the household in cash (perhaps even in just one lump-sum via mobile phone: cheap as). There is suggestive evidence that the TUP effects might be more lasting than lump-sum cash, but the comparison is a bit apples and oranges at this point: different countries, target groups, amounts, etc. But, it’s better than nothing as these comparisons are what occupy some policymakers’ minds. The discussion of externalities and general equilibrium effects is also welcome.
Now, for the limitations that won’t get as much play in the coverage:
  • Small samples in each country: the number of households receiving treatment in each country is tiny, somewhere between 450 and 800. One complaint that one hears from practitioners in governments is that it is one thing for one NGO to intensely focus in one area and deliver a package of services under one roof while it is quite another for a government to coordinate the delivery of such services (perhaps across multiple line ministries, as well as local providers) at scale for millions of people.
  • Small numbers of households in each community: the number of households treated in each village is as small as 4 (yes, four) going up to a maximum of 20. These must constitute a tiny share of the total number of households in these communities. The authors downplay the possibility of general equilibrium effects (for example, the diminishing returns to the same activity as the number of people engaging in that activity within a small area increases), saying that these programs would expand more geographically rather than within villages. I disagree: the fact that there is within-village randomization of eligible households tells me that we can at least double the number of households treated per community. Furthermore, if this program will help, and I am making up this number, only 3-4% of the population, can it be that useful as a primary poverty reduction program even if it is cost-effective?
  • Targeting: the previous point brings up another important issue, which is the definition of eligible beneficiaries. The NGOs in each country applied different eligibility criteria for selecting the potential beneficiaries. While governments do use targeting criteria for various, although often fractured, programs, the ultimate goal is to reduce poverty. So, our aim would be to find a program (or a set of programs) that would work for a simpler targeting criterion, i.e. for the bottom X% -- you can pick X to be 10 or 25 or even higher. We don’t know what effect the exact targeting methods employed had on the success of TUP in this study, but the question remains as to whether governments could reproduce similar success with larger target groups…
  • Negative Spillovers: A non-trivial aspect of the study design is that the treatment and the control households live in the same communities in all six countries. Therefore, the main findings are reporting differences between randomized-in and -out households within villages. In three of the six countries, however, there was multi-level randomization – first at the village level, then at the household level – that allows the authors to examine spillovers in these three countries. Using this subset, the authors discuss spillovers on page 12 of the article and state that spillover effects do not substantially affect the results. However, this statement is based on the question: “Are there significant spillover effects?” – the answer to which is (generally) “no.” We should instead be asking a related but different question: “Are spillover effects influencing the inference that can be drawn?” The answer to this question is much more up for debate and I’d say “yes.” Remember that in Table 3, we had significant effects on seven out of the 10 domains one year after the end of the program. In Table S6b, this number is down to two: financial inclusion and incomes and revenues. The effects on consumption, food security, assets, work, and health are insignificant – this is not only because of the loss of power: there are enough signs of negative spillovers in both endlines (the one on mental health is particularly striking, as we have also found such negative effects for the randomized-out group in this paper). It’s great that the 0.2 SD effect on income and revenues persists one year after the end of all support, but if Table S6b were the main table for this study, we’d be talking much less about it. The fact that the three countries in which the reported effects are the largest (Ethiopia, India, and Pakistan) are also the ones with only household-level randomization, i.e. for which we cannot estimate spillover effects, is not ideal.
  • Long-term effects: While the authors should be commended for examining effects one year after the cessation of all support, this time frame is still too short. We’d like to see these effects last longer to be able to confidently redirect funds in this direction.
  • Heterogeneity of effects: If a policymaker were contemplating adopting this approach in her country, what rate of success could she expect? How does the heterogeneity within countries compare with the heterogeneity across them? In one of the six countries (Honduras), the program did not work at all. A second RCT from another site in India was not included in this meta-study (due to data comparability issues), but footnote 10 states that there was no impact. In fact, originally there were 10 sites, four of which, for one reason or another, did not make it into this paper (detailed nicely in footnote 10). Thinking about the heterogeneity of effects across countries and the probability of failure in case of adoption does not take away from the impressive average impacts in the meta-analysis here, but it is what policymakers have to do.
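To make the arithmetic behind the spillover point concrete, here is a toy sketch. It is not the authors’ estimation method, and every number in it is hypothetical; the function name and its parameters are made up for illustration. It assumes treated households receive only the direct effect, while untreated households in the same village receive only the spillover. Under those assumptions, a within-village comparison measures the direct effect minus the spillover on controls, so a negative spillover makes the program look better than it is relative to villages that were never touched.

```python
# Toy arithmetic only: every number below is hypothetical, not taken from the study.

def compare_designs(direct_effect, spillover_on_controls, baseline=0.0):
    """Return the comparisons available under two-level randomization.

    direct_effect         -- assumed true gain (in SD) for treated households
    spillover_on_controls -- assumed change (in SD) for untreated households
                             living in treated villages
    baseline              -- mean outcome in pure control villages
    """
    treated = baseline + direct_effect
    control_same_village = baseline + spillover_on_controls
    pure_control = baseline

    return {
        # What household-level randomization alone can estimate.
        "within_village": treated - control_same_village,
        # What adding village-level randomization makes estimable.
        "vs_pure_control": treated - pure_control,
        "spillover": control_same_village - pure_control,
    }


if __name__ == "__main__":
    result = compare_designs(direct_effect=0.10, spillover_on_controls=-0.05)
    for name, value in result.items():
        print(f"{name}: {value:+.2f} SD")
```

With these made-up inputs, the within-village gap is 0.15 SD even though treated households are only 0.10 SD better off than households in pure control villages. Where only household-level randomization exists, the two cannot be told apart.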
So, what should be some takeaways for governments designing programs for poverty reduction?
  1. The findings should be treated at least as intriguing and promising for our efforts in trying to identify programs that have persistent effects on earnings and consumption. Governments would be well advised to study the findings, debate the details and their relevance to local contexts, and consider how such an approach may aid them in designing more effective social protection programs. Many governments have systems providing social safety nets to the poor, but programs that put the poor on a sustained growth path out of poverty are not yet in evidence.
  2. However, we need to also see whether governments can adopt at least elements of this model and make it succeed in having persistent effects. So, the next step is experimentation at scale.
  3. Governments will also have to decide whether this approach can work for larger numbers of the poor. If not, they would need to devise a series of graduation programs, whereby households graduate from one into the other, needing different services but less assistance in each stage.
  4. We also need experimentation with elements of the TUP package: this could improve cost-effectiveness if certain components are nice but not essential to success.
  5. We need longer-term evaluations, as escaping poverty is neither easy nor quick. Many studies have their short-term findings overturned in the longer run, so tracking outcomes over the short, medium, and long term is important.
  6. Finally, in their experimentation, governments should not be afraid to try different approaches simultaneously. As we legitimately do not know what works best, having two or three promising approaches “compete” against each other at scale and on a level playing field could make sense. The more countries that generate such evidence, the higher our confidence will be in approaches that work and those that don’t.


Berk Özler

Lead Economist, Development Research Group, World Bank
