A different take on “Targeting the Ultra-Poor” programs


Almost exactly four years ago, I wrote a blog post, titled “Poverty Reduction: Sorting Through the Hype,” which described the paper by Banerjee et al. (2015) in Science on the impacts of the ultra-poor graduation approach, originally associated with BRAC in Bangladesh, in six countries. Now comes a new paper by Naila Kabeer, which reports the findings from a qualitative evaluation, which was conducted in two of the six study sites. The paper aims to provide a different perspective to the RCT by digging deeper into issues regarding implementation (including random assignment in the RCT), refusal of take-up, and mediation of effects (or lack thereof) through differences in environment and household characteristics.

While I always like to start with what I liked about a paper, here, for the sake of the reader’s full understanding of the comparisons between the RCT findings and the qualitative study, I need to point out an unfortunate fact. While the qualitative work took place in the same general areas as the study sites, no qualitative work (at least not by the team that included Prof. Kabeer) was conducted in places where the RCT collected data and vice versa. This, despite the fact that the areas the qualitative team worked in (described as a few hundred miles away, which does not seem close to me in absolute terms) had the same procedures of identification of households, random assignment, etc. The short Section 3.1 gives some clues about why the RCT was not conceived as one mixed-methods evaluation rather than two separate studies with different methodologies in different sites, but the fact that a team of BRAC Development Institute and IDS (funded by MasterCard) conducted a study in locations separate from the RCT’s is simply unfortunate. Many times, while reading this otherwise nice and useful paper, I just wished to know what the larger-sample quantitative data would have said about the same respondents. Alas, they were not interviewed by the other set of researchers…

Nonetheless, I learned a fair amount from sections 2-4, which provided useful context for the TUP operations in the Sindh (Pakistan) and West Bengal (India). The settings are quite different, including many factors that would reasonably affect project success, such as isolation vs. connectedness of the study villages; role of village elites, NGOs, men, and intra-household dynamics; ability and flexibility of the NGOs to adapt when the basic TUP template needs to be tweaked; social and cultural norms regarding women’s participation in the necessary activities, etc. In fact, by simply laying these out, the qualitative study does a service to the RCT and its readers: it exposes the most glaring shortcoming of a six-country study published in a journal like Science, namely that the template simply does not allow for this kind of detail to be provided as background context – not only in each country, but site by site within countries. Some might say the reporting of the plain facts alone (averaged and analyzed for some basic heterogeneity) without the interpretation of data from small samples in the qualitative study (QS from here on) is a strength and not a weakness, but it would be hard to argue that we, as the readers, the researchers, and perhaps most of all future adopters would not benefit from the qualitative work. I have more to say on this below, towards the end of the post…

From the qualitative study, we learn a lot. For example, there is evidence that the random assignment procedure may not have been followed by the NGO in the Sindh, who, instead, might have chosen the beneficiaries based on their relationship to the elites. Note that different NGOs were working at different sites even within countries, so this may not have necessarily happened in the RCT sample. Furthermore, the evidence does not actually come from the qualitative work but another independent process-evaluation type study. Nonetheless, it’s not encouraging that procedures were not followed when public lotteries were envisioned.

Much more interesting, however, are discussions of refusal to take up the intervention in West Bengal, how this might have been associated with religion (Muslim villages and households might have been suspicious about the aims of the project, while Muslim women might have had a hard time trying to take advantage of the interventions provided by the program); the suitability of livestock rearing in the arid site in the Sindh; how the NGO in West Bengal was able to change course based on early feedback on which households fared better with what type of entrepreneurial activity; the categorization of households as slow or fast climbers based on a subjective-wellbeing scale and asset accumulation; discussions of the importance of certain eligibility criteria, such as the existence of able-bodied male adults in the household; chronic illnesses and availability of health care; the cooperation or lack thereof between husbands and wives; higher mobility and empowerment of certain groups of women at baseline (influential in taking advantage of what’s on offer), etc. I really enjoyed reading these sections, plus I am a sucker for quotes from study participants (even though I try to be aware of the dangers of building elaborate narratives from a quote or two).

While the issues raised by the QS might be cause for some worry regarding internal and external validity of the findings in the RCT, I found that the two studies generally agree with each other. The qualitative study is generally careful and fair in its discussion of how the RCT dealt with the issues of non-compliance, attrition analysis, and the like (nothing non-standard that we need to get into). The weakness of the RCT is, in my mind, not in bias in any obvious way, but in the lack of detail it is able to provide. For example, the issue of low take-up is dealt with by the adoption of the standard ITT estimation – households that did not get treated in the treatment group are still in the sample. Looking at India in the Science paper, the effects are huge, which is doubly surprising: the standardized effects on assets and incomes are upwards of 0.5 SD (the internal rate of return is also highest in West Bengal at 23%). But, if we assume that take-up rates were equally low (about 50%) in RCT sites, the ToT effects would have to be double these already unusually high impacts, i.e. huge. Had I known about this issue when I was writing my blog, I would have certainly raised it as a question mark. Kabeer (2019) relates this issue to the quantile regression analysis in Banerjee et al. (2015) and suggests that it may partly explain why we see higher quintiles doing better. Other issues such as the possibility of spillover effects (both countries had individual, rather than cluster, randomization) are also fair, something I raised in my blog four years ago.
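For readers who want to see the ITT-to-ToT arithmetic spelled out, here is a minimal sketch. It assumes the standard rescaling of the ITT effect by the difference in take-up between treatment and control, with no control households receiving the program. The function name is made up for illustration, and the inputs are the round numbers from the paragraph above (0.5 SD, 50% take-up), not estimates taken from either paper.

```python
def tot_from_itt(itt_effect, takeup_treatment, takeup_control=0.0):
    """Rescale an intent-to-treat effect into a treatment-on-the-treated effect.

    Assumes the program only affects households that actually take it up,
    so the ITT effect is diluted by the share of non-participants.
    """
    compliance_gap = takeup_treatment - takeup_control
    if compliance_gap <= 0:
        raise ValueError("take-up must be higher in treatment than in control")
    return itt_effect / compliance_gap


# Illustrative inputs only: an ITT effect of 0.5 SD and an assumed
# take-up rate of 50% in the treatment group (0% in control).
itt_sd = 0.5
assumed_takeup = 0.5
tot_sd = tot_from_itt(itt_sd, assumed_takeup)
print(f"ITT = {itt_sd} SD, take-up = {assumed_takeup:.0%}, implied ToT = {tot_sd} SD")
```

With half of the assigned households participating, the implied effect on those who actually took part is twice the ITT effect, which is the sense in which the impacts would be "double".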

It’s important to note that we are judging studies that were planned almost 15 years ago by today’s norms and standards, which is not fair. I don’t think anyone would design such a transfer program as an individually-randomized RCT today, but it was not so obvious a decade ago. The study had a spillover design in some sites, but not others.

Another interesting takeaway from the QS is the roles of the NGOs. For example in West Bengal, the NGO responsible for project implementation in the RCT areas excluded the possibility of self-help groups (SHG) and microfinance clients (in line with the eligibility rules), as such households are also more likely to benefit from other anti-poverty programs. The NGO in the QS area, however, did explicitly make use of SHGs, which was one of its own strengths and might have contributed significantly to the proposed success of the project, by allowing women to bond together, learn from each other, and hide their savings in that formal setup (“saving by the book”), and by providing them loans with better terms than available otherwise. The NGO in the RCT might have also steered initially unsuccessful participants from livestock rearing towards vegetable growing and other activities and reallocated livestock to better-off, more experienced families. Such adaptation of a basic intervention template to local circumstances seems key, but also introduces heterogeneity of outcomes based on the quality of the chosen implementer. In contrast, the NGO in the Sindh does not come across in the best light, failing to follow instructions and making key errors in judgment (it’s not clear whether these were avoidable or only clearer in hindsight). You can already see why it is unfortunate that we don’t have all these data from a subset of the study villages…

Sure, the sample size in the QS is small (20 in each site, if I am correct) and self-admittedly so, but there is nonetheless a wealth of information here, which helps to provide context, to formulate some hypotheses for further testing and for heterogeneity analysis, and yes, even to assess bias and external validity. So, it pains me more to say that Sections 1 (introduction) and 5 (conclusion) are just discordant with the rest of the paper. Why make this a paper that sounds like it is mainly a critique of RCTs in development economics? Maybe the author thinks repetition is good and useful? If so, I disagree: the mentions of the paper on Twitter, including by the author herself emphasizing the shortcomings of RCTs, certainly did not make me want to read the paper.

These critiques, which seem to have caused the author to spend an inordinate amount of time on whether the assignment was really randomized or not, don’t add value to the paper or the literature. And, they sometimes obfuscate or confuse issues by accident. Program take-up is a completely different issue of non-compliance than randomization procedures, yet they are discussed together. The former is not only a common issue, but also in no way limited to RCTs. The mention of methods employed ex post to deal with econometric issues feels dated. Sentences like “…RCTs frequently do not collect information on [relevant] variables because they do not consider them relevant to their experiment or even know what they might be” are both unnecessary and unfair to a lot of researchers in the field. What is fair is the statement that the RCT findings could have been interpreted much better with more knowledge of the local context and the trial population (compared with the target population IF they differ). It’s also fun to read the paragraph in the final section about mundane reasons (arising from the QS) as to why the program effects might have been small in absolute terms in contrast with the higher-level informed speculations of Banerjee et al. (2015) on poverty traps. There is a dig in there somewhere...

Two things to round up the discussion. First, I was surprised that there is no discussion (or even a passing mention) of the Bandiera et al. (2017) paper in the Quarterly Journal of Economics, which spun together a much better narrative about the impacts of TUP at both the household level and the changes in these village economies, along with the pathways, such as spillovers, general equilibrium changes, occupational change, etc. I do not remember any mention of qualitative data collection when reading that study, but I would not be surprised if there was one. If not, the authors show that it is possible to provide much needed context and explanations within a well-designed RCT when they are given room to write up all their findings, including the background labor market context, pathways, longer-term findings, etc. Of course, they benefit from working with one, and the original, NGO (BRAC) in Bangladesh, but I have a suspicion that Bandiera et al. would not have provided as fertile ground for a critical comparison of QS vs. RCT.

This brings me to the issue of how to design studies, including plans for endline reporting of all the findings. It is not completely fair to criticize Banerjee et al. (2015) for writing a paper that fit the required template of a journal like Science. Many medical journals, but not economics ones, do the same. In the biomedical/public health field, the Lancet/Science articles are accompanied by a series of secondary publications in other journals. Perhaps, one can take issue with the authors not putting out follow-up papers that provide more in-depth context for the interpretation of findings, but maybe they are using their time better by designing follow-up studies – I don’t know. But, in economics, we have neither the culture nor the incentives to write those secondary papers in lower-tier journals. So, what could be done?

One idea might be to not only pre-register the study and get pre-trial acceptance (a la what the Journal of Development Economics is doing with their new Registered Reports) but be even more ambitious and consider a special issue. Remember the American Economic Journal: Applied Economics special issue that published six RCTs of microcredit? Like that, but even more ambitious... First, all articles rather than just the summary article would be pre-accepted. Second, the main article would be a formal meta-analysis of all country studies, but each country study would be written up separately in the same issue by a different team providing needed context. These papers would not only make use of mixed-methods analysis, but there might also be a final (or first or second) paper in the same special issue that is written by the qualitative methods lead – perhaps providing a different perspective. If you are working on a topic of outsized importance in development economics and have secured funding for a multi-country study, then there is no reason why the study cannot (a) have the best design and data collection efforts possible, and (b) be good enough for pre-registration and acceptance prior to the trial. Crazy? Maybe, but it might at least make some people think about the feasible alternatives.

For me personally, I already have mixed methods RCTs in the field, but will be paying even more attention to the data coming from the qualitative side. Researchers with different traditions of arriving at causal effects might have some discomfort and conflict trying to work closely together, but I do believe that the projects, on average, should benefit from such efforts.


Berk Özler

Lead Economist, Development Research Group, World Bank

Join the Conversation

Nathanael Goldberg
May 07, 2019

Thank you for your usual thoughtful treatment of this paper. We think it’s important to clarify a few points, which might not be evident from Kabeer's paper. First, in West Bengal and Sindh, the RCTs and the qualitative evaluations were not just in different locations – they were actually done with entirely different partners with their own selection processes, program design, staff, etc. The RCT in West Bengal was with Bandhan, while the qualitative evaluation was with Trickle Up and their local partner HDC. In Sindh four organizations participated in the RCT, with selection into treatment and control done by public lottery, while the qualitative work was done with a fifth organization, OCT. Thus the comments made by Kabeer about participant (mis)targeting have no bearing on the RCT. We also found in our own calculations that a relatively low share of households in the Sindh RCT were below $1.25, but are not aware of any concern with the random assignment. Regarding the concerns with take-up in West Bengal, this is an important issue and we addressed it in the Science paper, in the section on compliance with treatment assignment.
We agree that mixed methods have much to offer and we take issue with Kabeer’s characterization that we were resistant to qualitative work. In fact, IPA led the qualitative research in several of the sites (Honduras, Peru, and Ghana). In those sites we were able to better align the methods by choosing samples from the same communities, as you suggest.
Readers interested in the qualitative findings from the graduation pilots can find the studies archived on the FinDev gateway https://www.findevgateway.org/library?f%5B0%5D=search_api_combined_1%3A…

Berk Ozler
May 07, 2019

Thanks - this is helpful. To be fair to Prof. Kabeer, if I recall correctly, her paper does make it clear that the implementing partner NGOs are different in the RCT and the QS. My blog post may not have done enough justice to make this fine point.

May 08, 2019

Hi Berk. Very pleased that you did this blog on my paper. Always interesting to find out how others ‘read’ what you write and, furthermore, blogs I am told increase one’s readership! So thank you. I am largely in agreement with what you say but I thought I would pick up on a few points.
First, I am very glad you picked up on my central argument that both quantitative and qualitative research is strengthened by close and integrated collaboration viz. not carried out in tandem with each other or, in this case, as separate studies in close geographical proximity to each other. (Our qualitative evaluation and the RCT in West Bengal were around 150 to 200 miles apart so pretty close, in my view.) I am also glad that you picked up on the point that it was not the decision of the qualitative research team to take this parallel route. I also think that your suggestions about how to better design and report on evaluations at the end of the paper are excellent. I hope policy funders are listening since they will get better value for their funds if they follow your advice when they commission new evaluation research.
Second, happy you appreciated my 'dig' about why go for high level theoretical explanations when more mundane empirical deficiencies would do? However, I didn't appreciate your ‘dig’ that the reason we didn't discuss the study by Bandiera et al (2017) on the Bangladesh RCT of the original program was probably that it would not provide as fertile grounds for critical comparisons of qualitative evaluations versus RCTs. No, that was not the reason. My paper was set up as a comparison of two RCTs organized by the people at IPA and two qualitative evaluations carried out by BRAC Institute of Governance and Development, Dhaka and Institute of Development Studies, Sussex. I was involved in both of the latter so had a close, first hand understanding of how the pilots worked in these contexts and why some of our findings did differ from those of the RCTs. If I had been involved in qualitative evaluation of the programme in Bangladesh, I would certainly have included it. As it happens, I have expressed my misgivings elsewhere ( https://www.gage.odi.org/publication/gender-livelihood-capabilities/) about how an earlier evaluation of the same programme by Bandiera et al (2013) interpreted their findings based on speculation rather than evidence and contradicted by a later IFPRI study that did collect more evidence.
Third, I can see why you consider questions of programme take-up and non-compliance as separate issues, which they might be from a technical point of view. From a social science point of view, however, they are interesting as examples of the way that human agency messes up experiments in the field in a way that it does not generally do in the lab, one of the major critiques of RCTs.
And finally, I am sorry that the possibility that my paper might be mainly a critique of RCTs in development economics put you off the idea of reading it. But I don’t think all RCT practitioners are as reasonable as you so we must certainly continue our critique of the claims that many RCT practitioners continue to make about the rigour of their approach. Studies continue to come out that the real world tends to subvert these claims but the RCT community has been slow to change. A far more thorough critique of these claims can be found in https://www.iree.eu/publications/publications-in-iree/estimating-microc….
Best wishes, Naila

Berk Ozler
May 08, 2019

Hi Naila,
Thanks for the comments and the clarifications. On the Bandiera et al. work, I don't think (and I regret that it came across that way) that you left it out as a comparative study because it would not provide fertile ground: no one can be expected to write up a comparison that they did not study.
I was simply saying that a citation of, and a passing reference to, that paper would probably be appropriate for inclusion in your paper, especially because it was published in an econ journal that provided more room for some of the depth in context and nuance that you noticed to be lacking in the shorter (and six-country!) Science article.
Thanks again,

May 08, 2019

Prof. Kabeer, thank you for your response, and Dr. Ozler, thank you for this review. As an impact evaluation enthusiast and intl. dev. student with a broader social-science background, I sometimes get quite frustrated by the (sub-surface) fights and frictions between the more quantitative and qualitative disciplines/researchers. It is great to see two people with different views (admittedly from the same discipline) actually engage in a conversation and show attention to, and interest in, each other’s points. I hope this will continue and gradually lead to a true embrace of more mixed-methods and cooperation in this valuable field.

Karishma Huda
May 21, 2019

Thank you for your thoughtful analysis, Berk, and appreciate you acknowledging the missed opportunity that came from not having a mixed methods approach to the graduation program research. I couldn’t agree more.

As a former Research Manager with Brac Development Institute who oversaw the qualitative research in India, Pakistan, and several other graduation program sites, and a co-author on this original research along with Naila, I feel the need to set the record straight on a few issues.

Firstly, it is unfortunate that there was not greater insistence from the onset for collaboration between the QS and RCT teams. My team and Naila tried very hard to co-locate our research in RCT areas but were not allowed to do so for the fear of producing contrary findings. We embraced this possibility and thought that exploring the reasons for them would enrich the research process and results, but IPA did not see it that way. This is precisely why we were pushed into program areas (Trickle Up in West Bengal and OCT in Sindh) where the RCTs weren’t operating. I therefore take issue with Nate’s comment above, as this decision 15 years ago hurt both evaluations and should have been avoided.

On various occasions the RCT evaluators and myself/Naila presented our findings at graduation program conferences. Most of the time the audiences were excited by the QS findings, but equally frustrated that the onus was on them to connect the dots to develop a well-rounded picture of what was happening in these two sites. Back then we had to work hard to defend our research because it was not grounded in numbers. The polarized ‘qual/quant’ discussions from the time inspired me to write this blog after one of the CGAP-Ford Graduation meetings in Paris: https://www.cgap.org/blog/understanding-what-works-why-qualitative-rese…

I fully agree that this occurrence 15 years ago would likely not happen now. As a Social Protection Specialist who has commissioned or been a part of many evaluations of social protection programs, I’ve (fortunately) never again found myself in such a situation. On the contrary, I’ve found most qual and quant researchers not just open to complementary methods, but willing to debate and drill down until they get to the heart of the truth. This is what we need to see much more of – embracing contradiction in research rather than being afraid of it.

Zanariah Mohd Nor
June 15, 2020

Thank you for clarifying the grey area why you and the team didn't proceed with the RCT areas. This is an insightful fact about how stakeholders/gatekeepers affect decisions on where and what to be validated.

Zanariah Mohd Nor
June 15, 2020

Hello Berk, thank you for your critique of the two studies and the related, it enlightens my understanding.

You've mentioned that these kinds of studies are important for funding agencies. Digesting your insights, could you enlighten me further on the following questions:

To what extent are all these research findings (on graduation programs) used to improve the graduation programs? Will the positions (or representations, internal or external to the organizations) of the evaluators affect the utilization of the research findings on the programs? Will the positions (or representations) of the evaluators also affect the research findings?

In addition, Berk, based on your professional expertise and experiences, which type of evaluation is transformational development to graduation program/model (or any similar program like it) to eradicate poverty?