Blattman, Fiala, and Martinez (2018), which examines the nine-year effects of a group-based cash grant program for unemployed youth to start individual enterprises in skilled trades in Northern Uganda, was released today. Those of you well versed in the topic will remember Blattman et al. (2014), which summarized the impacts from the four-year follow-up. That paper found large earnings gains and capital stock increases among young, unemployed individuals who formed groups, proposed to form enterprises in skilled trades, and were selected to receive the lump-sum grants of approximately $400 per person (in 2008 USD using market exchange rates) on offer from the Northern Uganda Social Action Fund (NUSAF). I figured that a summary of the paper that goes into some minutiae might be helpful for those of you who will not read it carefully – despite your best intentions. I had an early look at the paper because the authors kindly sent it to me for comments.
Not too high and not too low…
The bottom line of, but perhaps not the most important thing about, this paper is that those large effects – on employment, earnings, and consumption – have disappeared after nine years. Anything that might have remained is either too small (skilled work) or too uncertain (durable assets at the household) to write home about. Nor are there any effects on secondary outcomes for the beneficiaries or their children (health, education, fertility, etc.). The program accelerated, on average, the recipients’ path to a higher, but still modest, equilibrium, and the control group converged to the treatment group over time. The authors sketch out a simple model that rules out poverty traps for the counterfactual group (those who had successful applications but were randomized out): they are on a steep gradient of increased hours worked, total earnings, and capital stocks over time, meaning that they were able to save and invest. In what is perhaps a fresh look at the two- and four-year results, the authors also find that this convergence may have been underway sometime between Years 2 and 4.
Before I delve into what I think the main messages (and puzzles) are, and discuss the study’s few limitations, I should say the following: I like this paper. Not many programs like this have nine-year evaluations. The authors tell an interesting and coherent story. This draft is written with care, nuance, and caveats – all of which make the claims much more qualified. I suspect that, wherever it is published, this will be an influential paper on the limits of cash grants on poverty reduction – joining a pipeline of forthcoming evaluations of a variety of cash transfer programs from different countries and contexts.
Why did the treated enterprises stop growing?
I mentioned the authors’ discussion of how and why the control group might have converged to the treated. However, what I missed is perhaps an equally important question: why did the small enterprises of the beneficiaries stop growing?
Here is some useful descriptive analysis that the authors provide in Table 5. You can think of the treatment group as being comprised of roughly three, more or less equally-sized, subgroups (even though the subgroups are endogenous, this way of thinking yields an interesting and useful pattern):
- A third never received training in a skilled trade: these individuals never ended up having business assets worth more than the NUSAF grant.
- Another third (29% to be exact) were funded and trained, grew their enterprises decently (or, perhaps, even impressively) by Year 2, but were no longer practicing that skilled trade by Year 4 and were in the process of heavily divesting their assets.
- Finally, the remaining beneficiaries (38%) were funded, trained, and practicing their new trade by Year 4, but were no longer growing.
Table 7 provides some evidence towards an answer to my question above, but it also raises more questions. It looks like beneficiaries may not be sticking with these enterprises because the returns aren’t that high (or no higher than the counterfactual employment options). There are no effects on earnings per hour. This is surprising, because the controls caught up in employment hours mainly by doing casual labor and petty trade. There are two, equally unattractive, possible explanations: either returns to skilled labor are not higher than those to petty trading and casual/unskilled labor OR they are a bit higher but the program (despite being successfully implemented and taken up at high rates) produced very small numbers (or hours) of skilled trade. A careful reading of the same table shows that the durable effects on skilled trades highlighted by the authors are very small in absolute terms (from 3 to 6 hours per week in C vs. T, or from 3% to 6% of people whose primary occupation is a skilled trade). There are no effects on hours spent on skilled wage labor.
Why is being a tailor or a welder or a hairdresser not better than working in a field or running a small kiosk in terms of earnings? If it is, why don’t more beneficiaries try to make those their main occupation rather than at best a side job? These findings are puzzling – at least to me…
What I take away, however, is that we are seemingly in a setting where the economy is taking off. The program targeted a group of poor people, but by no means the poorest, the least able, or the least educated, who themselves were about to get on a path of increased employment, even if it was not necessarily going to be in a skilled trade. And, all of them self-selected into putting in successful submissions for proposals to start enterprises in skilled trades, in an environment where it is possible to purchase training for those skills and buy materials if one has the funds. Even in this setting, which I would call favorable for success, two thirds of the program beneficiaries were not practicing a skilled trade after four years, and the rest were happy with small enterprises that were no longer growing by Year 4 and making no more money than people in unskilled trades in Year 9. I find this not only puzzling, but also depressing…
In some ways, the findings show how difficult it is to move the needle on poverty reduction with grant programs like this. I have written about some other studies in the pipeline that have shown short-term effects that failed to translate into longer-term effects (see the penultimate paragraph here, for example). Even the programs that show sustained effects after 4-7 years, such as BRAC’s ultra-poor program in Bangladesh (that I wrote about here, among other places), have effects that are small in absolute terms. There is still a lot to be said for the conditional and unconditional cash transfer programs that are part and parcel of social protection strategies for many governments with respect to poverty reduction. New papers on the longer-term impacts of some of these are also coming out at a really fast pace (see, for example, here, here, and here for just a few).
What are some of the limitations of the study?
I think there are three:
- baseline balance and attrition
- spillovers to ineligible entrepreneurs in villages/parishes
- better (or more consistent) measurement over time
Baseline balance and attrition: Impact estimates that do not control for baseline differences are quite a bit different from estimates that do, which points to the baseline imbalance mentioned in the paper. The paper does not go into the reason why baseline data were collected after randomization, which seems to have contributed to the loss of 13 groups, all mysteriously from the control group. [The timing of the randomization could have had a legitimate reason – such as the government needing to know treatment status before meeting the groups.] Whatever the reason, there is not much the authors can do now – other than highlight the issue up front and warn the reader.
Similarly, with attrition, the outlook is not rosy. The authors used what is now a common two-phase tracking strategy to interview the study participants. However, the numbers found in Phase 1, selected for Phase 2, and, particularly, found in Phase 2 are low enough that the Year 9 impacts must be estimated with weighted least squares (WLS), making the analysis of this RCT quasi-experimental. When success in intensive tracking is as poor as it is at the nine-year follow-up (43% in Phase 2, among those randomly selected to be tracked intensively), we know that what Gerber and Green (2012) call “MIPO | X” (missingness independent of potential outcomes conditional on observables) is likely to be violated and the WLS estimation (or corrections using inverse probability weighting) is not sufficient (I wrote a post about this here). Simulations show that the whole two-phase approach to attrition, and such corrections, work best when the numbers tracked and found are high, particularly the success rate in Phase 2, i.e. intensive tracking. With 43%, we get noisy estimates for those tracked, and the noise is amplified by large sampling weights. As the authors point out, Manski bounds are too extreme to be useful with this level of attrition, and even the sensitivity analysis that assumes +/- 0.25 SD gaps in impacts among those lost to follow-up (Table 9) can only rule out negative and significant treatment effects.
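For readers who have not worked with two-phase tracking, here is a rough sketch of the mechanics behind the weights. It is a stylized illustration, not the authors' procedure: I assume that everyone found in Phase 1 gets a weight of 1, that a random share `p_select` of those not found is sent to intensive tracking, and that those found there get a weight of 1 / `p_select`. The function names, the Phase 1 rate of 60%, and the selection share of 50% are all hypothetical; only the 43% Phase 2 success rate comes from the paper. The last function is the standard Kish approximation for the effective sample size under unequal weights, which is one way to see how large weights on a few respondents add noise.

```python
from typing import Sequence


def tracking_weight(found_phase1: bool, selected_phase2: bool,
                    found_phase2: bool, p_select: float) -> float:
    """Sampling weight for one respondent in a stylized two-phase design.

    Assumption (not from the paper): Phase 1 finds carry weight 1, and
    Phase 2 finds stand in for all Phase 1 non-respondents, so they carry
    weight 1 / p_select. Anyone never interviewed gets weight 0.
    """
    if found_phase1:
        return 1.0
    if selected_phase2 and found_phase2:
        return 1.0 / p_select
    return 0.0


def effective_response_rate(phase1_rate: float, phase2_success: float) -> float:
    """Weighted response rate: Phase 1 share plus the share of the
    remainder that intensive tracking can speak for."""
    return phase1_rate + (1.0 - phase1_rate) * phase2_success


def kish_effective_n(weights: Sequence[float]) -> float:
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


if __name__ == "__main__":
    # Hypothetical Phase 1 rate (0.60); 0.43 is the Phase 2 rate in the paper.
    print(effective_response_rate(0.60, 0.43))

    # Hypothetical sample: 60 people found in Phase 1, 10 found in Phase 2
    # after selecting a quarter of the missing for intensive tracking.
    weights = [tracking_weight(True, False, False, 0.25)] * 60 \
        + [tracking_weight(False, True, True, 0.25)] * 10
    print(len(weights), kish_effective_n(weights))
```

In the hypothetical sample, 70 interviews carry the information of noticeably fewer equally weighted ones, and that is before worrying about whether the people found in Phase 2 resemble the ones who were not found, which is the MIPO | X problem that no reweighting can fix.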
Spillovers to ineligible entrepreneurs in villages/parishes: The question that this project never really tried to sink its teeth into is the obvious question of spillovers. From the detailed descriptions of the program, we learn that most groups chose a single skilled trade (submitting proposals to learn about tailoring, or hairdressing, or something else, and looking to take advantage of economies of scale to train together), meaning that it is possible that within a year or two of the intervention, a parish may have 15 new tailors or hairdressers or metal/wood workers, etc. It’s not hard to imagine that even in parishes with an average population of about 10,000, this could have crowded out existing skilled tradespeople. In fact, the third of the treatment group divesting heavily between Years 2 & 4 despite having been trained in a skilled trade earlier may also have suffered from the same competition – depending on the intensity of similar enterprises operating nearby and the number of people practicing the same trade in their original NUSAF group.
- In the authors’ defense, this paper devotes two good paragraphs to the topic and briefly discusses the possibilities of positive and negative spillovers. The authors also warn the reader that the cost-effectiveness findings can easily be nullified if there are some negative spillovers on pre-existing tradespeople. Reading these two good paragraphs, I am skeptical of positive earnings effects (there is a small uptick in part-time employment to non-family members, but no accompanying effect on earnings) or beneficial price effects, whereas the crowding out scenario seems more likely (and supported by evidence from Blattman et al. 2016, as cited by the authors themselves).
- Still, since the authors know more about these communities (and seem to have collected qualitative data to complement their main analysis) than anyone else, I wonder if they could enlighten the reader a bit more about such spillovers: can they give us even anecdotal evidence about the program possibly having created too much of the same skill in one place? Could this explain the high migration rates, as well as the stunted growth of these new enterprises?
- To the extent that negative spillovers (not from treatment to control; but from eligible to ineligible within treatment areas) are a real possibility, the finding of elusive poverty reduction from lump-sum grants programs is even more disappointing…
- A higher success rate in the intensive tracking of study participants: this would not only have lowered the probability of the violation of MIPO | X, but also tightened the bounds in the sensitivity analysis.
- Some objective measure of child health or development: Many recent studies have shown effects on the height (or nutritional status) of young children exposed to their parents’ positive income shocks (usually through a CCT or UCT). Given the possibility that the effects of such anti-poverty programs for adults are durable only through their potential effects on early childhood development, it would have been nice to have an objective measure of a predictor of long-term gains, such as the height-for-age z-score, anemia, upper arm circumference, and the like.
- Consistent measurement of capital stocks and consumption over time: For a study that is about poverty reduction through enterprise creation, it is a limitation not to have consistent measurements of a reliable per capita consumption measure and the stock of business assets at each round of data collection. It would have been nice to show those graphs for the whole study, run regressions using lagged baseline values, etc. Even the trajectory of business assets until Year 4 was very instructive and I really missed the Year 9 value in that table (Table 5).
I’d like to finish by quoting a paragraph from the Introduction section of the paper:
“Yet while these standard models give us many reasons to eventually expect temporary gains, it is striking how few policy analyses and papers on these programs dwell on convergence – including our own 4-year evaluation of YOP (Blattman et al 2014). We, like many others, implicitly framed the grants program as a potential solution to poverty and failed to weigh what briefer impacts would imply for theory and practice.”