
Do Conditions Moderate the Effects of Cash Transfer Programs? Preliminary Findings from a Systematic Review


Last week, I talked about the difficulty of categorizing cash transfer programs neatly into bins of unconditional (UCT) and conditional (CCT) ones. Afterwards, one of the comments gently chastised me for being overly optimistic in thinking of these programs as lying on a continuum of condition intensity, rather than in a multi-dimensional design space. I agreed, but also got to thinking. Should we give up so easily? Isn’t there any value in something that is a simplification of the true world but is still an improvement over the current state?

Just to remind you what we’re talking about: in the current state of the world, we call similar programs different things, as if there were many shades of grey that we are forced to call either black or white. For example, Honduras’ CCT program, PRAF-II, has enrollment conditions and not much else; Ecuador’s UCT program, BDH, was conceived as a CCT, advertised as such for a while, and conducted social marketing campaigns about the value of investing in children’s education, but never monitored school participation or penalized households for the lack of it. If we pool many programs like these, categorize them as either CCTs or UCTs, and, say, find no differences in schooling impacts between the two groups, is it because there really is no difference, or because our categories are too noisy?

For a Systematic Review we’re conducting for the Campbell Collaboration (funded by AUSAid through 3ie) on the relative effectiveness of CCTs and UCTs on schooling outcomes (with Sarah Baird, Francisco Ferreira, and Michael Woolcock), we have systematically searched for all studies that can be categorized as CCTs or UCTs (the review should hopefully be out soon – Sarah is presenting preliminary findings at the Campbell Collaboration’s Colloquium in Chicago later this week). In the study protocol that was published last year, we had proposed to conduct moderator analysis using a set of variables, such as transfer size, identity of the recipient, frequency of transfers, and the degree/extent of monitoring (a moderator is defined as a variable that affects the direction and strength of the relationship between an intervention and the outcome). Accordingly, we had even coded a binary variable (only among the so-called CCT programs) for whether conditions were monitored or not. However, after the exchange on last week’s post, and now able to see things a bit more clearly, we put our thinking caps back on: could we categorize all programs, not just the CCTs, in order of the intensity of the schooling conditionalities imposed by program administrators? In other words, could we turn this multi-dimensional design space (of incentives, nudges, social marketing, monitoring, enforcing, etc.) into a linear space and rank cash transfer programs, from those having nothing to do with schooling at one end to those with explicit conditions that are monitored and enforced at the other?

To do this, two of us came up with categories (from 0-6, described below) and independently scored each of the 35 studies in our meta-analysis of school enrollment. Afterwards, we got together to compare notes, debated the disagreements, and came to a consensus on the ranking of each program. [Please see the endnote at the bottom of this post on the independence of the coding of this ‘moderator’ from the effect size for each program.] Our categories, in increasing intensity of conditions, were as follows (# of studies in parentheses):

  0. UCT programs unrelated to children or education – such as Old Age Pension Programs (2)
  1. UCT programs targeted at children with an aim of improving schooling outcomes – such as Kenya’s CT-OVC or South Africa’s Child Support Grant (2)
  2. UCTs that are conducted within a rubric of education – such as Malawi’s SIHR UCT arm or Burkina Faso’s Nahouri Cash Transfers Pilot Project UCT arm (3)
  3. Explicit conditions on paper and/or encouragement of children’s schooling, but no monitoring or enforcement – such as Ecuador’s BDH or Malawi’s SCTS (8)
  4. Explicit conditions, (imperfectly) monitored, with minimal enforcement – such as Brazil’s Bolsa Familia or Mexico’s PROGRESA (8)
  5. Explicit conditions with monitoring and enforcement of the enrollment condition – such as Honduras’ PRAF-II or Cambodia’s CESSP Scholarship Program (6)
  6. Explicit conditions with monitoring and enforcement of the attendance condition – such as Malawi’s SIHR CCT arm or China’s Pilot CCT program (10)
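For readers who want to experiment with alternative categorizations once the data are released, the ordinal scale above, and the three broad groupings used later in the post, could be encoded as a simple lookup table. This is only an illustrative sketch: the variable names and labels below are my paraphrases, not the review’s actual codebook.

```python
# Illustrative encoding of the 0-6 'intensity of conditionality' scale.
# Labels paraphrase the post; names are hypothetical, not the review's codebook.
CONDITIONALITY_SCALE = {
    0: "UCT unrelated to children or education",
    1: "UCT targeted at children to improve schooling",
    2: "UCT conducted within a rubric of education",
    3: "Explicit conditions on paper, no monitoring or enforcement",
    4: "Explicit conditions, imperfectly monitored, minimal enforcement",
    5: "Conditions with monitored and enforced enrollment",
    6: "Conditions with monitored and enforced attendance",
}

def broad_group(score: int) -> str:
    """Collapse the 0-6 scale into the three broad groups used in the forest plot."""
    if score <= 2:
        return "no schooling conditions"
    if score <= 4:
        return "some conditions, no monitoring or enforcement"
    return "explicit conditions, monitored and enforced"
```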

So, what did we find?

Using our binary program categorization – forcing the difficult, fuzzy programs to be defined as either a CCT or a UCT as best we can – we find that both types of programs improve the odds of children being in school: by 23% for UCT programs and 41% for CCT programs. The difference between the two has a p-value of 0.183, meaning that this sizeable difference is not statistically significant. (Please note that effect sizes are expressed as odds ratios (OR): if 75% of children in the control group are in school and the treatment raises that by 5 percentage points, the effect size equals (80/20) divided by (75/25), or 4/3 ≈ 1.33.)
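The odds-ratio arithmetic in the parenthetical can be spelled out in a few lines (the 75%/80% figures are just the hypothetical example from above, not actual program numbers):

```python
def odds_ratio(p_treat: float, p_control: float) -> float:
    """Odds ratio: odds of enrollment under treatment vs. under control."""
    return (p_treat / (1 - p_treat)) / (p_control / (1 - p_control))

# 75% enrolled in the control group, 80% under treatment:
# OR = (0.80/0.20) / (0.75/0.25) = 4/3
print(round(odds_ratio(0.80, 0.75), 2))  # 1.33
```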

However, using the categorization above, the picture is much clearer. In a random-effects meta-regression model with effect size as the dependent variable and this moderator describing the ‘intensity of conditionalities’ as the only independent variable, the coefficient estimate is large and statistically significant at the 1% level. The linearization (see Figure 1 below) suggests that each unit increase in the intensity of the conditionality is associated with an increase of 7% in the odds of being enrolled in school:
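To give a concrete feel for what such a meta-regression does, here is a stripped-down, inverse-variance-weighted regression of log odds ratios on the intensity score, using entirely made-up numbers. A proper random-effects meta-regression (e.g., via R’s metafor package or Stata’s metareg) would also estimate a between-study variance τ² and add it to each study’s sampling variance before weighting; none of the figures below come from our review.

```python
import numpy as np

# Hypothetical data: intensity score (0-6), log odds ratio, and its sampling
# variance for a handful of made-up studies -- NOT the review's estimates.
intensity = np.array([0, 1, 3, 4, 5, 6], dtype=float)
log_or = np.array([0.10, 0.15, 0.22, 0.30, 0.38, 0.45])
var_es = np.array([0.02, 0.03, 0.02, 0.04, 0.03, 0.02])

# Inverse-variance weighted least squares (the fixed-effect analogue; a true
# random-effects model would add an estimated tau^2 to var_es before weighting).
w = 1.0 / var_es
X = np.column_stack([np.ones_like(intensity), intensity])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ log_or)
slope = beta[1]  # change in log odds per unit of conditionality intensity
print(f"odds multiplier per unit of intensity: {np.exp(slope):.3f}")
```

Exponentiating the slope turns it back into a multiplicative effect on the odds, which is how the 7%-per-unit figure above should be read.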

Another way to see this visually (which also allows you to see each study’s effect size and how they’re broadly categorized) is as follows: group programs into ‘no schooling conditions’ (categories 0-2); ‘some conditions with no monitoring or enforcement’ (categories 3-4); and ‘explicit conditions, monitored and enforced’ (categories 5-6). Here is the forest plot of effect sizes for these groups:

The effect sizes for these three groups are, respectively, 1.18, 1.25, and 1.60. So, while all programs cause a statistically significant improvement in enrollment, the effect in the final group is both meaningfully and statistically significantly larger than in the first two groups.

Could it be that this variable is picking up some unobserved characteristics shared by all the ‘would-be successful’ studies, meaning that our finding is simply a correlation and has no causal implications? This is certainly possible, but we could not find the smoking gun that would make this result disappear among the other moderators that we had coded: when we run the same regression, but this time including mean enrollment rate in the control group, transfer size as a share of household income (or annual cost of program per person), whether the transfer is given to the mother/female in the households, the frequency of transfers, whether the intervention is a pilot or a scaled-up national program, etc., the size and significance of the coefficient estimate for the ‘intensity of conditionalities’ remains as large and significant as before. Furthermore, none of these other moderators is associated with the effect size on enrollment.

The preliminary findings of this systematic review are consistent with previous evidence from experiments in Malawi and Burkina Faso, which were themselves more or less consistent with previous evidence using quasi-experimental data: while all cash transfer programs cause some improvement in school enrollment, conditions seem to cause further increases in program impacts.

I’ll provide a heads up in this space (or in our popular Friday links) when the systematic review is published. We will also aim to make the data we compiled, as well as our code, publicly available as soon as possible thereafter, so that other researchers can replicate and improve on our work. In the meantime, as always, comments are welcome.

[Endnote: When we started coding this new moderator variable last week, we were in the process of revising our meta-analysis in light of comments and suggestions from Campbell Collaboration’s Methods Editors. This involved recalculating effect sizes by synthesizing impact estimates from multiple papers into a single effect size for each intervention – e.g. Mexico’s PROGRESA has 15 papers, many of which had themselves reported multiple effect sizes. As such, the coding of this moderator variable should not be influenced by our knowledge of effect sizes. Still, it would obviously have been better if we had thought of redefining this variable when we were writing our protocol last year, rather than last week after my blog post. However, without actually starting to conduct this meta-analysis and reading the descriptions of all these programs, we would not have come up with the idea in the first place: a classic catch-22. In any case, the data that we compiled for this systematic review, including our coding, will be publicly available as soon as the review is published, so anyone dissatisfied with our coding can redo the analysis using alternative categorizations.]


Berk – interesting and thought-provoking work. You are of course right that ex post identification of moderators is usually suspicious, if not downright dodgy, so your disclosure is important. In your case, the logic behind the classification of the variable and the additional meta-regression results (together with the circumstances around the calculation of the synthetic effect sizes) do lend credence to your findings.

There are also a few broader points from your review for the research synthesis field: 1) attention to the appropriate classification of interventions (programme design), which you make, matters; 2) programme implementation – in this case, enforcement of the condition – matters too; and 3) ‘lumping’ interventions (CCTs and UCTs) together in one review has enabled some very useful comparative effectiveness analysis to be undertaken. Will look forward to reading the final product, including those all-important findings on test scores.

Submitted by Berk Ozler on

Hi Hugh,
My view is that secondary analysis – including the ex-post identification of a new independent variable or the redefinition/recategorization of an existing one – is OK, in fact sometimes necessary, as long as it is clearly specified as such and done as transparently as possible.

Sometimes new information – not anything regarding the effect sizes and their p-values, but rather substantive new insights – can lead to such secondary analysis, the results of which may be worse left out than presented. As for the funding agency in this case, while I don’t know the specifics, it is clear that they have been waiting to find out about the results of this review – presumably with the intent to apply some lessons from it (hopefully carefully and judiciously) to cash transfer programs that they may be supporting in Asia and the Pacific. Presenting pooled effect sizes using a binary categorization of such programs – one that we now know to be noisy and at best a poor representation of reality – could be a waste of their money and could potentially lead to future programs with inferior designs. On the other hand, uncovering a strong moderator of the effects of cash transfer programs, one that the designer can realistically manipulate, may turn out to be a bargain for the same funder. Again, however, the caveat about full disclosure and the need for a theoretical or empirical grounding for the ex-post analysis applies.

This has implications for the final version of the systematic review that becomes published – at least in my humble opinion. First, in the section titled ‘Differences between the protocol and the review,’ there would be a full disclosure of the redefinition of the moderator variable concerning the ‘degree of monitoring.’ Second, in the section reporting the findings, EITHER the results based on the original plan could be presented first, followed by a clearly demarcated section that presents the secondary analysis, OR when a result is presented based on analysis that deviates from the original protocol, it can be clearly marked as such. We’d be happy to do all of these and will likely look for guidance from the CC’s Methods Editors.

One more point: in this particular case, we tried only one other variable in place of the binary CCT/UCT variable. The coefficient estimate for the moderator variable has a p-value of 0.01. I am not familiar with multiple inference corrections that apply to random-effects models in meta-analysis, but it seems to me that the coefficient estimate would still be statistically significant at the 5% level if we adopted a correction for the fact that we examined two moderators rather than one.
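For what it’s worth, the simplest such correction is the Bonferroni bound: with two candidate moderators, compare the p-value to 0.05/2, or equivalently double it. A quick sketch, assuming that bound applies here:

```python
# Bonferroni correction for having tested two moderators instead of one.
p_value = 0.01
n_tests = 2
p_adjusted = min(p_value * n_tests, 1.0)  # 0.02
still_significant = p_adjusted < 0.05
print(p_adjusted, still_significant)  # 0.02 True
```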

On test scores, I would not hold my breath if I were you. In the end, only a small number of studies reporting impacts on test scores were eligible for our review and the preliminary finding is that the effect sizes are very small and mostly insignificant. The pooled effect for all cash transfer programs is 0.06 standard deviation improvement for children in households offered cash transfers of some type (95% CI 0.01-0.12).

Submitted by Flavio Cireno on

Dear Berk,

I don’t know where you got the information that Brazil has minimal enforcement of the education conditionality, but it is very far from the truth.

In the Bolsa Família program, we have complete enrollment information for more than 90% of students at the beginning of the school year and more than 97% at the end. Besides that, every two months we receive attendance information for each student. We now have very good information for more than 89% of our students, i.e., almost 16 million students.


Flavio Cireno
Bolsa Família Program - Brazil.

Submitted by Berk on

Dear Flavio,

Things seem to have improved over time – from the start of Bolsa Escola, to its becoming Bolsa Família, and then within Bolsa Família itself. The publications we have access to (including the Fiszbein and Schady, 2009, book on CCTs, which summarizes conditions and monitoring for each program) suggest that in the early days monitoring was less than perfect and almost no one was penalized, and that this changed over time. You seem to be referring to the current situation, which may be quite different from what is relevant for the periods evaluated in the publications eligible for our systematic review.
