
Evaluating a large cash transfer program? Don’t ignore Regression Discontinuity Design

Berk Ozler

One of the more common requests I receive from colleagues in the World Bank’s operational units is support on evaluating the impact of a large cash transfer program, usually carried out by the national government. Although our government counterparts are much more willing to consider a randomized promotion impact evaluation (IE) design these days, this is often still not possible. This could be, for example, because it has already been announced that the program is going to be implemented in certain areas starting on a certain date. When randomization is unavailable in such cases, one of the tools available to us is regression discontinuity design (RDD), which, in my experience, does not get considered as often as it should.

About 10-15 years ago, impact evaluations were still being conducted, but one of the main differences (at least in the World Bank) was that we were not there designing an evaluation of a program well before it was about to start. What was more likely to happen was someone asking “Country X is conducting this interesting school reform program. I wonder what the impact is.” Because the program had already started, this would lead to quasi-experimental methods of constructing an ex-post counterfactual to estimate impacts. In that sense, along with matching techniques, RDD was an option available to evaluators IF the program lent itself to such a design.

But, with the increasing availability of randomized experiments, as well as availability of data prior to the start of the intervention (meaning nicer PSM diff-in-diff designs are also available in the toolbox), RDD is still seen only as an ex-post design strategy, meaning that you’d do it only as a last resort if you can’t do anything else. Not many people seem to be thinking about planning to evaluate a program using RDD ex-ante. But, it doesn’t have to be that way.

Many cash transfer programs around the world, including PROGRESA along with a multitude of programs that are being designed in sub-Saharan Africa, use targeting criteria that are continuous, such as (proxy) means testing. The use of a cutoff, i.e. a threshold score below which the households are not eligible to receive cash transfers, creates a discontinuity that forms the basis of the RDD design. If this is the case, and you and your counterparts are fairly certain that the threshold will be enforced, you should seriously consider RDD as an ex-ante design option.
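To make the discontinuity concrete, here is a minimal sketch in Python on simulated data. Everything in it is an assumption for illustration: the score range, the cutoff of 50, the 10-point bandwidth, the true effect of 5, and the convention (following the description above) that households at or above the cutoff receive transfers. It fits a separate straight line on each side of the cutoff and reads the impact off the gap between the two fitted values at the cutoff.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rdd_estimate(scores, outcomes, cutoff, bandwidth):
    """Jump in the outcome at the cutoff, from separate linear fits on
    each side, using only households within `bandwidth` of the cutoff."""
    pairs = list(zip(scores, outcomes))
    # Center the score at the cutoff so each intercept is the fitted value there.
    below = [(s - cutoff, y) for s, y in pairs if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in pairs if cutoff <= s <= cutoff + bandwidth]
    a_below, _ = fit_line(*zip(*below))
    a_above, _ = fit_line(*zip(*above))
    return a_above - a_below

# Simulated data (all values hypothetical).
random.seed(0)
CUTOFF = 50.0
TRUE_EFFECT = 5.0
scores = [random.uniform(0, 100) for _ in range(5000)]
outcomes = [
    20 + 0.3 * s + (TRUE_EFFECT if s >= CUTOFF else 0.0) + random.gauss(0, 2)
    for s in scores
]

estimate = rdd_estimate(scores, outcomes, CUTOFF, bandwidth=10)
print(f"estimated jump at the cutoff: {estimate:.2f}")
```

A real evaluation would choose the bandwidth and functional form far more carefully and would report standard errors; the point here is only that the estimate comes from households close to the threshold.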

There are two main worries with RDD as an IE design. The first is that it estimates a local treatment effect, meaning that the impact findings are valid for households within a band around the cutoff score, but may not be valid for those further away. This is a legitimate concern, to which I have two answers. First, many programs I know in Africa target as little as 10% of the population as poor or vulnerable using some index. If the targeting is this narrow, I don’t need to worry as much about whether the impact at the 9th percentile is different from that at the 5th percentile. As the share of the population targeted grows, this concern becomes more legitimate: the impact at the 39th percentile could be quite different from that at the 19th. This is where my second answer comes in. Many studies of cash transfer programs have shown that the impacts are larger for poorer people among the eligible population. This means that the effect you find around the cutoff will likely be a lower bound for the effect on those who are poorer. (A third suggestion could be to exploit the fact that many cash transfer programs actually have a local cutoff rather than a national one, meaning that the absolute value of the cutoff in the distribution of the index will vary. However, this variation is not exogenous but is correlated with poverty in each intervention area, so heterogeneity of impact by the absolute value of the cutoff score is confounded with whether the program is simply working better or worse in poorer communities.)

It is important to pause here and talk about RDD as a “locally randomized experiment,” in the language of the excellent review piece by Lee and Lemieux (ungated here) published last year in the JEL. Notice how they explain why RDD may have been underutilized until recently:

"RD Designs as Locally Randomized Experiments: Economists are hesitant to apply methods that have not been rigorously formalized within an econometric framework, and where crucial identifying assumptions have not been clearly specified. This is perhaps one of the reasons why RD designs were underutilized by economists for so long, since it is only relatively recently that the underlying assumptions needed for the RD were formalized. In the recent literature, RD designs were initially viewed as a special case of matching (Heckman, Lalonde, and Smith 1999), or alternatively as a special case of IV (Angrist and Krueger 1999), and these perspectives may have provided empirical researchers a familiar econometric framework within which identifying assumptions could be more carefully discussed.”

“Today, RD is increasingly recognized in applied research as a distinct design that is a close relative to a randomized experiment. Formally shown in Lee (2008), even when individuals have some control over the assignment variable, as long as this control is imprecise— that is, the ex ante density of the assignment variable is continuous—the consequence will be local randomization of the treatment. So in a number of nonexperimental contexts where resources are allocated based on a sharp cutoff rule, there may indeed be a hidden randomized experiment to utilize. And furthermore, as in a randomized experiment, this implies that all observable baseline covariates will locally have the same distribution on either side of the discontinuity threshold—an empirically testable proposition."

The last sentence is also important: using baseline data collected prior to the start of the intervention, you can test whether RDD is feasible as an IE strategy or not. And you can use RDD even if there is some crossover in beneficiary status around the cutoff, as long as this is not severe (i.e. local officials are not blatantly disregarding the targeting rules).
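As a rough illustration of that baseline test, the sketch below compares the mean of a baseline covariate for households just above and just below the cutoff and reports a Welch t-statistic. The data are simulated, and the cutoff, the 2-point window, the covariate (household size) and the size of the artificial jump are all assumptions made up for the example. A covariate that varies smoothly with the score should show no meaningful jump, while one that has been sorted around the threshold should.

```python
import random
from statistics import mean, variance

def balance_t_stat(scores, covariate, cutoff, window):
    """Welch t-statistic for the difference in covariate means between
    households just above and just below the cutoff."""
    pairs = list(zip(scores, covariate))
    below = [x for s, x in pairs if cutoff - window <= s < cutoff]
    above = [x for s, x in pairs if cutoff <= s <= cutoff + window]
    diff = mean(above) - mean(below)
    se = (variance(above) / len(above) + variance(below) / len(below)) ** 0.5
    return diff / se

# Simulated baseline data (all values hypothetical).
random.seed(1)
CUTOFF = 50.0
scores = [random.uniform(0, 100) for _ in range(10000)]

# Household size drifts smoothly with the score: no jump at the cutoff.
smooth = [5 - 0.01 * s + random.gauss(0, 1.5) for s in scores]

# The same covariate with an artificial jump at the cutoff, as might
# appear if households or officials could manipulate the score precisely.
jumpy = [x + (1.5 if s >= CUTOFF else 0.0) for s, x in zip(scores, smooth)]

t_smooth = balance_t_stat(scores, smooth, CUTOFF, window=2)
t_jumpy = balance_t_stat(scores, jumpy, CUTOFF, window=2)
print(f"smooth covariate: t = {t_smooth:.2f}")
print(f"covariate with a jump: t = {t_jumpy:.2f}")
```

A simple difference in means is only reasonable in a narrow window; with a wider band you would want to control for the score on each side, as in any RDD regression.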

Nonetheless, while it is encouraging that RDD is now recognized as a distinct design you can employ that is a close relative of an RCT, the point of it being a “locally randomized experiment” remains. I know that many technocrats do worry about this and are reluctant to employ RDD because the impact findings may not apply as we move away from the threshold and they care much more about the average impact than the local one.

In such cases a propensity score matching diff-in-diff design would be nice if you can pull it off. The problem is that, these days, you can no longer get away with having just one data point (i.e. your one round baseline data collection prior to the intervention) and employing this methodology. This is because you need to convince your audience that the time trends in the outcome variable in the matched treatment and control groups were similar before the program started. Without it, you’re vulnerable to a different kind of criticism: you got away from the problem of reporting a local impact, but now you may be reporting a biased average impact. Here is a picture that should give you full confidence in choosing PSM diff-in-diff over RDD.

[Figure: monthly values of the outcome variable in matched treatment and control municipalities over the couple of years before the start of the intervention]

The picture above uses monthly data going back a couple of years on an outcome variable of interest from matched municipalities prior to the start of an intervention. This is as good as you’re going to get in convincing your reviewers that the time trends in your matched treatment and control groups are almost identical and that any impact you find will not be due to different trends in these groups but can be attributed to the intervention. If you don’t have this, you should at least have one data point prior to baseline (i.e. two data points before the start of the intervention). In a typical evaluation setting for a government program, these data may not exist in the selected evaluation areas and the matched control ones.
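One simple way to put a number on what such a picture shows is to fit a linear trend to the monthly gap between the matched treatment and control series before the program starts. The sketch below does this on made-up series (24 months, a common trend and seasonal pattern, and either a constant or a widening gap); none of these numbers come from a real program. A slope close to zero is consistent with similar pre-program trends, while a clearly non-zero slope is a warning sign.

```python
import math

def slope(ys):
    """OLS slope of ys on the time index 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    mean_t = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(ys))
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    return sxy / sxx

# Hypothetical monthly outcome series for the 24 months before the program.
months = range(24)
control = [100 + 0.5 * t + 4 * math.sin(2 * math.pi * t / 12) for t in months]

# Matched treatment areas that track the control areas with a constant gap.
parallel = [c + 3.0 for c in control]

# Treatment areas that were already pulling away before the program began.
diverging = [c + 3.0 + 0.2 * t for t, c in zip(months, control)]

gap_trend_parallel = slope([a - b for a, b in zip(parallel, control)])
gap_trend_diverging = slope([a - b for a, b in zip(diverging, control)])

print(f"trend in gap, parallel series:  {gap_trend_parallel:.3f} per month")
print(f"trend in gap, diverging series: {gap_trend_diverging:.3f} per month")
```

With real data the gap will be noisy, so you would look at the slope together with its standard error rather than at the point estimate alone.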

Furthermore, this is potentially more expensive than RDD, because you have to be in more places to collect data. With RDD, you may need more individuals within each cluster, but fewer clusters with more people in each is cheaper than more clusters with fewer households in each.
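The cost argument is simple arithmetic once you separate the fixed cost of visiting a cluster from the marginal cost of interviewing a household. The sketch below uses purely hypothetical unit costs and sample allocations (they are not taken from any real survey budget) to compare two ways of interviewing the same 1,200 households.

```python
def survey_cost(clusters, households_per_cluster, cost_per_cluster, cost_per_household):
    """Total fieldwork cost: a fixed cost for every cluster visited plus a
    marginal cost for every household interviewed."""
    households = clusters * households_per_cluster
    return clusters * cost_per_cluster + households * cost_per_household

# Hypothetical unit costs, for illustration only.
COST_PER_CLUSTER = 2000
COST_PER_HOUSEHOLD = 30

# The same 1,200 households, allocated two ways.
concentrated = survey_cost(40, 30, COST_PER_CLUSTER, COST_PER_HOUSEHOLD)   # fewer, larger clusters
dispersed = survey_cost(100, 12, COST_PER_CLUSTER, COST_PER_HOUSEHOLD)     # more, smaller clusters

print(f"40 clusters x 30 households:  {concentrated}")   # 116000
print(f"100 clusters x 12 households: {dispersed}")      # 236000
```

This compares fieldwork costs only; it says nothing about the statistical power of either allocation, which depends on how similar households are within clusters.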

This brings us to the second worry about RDD: spillovers. Angelucci and De Giorgi showed here (ungated here) that PROGRESA had significant benefits for non-beneficiary households. If cash transfer programs have significant spillover effects around the cutoff, the RDD estimates will be biased. Again, if we think that these spillovers are positive, as in PROGRESA, then the RDD estimate will be a lower bound. If there is exogenous heterogeneity in the intensity of the treatment (for example, through randomly assigned cutoff values by community), you might be able to examine the extent of spillovers by exploiting the fact that more money is coming into some communities than others. Finally, the extent to which you expect spillovers will differ by program: large cash transfer programs are much more likely to generate spillover effects than settings that use a discontinuity in secondary school entrance exams (see Owen Ozier’s JMP on Kenya here, and Jacobus de Hoop’s dissertation chapter on Malawi here) or a discontinuity in postpartum hospital stays arising because a baby was born just before or just after midnight (see the clever Almond and Doyle 2011 paper here). Furthermore, even in the case of PROGRESA, Buddelmeyer and Skoufias (2004) find that RDD performed remarkably well compared with the estimates from the randomized experiment (caveat: I was unable to find a published version of this paper, so I am not 100% confident that the findings are robust in light of the significant spillovers Angelucci and De Giorgi found).

Ideally, when randomized promotion is not available as an IE strategy, the best thing to do is to collect data in carefully matched treatment and control areas, and for beneficiaries and non-beneficiaries in each of those areas (you can use sampling weights to get a representative sample, while you still get enough people around the cutoff). This way, you can compare a bunch of estimates to check robustness, and you can also address important spillover effects if your matching exercise was powerful and convincing. RDD adds little to the costs as long as you have a discontinuity you can use in the assignment rule, so it's good to keep it in mind when designing an IE.

Comments

Note that in the context of a cash transfer program the RD design evaluates the impact of the program at a particularly relevant margin. Besides having the program or not and the amount of the transfer, governments choose what share of the population to include. You may find this paper of interest: http://oosterbeek.economists.nl/abstract.php?citekey=oosterbeekEA07

Submitted by Javier Baez on
I am sympathetic to the fact that sometimes one has to evaluate the impact of a program well after it has started. This is particularly important when investigating long-term effects. Not long ago a colleague and I were asked to look at the long-term effects of a CCT program in Colombia, but data for such an exercise were not explicitly collected. Our budget and timeline were very tight too. However, we discovered that the program, like probably many other CCTs, has a comprehensive information system used for administrative and monitoring purposes. The system is basically a census of program beneficiaries from the onset of the program to the present. We used it to create a panel of beneficiaries and combined it with (also existing) data from a census of the poor that was carried out for the proxy-means test used to determine eligibility. We then used the resulting data to exploit the sharp discontinuity at the eligibility threshold to look at the effects of the program on high school completion and learning outcomes. My whole point is that there may be other opportunities to use RD when budget, time and lack of data purposely collected for impact evaluations (as in our case) are serious constraints. Many programs have rich M&E systems and are targeted in a similar way. These data exist and can be used to conduct rigorous impact evaluations at a low cost. Here is the link to the paper: http://intranet.worldbank.org/servlet/main?pagePK=64161651&theSitePK=469233&piPK=64161652&menuPK=64166272&entityID=000158349_20110614094457