External Validity Musings


External validity

What if we could replicate your favorite three-starred experiment somewhere else and then… disaster, it doesn’t work! Whaaaat?! Why? Should we just give up? A common criticism of experiments in economics is that they need not be externally valid, i.e., the effects of an intervention in one place may differ from its effects somewhere else. Although some researchers can muse about this from the comfort of their armchairs, implementers have to face this question every programming cycle: can we use the results from a small number of experiments to decide what to do next? At DIME we recently launched a new research partnership with the World Food Programme to do exactly this, running a series of similar experiments across multiple countries to provide systematic advice on how and where to adjust their programming to maximize impact. How can we assess external validity and use the results of these experiments to inform their work?

There are partial solutions to this problem. Many approaches have been developed to account for differences across contexts and across target individuals. However, a recent paper by Mark Rosenzweig and Chris Udry proposes a new reason why some interventions may work well in one place and time but not another, namely random shocks, and offers a solution. For example, if we observed an intervention working well in one country but not another, we might initially be tempted to conclude that the intervention works well in the first country but not in the second. Alternatively, the first country may have been hit by some sort of temporary shock, such as a severe drought or flood, that made the intervention more effective. Rosenzweig and Udry then propose a set of approaches for quantifying the role of these shocks and for extrapolating both to the impact of the intervention in an average year and to the variability of that impact across years.

Again, note that this contrasts with the usual reasons we think external validity breaks down in development, where interventions may have different impacts in different places and on different individuals. In this post, we take some liberties in summarizing Rosenzweig and Udry and consider three sources of violations of external validity, along with solutions for each: differences in effects across individuals, differences in effects across time (as Rosenzweig and Udry discuss), and differences in implementation.

Extrapolating across individuals

Heterogeneity across contexts. In some cases, multiple interventions across contexts, combined with careful analysis of mechanisms, can allow us to understand when interventions are more or less likely to work. For example, consider two papers estimating the impacts of expanded access to industrial jobs: one in Bangladesh finds that women delay marriage and increase their educational attainment, while one in Mexico finds that men reduce their educational attainment.

In each case, the impacts depend on the returns to education in the new industrial jobs relative to those in alternative employment (or self-employment) options. This suggests that there is something systematic, estimable across contexts, that can tell us what determines the impacts.

Heterogeneity across individuals. Whether impacts vary across contexts or across individuals, meta-analytic approaches can help us formally estimate this heterogeneity. For example, one paper looks at a series of microfinance interventions across multiple contexts.

They find that these interventions have systematically larger impacts on existing entrepreneurs, who have higher returns to capital than individuals who have yet to start businesses. We could then use this to calculate the likely impact in a new context, depending on the density of entrepreneurs with limited access to credit.
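As a rough illustration of that calculation, the sketch below reweights group-specific impacts by the composition of a new context. It assumes, purely for illustration, just two groups (existing entrepreneurs and everyone else) whose impacts carry over unchanged to the new site, and it uses made-up numbers; the helper `predicted_impact` is hypothetical and not taken from the paper.

```python
def predicted_impact(effect_entrepreneurs: float,
                     effect_others: float,
                     share_entrepreneurs: float) -> float:
    """Predicted average impact in a new context (hypothetical helper).

    Assumes each group's impact is the same everywhere, so only the
    share of existing entrepreneurs differs across contexts.
    """
    if not 0.0 <= share_entrepreneurs <= 1.0:
        raise ValueError("share_entrepreneurs must lie in [0, 1]")
    # Weight each group's impact by its share of the new population.
    return (share_entrepreneurs * effect_entrepreneurs
            + (1.0 - share_entrepreneurs) * effect_others)


# Illustrative numbers only: a context where 20% of the targeted
# population already runs a business.
print(predicted_impact(effect_entrepreneurs=30.0, effect_others=5.0,
                       share_entrepreneurs=0.2))
```

The key design point is that the weights come from the new context, while the group-specific effects come from the pooled evidence; the same effects applied to a population that is half entrepreneurs would predict a larger average impact.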

Unobservable heterogeneity. In other cases, extrapolation involves differences that depend not on something observable, but on selection on unobservable characteristics of individuals or locations. For example, individuals vary in their returns to capital depending on their access to profitable investment opportunities, which is difficult to observe directly.

One paper demonstrates that these high-return individuals are likely to take up loans when offered, by comparing the impacts of cash grants on individuals who select into loans with the impacts of cash grants on individuals who do not. By comparing impacts on these particularly eager individuals with those on less eager ones, we can understand how the impacts of the intervention change as its intensity increases and progressively less eager individuals are reached.
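One way to see this logic is a toy calculation of the average impact among those reached as coverage expands, assuming that the most eager individuals (here, those who select into loans) are always reached first. The effect sizes, the take-up share, and the helper `average_impact_at_coverage` are hypothetical, and the sketch ignores everything except this ordering.

```python
def average_impact_at_coverage(effect_eager: float,
                               effect_reluctant: float,
                               share_eager: float,
                               coverage: float) -> float:
    """Average impact among people reached at a given coverage rate.

    Hypothetical two-group model: the eager group (a fraction
    share_eager of the population) is reached first; any coverage
    beyond that reaches the reluctant group.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must lie in (0, 1]")
    if coverage <= share_eager:
        # Only eager individuals have been reached so far.
        return effect_eager
    # All eager individuals are reached, plus some reluctant ones.
    reached_reluctant = coverage - share_eager
    return (share_eager * effect_eager
            + reached_reluctant * effect_reluctant) / coverage


# Made-up numbers: 30% of people would select into loans.
for c in (0.2, 0.6, 1.0):
    print(c, average_impact_at_coverage(20.0, 5.0, share_eager=0.3, coverage=c))
```

In this toy setup the average impact stays flat until the eager group is exhausted and then declines as reluctant individuals make up a growing share of those reached.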

Extrapolating across time

Heterogeneity due to shocks. The impacts of interventions may differ over time. In their paper, Rosenzweig and Udry demonstrate that stochastic shocks have large effects on responses to interventions. They give a number of examples: drought shocks shrink the impacts of interventions aimed at increasing agricultural productivity, while oil price increases raise the wage impacts of school construction in Indonesia. In all these cases, standard approaches estimate the impacts of the intervention conditional on the realized shocks, which may differ from the impacts in an average year.

In these cases, they propose a straightforward solution: estimate how the impacts of the intervention vary with these shocks. Using information on how frequently shocks occur, one can then back out the impacts of the intervention in good, bad, and average years. One can also naturally account for this additional source of uncertainty when estimating the impacts of the program.
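A minimal sketch of this two-step logic, assuming a single binary shock (drought versus no drought), experimental data pooled across site-years that experienced different shock realizations, and a long-run drought frequency taken as known. The data, the 25% drought probability, and the function names are hypothetical; Rosenzweig and Udry's own estimators are richer than this.

```python
from math import sqrt
from statistics import mean


def effects_by_shock(rows):
    """Treatment-control difference in mean outcomes within each shock state.

    rows: (treated, drought, outcome) tuples pooled across site-years.
    The two numbers correspond to the treatment coefficient, and the
    treatment coefficient plus the treatment-by-drought interaction,
    in a fully saturated regression.
    """
    effects = {}
    for drought in (False, True):
        treated = [y for t, d, y in rows if t and d == drought]
        control = [y for t, d, y in rows if not t and d == drought]
        effects[drought] = mean(treated) - mean(control)
    return effects


def impact_across_years(effects, p_drought):
    """Mean and standard deviation of the impact over many years,
    assuming drought strikes independently each year with probability p_drought."""
    good, bad = effects[False], effects[True]
    average = (1 - p_drought) * good + p_drought * bad
    # A two-point distribution has variance p(1 - p)(good - bad)^2.
    sd = abs(good - bad) * sqrt(p_drought * (1 - p_drought))
    return average, sd


# Made-up data: (treated, drought, outcome)
rows = [
    (True, False, 12.0), (True, False, 14.0),
    (False, False, 2.0), (False, False, 4.0),
    (True, True, 3.0), (True, True, 3.0),
    (False, True, 1.0), (False, True, 1.0),
]
effects = effects_by_shock(rows)
print(effects)
print(impact_across_years(effects, p_drought=0.25))
```

The standard deviation here captures only year-to-year variation driven by the shock, not sampling error in the estimated effects; a fuller treatment would combine the two.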

Heterogeneity due to policy. Other changes over time are less stochastic. In other work, Rosenzweig and Udry show that wage floors created by NREGA (India's National Rural Employment Guarantee Act) alter the impacts of improved weather forecasting.

In these cases, estimates of how the impacts of the intervention vary with cross-sectional variation in the policy can be used to infer how the effects of the intervention would change as the policy changes over time.

Extrapolating across implementers

Heterogeneity across implementers. Implementing partners are a crucial, and often ignored, component of an intervention. The same intervention run by two different implementing partners can have dramatically different effects. Two prominent examples come to mind. In Liberia, an evaluation studied outsourcing the management of public schools, which on average improved performance. However, these impacts varied meaningfully across providers, with the worst providers estimated to have no impact on performance. Recently, a very successful program to subsidize migration (discussed earlier), which actually became more effective as it was scaled up from individuals to communities, failed to induce migration when implemented at a much larger scale.

As these experiences suggest, this is often very hard to manage or anticipate! Working with well-established implementing partners can help ensure a successful intervention, but it is no panacea: by their nature, experiments often test interventions that have not been implemented at scale before! In many cases, multiple rounds of pilots can help to identify weak points, but, as the migration example suggests, smaller-scale pilots may be unable to identify exactly what will break down at larger scales.

Conclusion

So how does all this enter into our work with the World Food Programme? Because the contexts in which they work are highly varied, we will be launching a series of impact evaluations that share a common design and common measurement. This will allow us to learn about where and for whom these interventions work, and why. As many of these interventions target vulnerable populations in fragile contexts, it is particularly important to keep track of shocks that might interact with their impacts. Lastly, working with a single implementing partner across multiple contexts buys us a great deal, since much of the hard work of systematically enforcing high-quality best practices is already being done.

These are not the only questions we will need to think about in these impact evaluations. As the eventual goal is to scale up successes, we need to understand the more “macro”-level impacts of these interventions on communities and economies. This requires estimating the indirect effects of these interventions, such as those operating through behavioral changes and markets. More on this soon!
