A core concern for any impact evaluation is the degree to which its findings can be generalized to other settings and contexts, i.e. its “external validity”. But of course external validity concerns are not unique to economic policy evaluation; in fact they are present (implicitly or explicitly) in any empirical research with prescriptive implications.
For World AIDS Day, there is a sign at the World Bank stating that taking ARVs reduces the rate of HIV transmission by 96%. Had this been last year, a sign somewhere may well have read “A cheap microbicidal gel that women can use up to 12 hours before sexual intercourse reduces HIV infection risk by more than half – when used consistently.” Well, sadly, it turns out, so much for that.
When done well, randomized experiments at least provide internal validity – they tell us the average impact of a particular intervention in a particular location with a particular sample at a particular point in time. Of course we would then like to use these results to predict how the same intervention would work in other locations or with other groups or in other time periods.
At a recent seminar someone joked that the effect size in any education intervention is always 0.1 standard deviations, regardless of what the intervention actually is. So a new study published last week in Science that reports a 2.5 standard deviation effect certainly deserves attention. And then there is the small matter of one of the authors (Carl Wieman) being a Nobel Laureate in Physics and a science advisor to President Obama.
Following on David’s rant on external validity yesterday, which turned out to be quite popular, I decided to keep the thread going. Despite the fact that the debate is painted in ‘either/or’ terms, my feeling is that there are things that careful researchers/evaluators can do to improve the external validity of their studies.
Concerns about external validity are a common critique of micro work in development, especially experimental work. While not denying that it is useful to learn what works in a variety of different settings, there seem to be two forms of double standard (or a double double standard) going on: first, economic journals and economists in general seem to apply this critique to work on developing countries more than they do to other forms of research; and second, the concern seems to be expressed about experiments more than about other micro work in development.