Why similarity is the wrong concept for External Validity


I’ve been reading Evidence-Based Policy: A Practical Guide to Doing It Better by Nancy Cartwright and Jeremy Hardie. The book is about how one should go about using existing evidence to move from “it works there” to “it will work here”. I was struck by their critique of external validity as it is typically discussed.
They note that when people mention external validity, they use it to mean that the ‘same treatment’ has the ‘same result’ in a specific target setting as it did in the study for which there is evidence, with the orthodox advice being that this can be expected if the target population is ‘sufficiently similar’ to the study population. They have four critiques of this:
  1. The notion of the same treatment is too vague – they give an example of the Tamil Nadu Integrated Nutrition Project, which integrated feeding, health measures, and education of pregnant mothers about how to better nourish their children, and was found to be responsible for a significant drop in malnutrition. But the program was then implemented in Bangladesh, with little success. The problem was that in Bangladesh mothers-in-law, not mothers, were in charge of handing out food – so the ‘same treatment’ shouldn’t be ‘educate the mother’, but rather ‘educate the person in charge of food decisions’.
  2. “It is an absurdly tough test to require the same treatment effect as in the study population. If that is external validity, there is no real chance a study will have it”. The average treatment effect is the average of b(i), the treatment effects for each individual-context combination observed in the original study. They make clear it is extremely unlikely that every element of the context will be the same, or that the distribution of treatment effects will be the same in the two populations. They note that we may instead want to understand the ‘same effect’ as, for example, the treatment making a positive contribution in both places.
  3. Similarity is too demanding and the wrong idea: They give the example of the conclusions of a paper on the Moving to Opportunity experiment, which gives a kitchen sink full of characteristics of the subset being studied – and note that if the results are only relevant to this very specific set of characteristics, maybe it is not worth doing the study, while it is also not clear what reason the authors have for noting some characteristics and not others.
  4. Similarity is wasteful: the treatment effect is an average over some people who have more positive effects and others who have less positive (or even negative) effects. You shouldn’t aim for the same mix of effects in another situation – instead you would want to give the treatment more to those who benefit more from it.
Instead of looking for similarities between your population and the study population, they say you should be looking for what matters to getting the prediction you have in mind correct (e.g. it will make a positive contribution). “What is wrong with the ideas of external validity and similarity is that they invite you to stop thinking”. You should have a theory of why something should work, and the supporting factors that will help this occur – and then use this to determine whether you think a policy should work, not just blindly say things look similar or not similar enough.
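To make critiques 2 and 4 concrete, here is a minimal sketch of how the same individual-level effects b(i) produce different average treatment effects once the mix of people changes. Everything in it is an assumption for illustration only: the three "types" of individual, their effect sizes, and the population shares are invented numbers, not figures from the book or from any study.

```python
# Hypothetical treatment effects b(i) for three invented types of individual.
# These values are illustrative assumptions, not estimates from any study.
EFFECT_BY_TYPE = {
    "high_responder": 2.0,
    "low_responder": 0.5,
    "harmed": -1.0,
}


def average_treatment_effect(population_shares):
    """Return the share-weighted average of b(i) over the types present."""
    total_share = sum(population_shares.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(
        share * EFFECT_BY_TYPE[type_name]
        for type_name, share in population_shares.items()
    )


# Invented mixes: the study site, a different target site, and a targeted
# rollout that treats only those expected to benefit most.
study_population = {"high_responder": 0.5, "low_responder": 0.3, "harmed": 0.2}
target_population = {"high_responder": 0.2, "low_responder": 0.3, "harmed": 0.5}
targeted_rollout = {"high_responder": 1.0, "low_responder": 0.0, "harmed": 0.0}

if __name__ == "__main__":
    for label, shares in [
        ("study", study_population),
        ("target", target_population),
        ("targeted", targeted_rollout),
    ]:
        print(label, round(average_treatment_effect(shares), 2))
```

With these made-up numbers the average effect is about 0.95 in the study population, about 0.05 in the target population, and 2.0 under the targeted rollout, even though no individual's effect has changed. That is the sense in which requiring the ‘same average effect’ is both too tough a test and the wrong thing to aim for.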

The rest of the book is then about how to make an Effectiveness Argument, which needs the following three steps to hold in order to conclude that the policy will work (in the sense of having a positive effect) in a new location:
  1. The policy worked somewhere – it played a positive causal role there, and the support factors necessary for it to play this positive role were present for at least some individuals there. A typical RCT or other impact evaluation is useful in establishing this step.
  2. The policy can play the same causal role in the new location as it did in the old.
  3. The support factors necessary for the policy to play a positive causal role in the new location are in place for at least some individuals.
For example, they consider the case of smaller class size, where a randomized experiment in Tennessee found smaller classes achieved better reading scores, but when the same policy was tried in California it didn’t work. They note that in Tennessee the policy was done only in schools which had enough available space for extra classes, and no shortage of qualified teachers to teach the new classes created – these support factors weren’t in place in California, so the third step of the argument above doesn’t hold.
I found the rest of the book a bit hit-or-miss – lots of use of metaphors (e.g. “causal cakes” and “argument pyramids”) for causal chains which weren’t always easy to read. I did like the suggestion to do a pre-mortem in order to figure out what factors are necessary for your policy to work: i.e. to suppose that the policy you are about to implement turns out to fail, and to think through why that could be. There is no discussion of things like mechanism experiments, or treatment heterogeneity, or any other such tools to help in establishing what these steps in the causal chain and supporting factors might be.


David McKenzie

Lead Economist, Development Research Group, World Bank

July 18, 2013

There is a silver lining. The book would make a great stocking stuffer for Rogoff and Reinhart, and the entire austerity-mad troika. This time it's different: external validity pitfalls for macro policy advisors.

July 18, 2013

3 points...
Shockingly, I was probably too flippant on twitter when I said you were "too easy on Cartwright & Hardie's book here," David. But I'll try to justify that. Though first I should say that I was excited about this book, bought it and read it immediately, and really enjoyed the early chapters in particular. And the book comes with strong endorsement from Dani Rodrik and Angus Deaton (!), so why should anyone care what I think anyway? Nevertheless, here's where it left me unsatisfied:
1. Data needed. This is a book arguing that hasty empirical generalizations lead us astray, particularly generalizations from RCTs in formulating social policy. I happen to agree, up to a point. But the book offers fairly little empirical evidence that this claim is true, and -- given the subtitle is "A Practical Guide to Doing It Better" -- almost zero guidance on where we should draw the line in practice. There are, to be fair, several real world examples and illustrations scattered throughout, but these feel more like cherry-picked cases to illustrate a negative point, rather than serious evidence about how evidence-based policy is actually done or could be done better.
Instead, the book offers a long, sophisticated, occasionally insightful elaboration on the theme of "just because it worked here (or today) doesn't mean it'll work there (or tomorrow) -- and what is "it" anyway?" Reading C & H feels sort of like arguing billiards tactics with David Hume.
2. External validity is overrated. Yes, the Moving to Opportunity evaluation cost millions. But its potential implications for the population under study (residents of low-income housing in Baltimore, Boston, Chicago, Los Angeles, and New York City) were also huge -- without considering any extrapolation to other contexts. I don't think we should concede the value of good evaluation hinges on generalizability. I expected a book on "Evidence-Based Policy" to give more weight to the use of relevant evidence, and spend less time decrying the use of irrelevant evidence.
3. Lastly, just to stir up trouble, I guess I could also argue you were too easy on the book because, well, it's pretty antithetical to a lot of the stuff on this blog (which I read regularly and enjoy a lot). For my own personal enrichment, I was hoping you'd put up more of a fight!

David McKenzie
July 18, 2013

Thanks Justin, I agree that when I describe the rest of the book as hit-or-miss, it is mostly "miss" - it would have been nice to see some real examples where they had tried seriously to take evidence from one place to apply to another using their proposed method, and show how they were able to do so (or not).
And I totally agree on point 2) - that even if studies are only relevant for the particular population studied, this in itself can be value-for-money and useful.

Michael Clemens
July 18, 2013

The classic book by Shadish, Cook, and Campbell makes a useful distinction here.
They divide what most people call "external validity" into two: 1) Testing what they call external validity means asking if the effect of the same treatment is the same in a different time or place, 2) testing what they call "construct validity" means asking if the effect is the same when the treatment itself is different (perhaps in the same setting).
The Tamil Nadu Integrated Nutrition Project is a perfect example of the measured effect differing because the treatment bundle is not the same in the two experiments---a difference of construct validity that could occur even in the same time-and-place.

Shamika Ravi
July 20, 2013

We shed some light on external validity in our recent paper - "Substitution Bias and External Validity: Why an innovative anti-poverty program showed no net impact" -- http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2134779
Shamika Ravi