Published on Development Impact

Why similarity is the wrong concept for External Validity

I’ve been reading Evidence-based policy: a practical guide to doing it better by Nancy Cartwright and Jeremy Hardie. The book is about how one should go about using existing evidence to move from “it works there” to “it will work here”. I was struck by their critique of external validity as it is typically discussed.
They note that when people mention external validity, they use it to mean that the ‘same treatment’ has the ‘same result’ in a specific target setting as it did in the study for which there is evidence, with the orthodox advice being that this can be expected if the target population is ‘sufficiently similar’ to the study population. They have four critiques of this:
  1. The notion of the same treatment is too vague – they give the example of the Tamil Nadu Integrated Nutrition Project, which combined feeding, health measures, and education of pregnant mothers about how to better nourish their children, and was found to be responsible for a significant drop in malnutrition. The program was then implemented in Bangladesh, with little success. The problem was that in Bangladesh mothers-in-law, not mothers, were in charge of handing out food – so the ‘same treatment’ shouldn’t be ‘educate the mother’, but rather ‘educate the person in charge of food decisions’.
  2. “It is an absurdly tough test to require the same treatment effect as in the study population. If that is external validity, there is no real chance a study will have it”. The average treatment effect is the average of b(i), the treatment effects for each individual-context combination observed in the original study. They make clear it is extremely unlikely that every element of the context will be the same in the two populations, and that the distribution of treatment effects will be the same. They suggest that instead we may want to interpret ‘the same effect’ more loosely, for example as the treatment making a positive contribution in both places.
  3. Similarity is too demanding and the wrong idea: they give the example of the conclusions of a paper on the Moving to Opportunity experiment, which list a kitchen sink full of characteristics of the subset being studied. They note that if the results are only relevant to this very specific set of characteristics, maybe the study was not worth doing, and that it is not clear what reason the authors have for noting some characteristics and not others.
  4. Similarity is wasteful: the treatment effect is an average over some people who have more positive effects and others who have less positive (or even negative) effects. You shouldn’t aim for the same mix of effects in another situation – instead you would want to give the treatment more to those who benefit more from it.
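To make points 2 and 4 concrete, here is a minimal sketch of how the average of b(i) moves when the mix of individuals changes. The effect sizes are invented purely for illustration and do not come from the book or from any study; the function names are likewise hypothetical.

```python
from statistics import mean

# Hypothetical individual treatment effects b(i). These numbers are made up
# for illustration only; they are not taken from any evaluation.
study_effects = [4.0, 4.0, 0.0, -2.0]   # study population
target_effects = [4.0, 0.0, 0.0, -2.0]  # target population: same kinds of people, different mix


def average_treatment_effect(effects):
    """Average of the individual effects b(i)."""
    return mean(effects)


def targeted_effect(effects):
    """Average effect when only individuals with a positive b(i) are treated."""
    return mean(b for b in effects if b > 0)


study_ate = average_treatment_effect(study_effects)
target_ate = average_treatment_effect(target_effects)

# The looser notion of 'the same effect': a positive contribution in both places.
same_sign = (study_ate > 0) == (target_ate > 0)

print(study_ate, target_ate, same_sign)
print(targeted_effect(target_effects))
```

Under these made-up numbers the two averages differ even though every type of individual responds identically in both places, yet the effect is positive in both. Treating only those who benefit gives a larger average than reproducing the study's mix, which is the sense in which aiming for similarity is wasteful.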
Instead of looking for similarities between your population and the study population, they say you should be looking for what matters to getting the prediction you have in mind correct (e.g. it will make a positive contribution). “What is wrong with the ideas of external validity and similarity is that they invite you to stop thinking”. You should have a theory of why something should work, and the supporting factors that will help this occur – and then use this to determine whether you think a policy should work, not just blindly say things look similar or not similar enough.

The rest of the book is then about how to make an Effectiveness Argument, which has the following three steps that together allow you to conclude the policy will work (in terms of having a positive effect) in a new location:
  1. The policy worked somewhere – it played a positive causal role there, and the support factors necessary for it to play this positive role were present for at least some individuals there. A typical RCT or other impact evaluation is useful in establishing this step.
  2. The policy can play the same causal role in the new location as it did in the old.
  3. The support factors necessary for the policy to play a positive causal role in the new location are in place for at least some individuals.
For example, they consider the case of smaller class size, where a randomized experiment in Tennessee found smaller classes achieved better reading scores, but when the same policy was tried in California it didn’t work. They note that in Tennessee the policy was done only in schools which had enough available space for extra classes, and no shortage of qualified teachers to teach the new classes created – these support factors weren’t in place in California, so the third step of the argument above doesn’t hold.
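The three-step argument can be sketched as a simple checklist. This is only an illustration of the logic, not a method taken from the book: the function name and the factor labels are hypothetical, step 2 is assumed to hold for both states for the sake of the example, and step 3 is simplified to all-or-nothing rather than "for at least some individuals".

```python
def effectiveness_argument(worked_somewhere, same_causal_role, support_factors):
    """Return (conclusion, missing_factors) for the three-step argument.

    support_factors maps a factor name to whether it is in place
    in the new location (a simplification of step 3).
    """
    missing = [name for name, present in support_factors.items() if not present]
    holds = worked_somewhere and same_causal_role and not missing
    return holds, missing


# Support factors named in the class-size example.
tennessee = {"spare classroom space": True, "enough qualified teachers": True}
california = {"spare classroom space": False, "enough qualified teachers": False}

# Step 1 comes from the Tennessee experiment; step 2 is assumed True here.
print(effectiveness_argument(True, True, tennessee))
print(effectiveness_argument(True, True, california))
```

Listing the missing factors, rather than returning only a yes or no, points to what would need to change in the new location before the policy could be expected to work.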
I found the rest of the book a bit hit-or-miss, with lots of metaphors for causal chains (e.g. “causal cakes” and “argument pyramids”) that weren’t always easy to read. I did like the suggestion to do a pre-mortem in order to figure out what factors are necessary for your policy to work: i.e. to suppose that the policy you are about to implement turns out to fail, and to think through why that could be. However, there is no discussion of things like mechanism experiments, or treatment heterogeneity, or any other such tools to help in establishing what these steps in the causal chain and supporting factors might be.


David McKenzie

Lead Economist, Development Research Group, World Bank
