Dear Jed and Lant,
Thanks, Jed, for your careful explanation of the results in our paper. First, we want to make clear that we sympathize very much with the comments from both of you. We believe the paper makes two points that bear on the topics discussed in your comments:
1) If one has control individuals from RCTs in different locations and cannot at least find individuals with similar observable characteristics across those control groups, then it is very likely that no comparison of the treatment effects of those RCTs is feasible. In our paper we check this through a (multi-site) common support condition, which drops a very large number of the individuals in Riverside. This alone shows that Riverside is probably quite different from the other sites, and that it is not a good idea to use it in comparisons.
2) It is indeed important to consider possible site-specific differences across locations (e.g., local economic conditions) when information from an RCT in one location is used to make inferences about the potential effect of implementing a given intervention in another location. Our attempt at controlling for local economic conditions is rather specific to the data we have. But the whole point is that we are explicitly imposing a model, and, like any model, it needs to be evaluated for reasonableness. Indeed, these types of adjustments would necessarily need to be specific to the particular context, which is close in spirit to Jed's point about “careful theorizing”. We also agree with Lant that, at this level, it is not clear a priori whether extrapolating from an RCT is necessarily better than employing “non-rigorous evidence” (e.g., OLS) to evaluate the effects of the intervention at the location of interest using only data from that specific location.
(Of course, in general, it is better to have alternative estimators of a given effect and to choose the appropriate one depending on the situation at hand.)
Regarding Lant's comment about trying to apply these types of adjustment methods across countries, we completely share the strong concerns about the difficulties and dangers of doing so. Anyone attempting that would definitely need to approach the exercise with extreme caution. However, we still think that the first part of our approach, controlling for differences in individual characteristics, can be very useful, even if one is worried that differences in the environment may overwhelm the differences in individual characteristics. To put it another way: if one cannot even show that there is enough “overlap” between the groups across locations on at least certain characteristics, then it is time to stop the exercise and not continue trying to assess external validity. But if one can show that imposing some type of common support condition eliminates a large portion of the differences between control groups, the next step would be to evaluate whether a carefully developed model could control for differences in the environment. For example, if several RCTs are conducted in different regions within a country, then at least in those cases these types of adjustments may be useful (whether they can be useful across countries would depend very much on the context and on which countries are being compared). Of course, as Lant also mentioned, whether such an approach would be better than a “non-rigorous” approach using data only from the location of interest would likely depend on the application at hand.
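To make the “overlap” check concrete, here is a minimal sketch in Python. It is not the exact procedure used in the paper: it assumes each control individual already has some estimated score summarizing observable characteristics (for example, a propensity-type score), and it applies a simple min/max rule to define the region shared by all sites. The function names, site names, and numbers are purely illustrative.

```python
from typing import Dict, List, Tuple


def common_support(scores_by_site: Dict[str, List[float]]) -> Tuple[float, float]:
    """Return the score interval covered by every site (simple min/max rule)."""
    # Lower bound: the largest of the site-level minimums.
    lower = max(min(scores) for scores in scores_by_site.values())
    # Upper bound: the smallest of the site-level maximums.
    upper = min(max(scores) for scores in scores_by_site.values())
    return lower, upper


def share_dropped(scores_by_site: Dict[str, List[float]]) -> Dict[str, float]:
    """Fraction of each site's individuals falling outside the shared interval."""
    lower, upper = common_support(scores_by_site)
    if lower > upper:
        # No overlap at all: every individual would be dropped.
        return {site: 1.0 for site in scores_by_site}
    return {
        site: sum(1 for s in scores if not (lower <= s <= upper)) / len(scores)
        for site, scores in scores_by_site.items()
    }


# Hypothetical scores for three made-up sites (illustrative only).
scores = {
    "site_a": [0.2, 0.3, 0.4, 0.5],
    "site_b": [0.25, 0.35, 0.45, 0.9],
    "site_c": [0.05, 0.1, 0.3, 0.95],
}

support = common_support(scores)
dropped = share_dropped(scores)
print("common support:", support)
for site, share in dropped.items():
    print(f"{site}: {share:.0%} of individuals dropped")
```

A site that loses most of its individuals under a rule like this is a signal that it may not be comparable to the others, which is the point at which we would stop rather than proceed to modeling differences in the environment.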
It is great that more and more people are discussing these issues. We enjoyed reading the Pritchett and Sandefur 2013 paper (and agree with many of its points). We also encourage those interested in these topics to take a look at the recent papers by Rajeev Dehejia (2013, http://ideas.repec.org/p/unu/wpaper/wp2013-011.html) and by Allcott and Mullainathan (2012, http://www.nber.org/papers/w18373).
Carlos Flores & Oscar Mitnik