The New York Times recently had a piece on the retraction and re-issuance of a study in Spain based on a randomized trial of the Mediterranean Diet’s effect on heart disease. The original study was meant to be an individual-level random assignment of 7,447 people aged 55 to 80 to one of three diets: a control diet (advice to just reduce fat content) or one of two variants of the Mediterranean Diet (in which participants were given free olive oil or free nuts). The study was originally published in the New England Journal of Medicine (NEJM) in 2013, and the authors then appear to have been surprised to find it on a list of suspicious trials. There are several parts of this story that I thought would be of interest for doing impact evaluations in development, which I discuss below.
Randomization in the wild
I’ve been travelling the past week, and had several people contact me with questions about impact evaluation while I was away. I figured these might come up again, so I thought I’d put the questions and answers up here in case they are useful for others.
Question 1: Winsorizing – “do we do this on the whole sample, or do we do it within treatment and control, baseline and follow-up?”
Winsorizing is commonly used to deal with outliers: for example, you might set all data points above the 99th percentile equal to the 99th percentile. The key here is that you don’t use different cut-offs for treatment and control. For example, suppose you have a treatment for businesses that makes 4 percent of the treatment group grow their sales massively. If you winsorize the treatment group at the 95th percentile of the treatment distribution and the control group at the 95th percentile of the control distribution, you might end up completely missing the treatment effect. I do think it makes sense to use separate cutoffs by survey round, to allow for seasonal effects and so that you aren’t winsorizing more points from one round than another (which could happen if you used the same global cutoffs for all rounds). A sketch of this is below.
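Here is a minimal sketch of what this can look like in practice, assuming a pandas DataFrame with hypothetical columns sales (the outcome) and round (the survey round); the column names and the 99th-percentile cutoff are illustrative, not from the original question:

```python
import pandas as pd


def winsorize_by_round(df, outcome="sales", round_col="round", upper=0.99):
    """Cap the outcome at the pooled upper-percentile cutoff, computed
    separately for each survey round but NOT separately by treatment arm."""
    df = df.copy()
    # Pooled cutoff within each round: both treatment and control
    # observations enter the quantile, so a fat right tail created by
    # the treatment is capped at the same level in both arms.
    cutoffs = df.groupby(round_col)[outcome].transform(
        lambda x: x.quantile(upper)
    )
    df[outcome] = df[outcome].clip(upper=cutoffs)
    return df
```

The point of pooling is that the same cutoff applies to treated and control observations within a round, while the groupby keeps the cutoffs round-specific.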
Question 2: How to handle low take-up and unequal group sizes? Here’s one version:
I have a question about an experiment in which we had a very big problem getting individuals in the treatment group to take up the treatment. As a result, we now have a treated group that is much smaller than the control group. For efficiency reasons, does it still make sense to survey the whole control group, or should we take a random draw so as to have equal numbers of treated and control?
And another version:
Context: you are randomly selecting people for some program, such as a training program or transfer program, in which you expect less than 100% take-up of the treatment among those assigned to treatment. You are relying on an oversubscription design, in which more people apply for the course/program than you have slots. A sketch of the power arithmetic behind both versions is below.
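Since both versions turn on how unequal group sizes and partial take-up affect precision, here is a minimal back-of-the-envelope sketch (my own illustration, not from either questioner). It uses two standard facts: the variance of a difference in means is sigma^2(1/n_t + 1/n_c), so extra control observations improve precision even when the treated group is fixed, and with take-up rate p the intent-to-treat contrast only recovers p times the effect on the treated:

```python
import math


def mde(n_t, n_c, sigma=1.0, take_up=1.0, alpha_z=1.96, power_z=0.84):
    """Approximate minimum detectable effect on the treated for a
    two-group comparison of means (5% size, 80% power by default),
    scaled up by 1/take_up because the intent-to-treat difference
    only recovers take_up * effect."""
    se = sigma * math.sqrt(1.0 / n_t + 1.0 / n_c)
    return (alpha_z + power_z) * se / take_up


# With 200 treated and 50% take-up, adding controls beyond 200 still
# shrinks the detectable effect, so cutting the control sample down to
# match the treated group size throws away power.
for n_c in (200, 400, 800):
    print(n_c, round(mde(n_t=200, n_c=n_c, take_up=0.5), 3))
```

The numbers here are purely illustrative; the general lesson is that the standard error falls in both sample sizes, so subsampling the control group down to equality saves survey costs but never improves power.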
One of the comments we got last week was a desire to see more “behind-the-scenes” posts about the trials and tribulations of trying to run an impact evaluation. I am sure we will do more of these, but there have been many times when I have thought about doing so and baulked for one of the following reasons:
Despite the large and growing literatures on migration in economics, sociology, and other social sciences, there is surprisingly little work that actually evaluates the impact of particular migration policies (most of the literature concerns the determinants of migrating, and the consequences of doing so for migrants, their families, and native workers). I am therefore always interested to see new work in this area, particularly work that manages to obtain experimental variation in policy implementation.