# From my inbox: Three enquiries on winsorizing, testing balance, and dealing with low take-up


I’ve been travelling the past week, and had several people contact me with questions about impact evaluation while away. I figured these might come up again, and so I’d put up the questions and answers here in case they are useful for others.
Question 1: Winsorizing – “do we do this on the whole sample, or do we do it within treatment and control, baseline and follow-up?”
Winsorizing is commonly used to deal with outliers: for example, you might set all data points above the 99th percentile equal to the 99th percentile. It is key here that you don’t use different cutoffs for treatment and control. For example, suppose you have a treatment for businesses that makes 4 percent of the treatment group grow their sales massively. If you winsorize separately at the 95th percentile of the treatment distribution for the treatment group and at the 95th percentile of the control distribution for the control group, you might end up completely missing the treatment effect. I think it makes sense to do this with separate cutoffs by survey round, to allow for seasonal effects and so that you aren’t winsorizing more points from one round than another (which could be the case if you used the same global cutoffs for all rounds).
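
To make the mechanics concrete, here is a minimal sketch of winsorizing with one cutoff per survey round, computed on treatment and control pooled together. The function name `winsorize_by_round`, the simulated sales data, and the 1.5x "seasonal" bump are all made up for illustration, and it only caps the upper tail.

```python
import numpy as np

def winsorize_by_round(values, rounds, upper_pct=99.0):
    """Cap values at the upper_pct percentile of their survey round.

    The cutoff is computed on the pooled sample within each round, so
    treatment and control always share the same cutoff.
    (Hypothetical helper for illustration; upper tail only.)
    """
    values = np.asarray(values, dtype=float)
    rounds = np.asarray(rounds)
    capped = values.copy()
    cutoffs = {}
    for r in np.unique(rounds):
        in_round = rounds == r
        cutoff = np.percentile(values[in_round], upper_pct)
        cutoffs[r.item()] = cutoff
        capped[in_round] = np.minimum(values[in_round], cutoff)
    return capped, cutoffs

# Simulated example data (not from any real survey).
rng = np.random.default_rng(0)
n = 400
treat = rng.integers(0, 2, size=n)                   # assignment; not used for cutoffs
rounds = np.repeat(["baseline", "followup"], n // 2)
sales = rng.lognormal(mean=5.0, sigma=1.0, size=n)
sales[rounds == "followup"] *= 1.5                   # assumed seasonal difference

sales_w, cutoffs = winsorize_by_round(sales, rounds, upper_pct=95.0)

# Number of capped observations per round.
n_capped = {r: int(np.sum(sales[rounds == r] > c)) for r, c in cutoffs.items()}
print(cutoffs, n_capped)
```

Note that treatment status never enters the cutoff calculation, while the round does, so each round has the same share of observations capped even when sales levels differ across rounds.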

Question 2: Testing balance when the probability of assignment varies by strata. “Suppose that we are evaluating a scholarship program involving 2 schools of the same type, so that the treatment is getting a scholarship to go to this type of school. For each school there is a pool of candidate students that we are able to randomly assign, within each school, to either treatment (getting a scholarship) or not (no scholarship). Suppose that School 1 has 100 candidates and spaces for 80 scholarship recipients, but all 100 are just barely above the cutoff point for admission to this type of school (the cutoff is based on some test, and it is the same for both schools). Meanwhile School 2 also has 100 candidates but has space for only 20, and these candidates are on average better (higher score on the admissions test) than those in School 1. Then for our total sample the "treatment group" of 100 students will have worse skills than the control group because 80 kids in the treatment group are from School 1 (which had worse scores, on average, compared to School 2) while the "control group" of 100 students will have generally better skills because 80 of them are from School 2. Then this overall randomization could "fail" a balance test in terms of initial test scores (and perhaps some other variables) because the 100 kids in the treatment group will have lower "pre-test" scores than the 100 kids in the control group. Intuitively, I don't think that this is a problem because we randomize within both Schools. Perhaps the balance test should control for which schools the kids are from, so that it is a balance test within schools rather than across schools as well.”
The intuition is correct, in that all you need to do is control for the randomization strata and then everything is random conditional on that. This is an example of the more general point that you should always control for randomization strata. I faced this issue in a recent internship program evaluation, where the probability of treatment varied from 37 to 96 percent across different strata. As a result, a simple comparison of means shows significant differences, but after controlling for strata balance is fine (see Table 1 and the discussion at the bottom of page 6). The only further complication here is to think about which average treatment effect you are interested in if there is heterogeneity in effects by strata: in the above example, are you interested in the average school-level effect, or the average student-level effect? If the former, you would need to re-weight.
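
As a rough illustration of the two-school example, the sketch below simulates pre-test scores, randomizes within each school (80 of 100 in School 1, 20 of 100 in School 2), and compares the raw difference in means with the difference after controlling for school dummies. The score distributions (means of 50 and 70, standard deviation of 5) are invented for the simulation, and the standard error is a simple homoskedastic one.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two strata (schools) with 100 candidates each; School 2 has higher scores.
school = np.repeat([1, 2], 100)
score = np.where(school == 1,
                 rng.normal(50, 5, 200),   # assumed School 1 pre-test scores
                 rng.normal(70, 5, 200))   # assumed School 2 pre-test scores

# Randomize within school: 80 scholarships in School 1, 20 in School 2.
treat = np.zeros(200, dtype=int)
treat[rng.choice(np.where(school == 1)[0], size=80, replace=False)] = 1
treat[rng.choice(np.where(school == 2)[0], size=20, replace=False)] = 1

# Naive balance check: raw difference in mean pre-test scores.
raw_diff = score[treat == 1].mean() - score[treat == 0].mean()

# Balance check controlling for strata: regress score on treatment
# plus a dummy for each school (no separate constant).
X = np.column_stack([treat, school == 1, school == 2]).astype(float)
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
adj_diff = coef[0]

# Homoskedastic OLS standard error for the treatment coefficient.
resid = score - X @ coef
sigma2 = resid @ resid / (len(score) - X.shape[1])
se_adj = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])

print(f"raw difference:      {raw_diff:.2f}")
print(f"within-strata diff.: {adj_diff:.2f} (s.e. {se_adj:.2f})")
```

In this simulation the raw difference is large and negative simply because most treated students come from School 1, while the coefficient on treatment with school dummies should be close to zero.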

Question 3: Dealing with low take-up of a program: “During the pilot around 50% of the treated did not accept the treatment and this became a problem for the organization that provides the training. For the current round the team expects to get 150 eligible individuals to include in the randomization. However, we are really concerned because using the same method from the pilot for the randomization, this would lead us to 75 in each group and around 37 of them will probably reject the treatment. Therefore, we would end up with 38 take-up treatment. Do you think it is possible to use a different randomization method that allows us to replace those individuals that do not accept the treatment? Or should we have a bigger treatment group?”