Published on Development Impact

Lee Bounds in Practice

David McKenzie

April 22, 2024

Lee bounds are one of the most common ways of examining sensitivity to attrition in impact evaluations. They were proposed by David Lee in a 2009 ReStud paper, where the goal was to estimate the impact of a job training program on wages. The problem that arises is that wages are only observed for those who are working, and employment status may itself be affected by training. That is, there could be selective attrition on wages. Attrition could likewise arise in many experiments from survey non-response, mortality, or inability to track some of the sample.

The key assumption of Lee is a monotonicity assumption, which assumes that treatment only affects attrition in one direction. For example, in his setting, some individuals will always be employed and have wages regardless of treatment assignment, some will never be employed and never report wages, and then there will be others who will only be employed and report wages after training, but not if they are in the control. The monotonicity assumption is that there are not people for whom training instead reduced their likelihood of employment. In a standard setting of survey attrition, if we find response rates are higher for the treatment group than the control group, the assumption is that everyone who responded in the control group would also have responded if they were treated, and then there are some additional people who only responded because they got treated.

Lee’s solution is then to form bounds by trimming the differential attritors, assuming that they all come from the very top or the very bottom of the distribution. For example, suppose 84% of the treatment group report wages, and 75.6% of the control group. The differential attrition rate is 8.4 percentage points. Then the proportion of observations to trim is 8.4/84 = 10% of the treatment group observations with wages. To get a lower bound on the treatment effect, we trim the top 10% of treatment wages. To get an upper bound on the treatment effect, we trim the bottom 10% of treatment wages. Note that one is then identifying the treatment effect for the sample of individuals that always respond, which could differ from the ITT. As Lee argues, with attrition, this is the only subpopulation that the data will be informative about – so in his case, you would learn the effect of training on wages for those whose employment status is not affected by training.
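The trimming arithmetic above can be sketched in a few lines of Python. This is only an illustrative sketch, not the leebounds implementation: the function name `lee_bounds` and the toy wage data are hypothetical, it assumes the treatment group has the higher response rate, and it rounds the number of trimmed observations to the nearest integer.

```python
from statistics import mean


def lee_bounds(y_treat, y_control, n_treat_assigned, n_control_assigned):
    """Unconditional Lee bounds when the treatment group responds at a higher rate.

    y_treat, y_control: observed outcomes for respondents only.
    n_*_assigned: number of units originally assigned to each arm.
    """
    resp_t = len(y_treat) / n_treat_assigned
    resp_c = len(y_control) / n_control_assigned
    if resp_t < resp_c:
        raise ValueError("this sketch only handles higher response in treatment")

    # Share of treated respondents to trim: (resp_t - resp_c) / resp_t
    trim_share = (resp_t - resp_c) / resp_t
    n_trim = round(trim_share * len(y_treat))

    ordered = sorted(y_treat)
    keep_low = ordered[: len(ordered) - n_trim]  # drop the top -> lower bound
    keep_high = ordered[n_trim:]                 # drop the bottom -> upper bound

    control_mean = mean(y_control)
    return trim_share, mean(keep_low) - control_mean, mean(keep_high) - control_mean


# Hypothetical data matching the response rates in the text:
# 1,000 assigned per arm, 840 treated and 756 control respondents.
wages_treat = list(range(1, 841))
wages_control = list(range(1, 757))
share, lower, upper = lee_bounds(wages_treat, wages_control, 1000, 1000)
print(f"trim share = {share:.3f}, bounds = [{lower:.1f}, {upper:.1f}]")
```

With these made-up wages the trimming share is 10%, so 84 treated observations are dropped from the top for the lower bound and from the bottom for the upper bound.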

There is a Stata command leebounds written by Harald Tauchmann to implement this, and the basic intuition is easy to understand, making this a popular approach. However, a number of practical questions arise when applying this, and I’ve collated some of the questions I’ve received over the years to try to help provide my thoughts on some of these issues.

1. What to do with covariates and tightening these bounds

Lee’s paper notes that you can potentially sharpen or narrow these bounds by using baseline characteristics, if these are correlated with the outcome. For example, if men and women earn different amounts, and monotonicity holds conditional on gender, then rather than trimming all the top earners or all the bottom earners unconditionally, one would do this separately by gender according to the amount of differential attrition for each gender. Then these group-specific bounds can be averaged up to get the overall effect, weighted by their group’s proportion of the overall set of all responders.
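To make the cell-by-cell logic concrete, here is a minimal Python sketch with hypothetical data for two gender cells. The function names and numbers are illustrative assumptions. The weights follow the description above (each cell's share of all respondents); other choices, such as each cell's share of control respondents, are possible and worth checking against the paper or your software's documentation.

```python
from statistics import mean


def cell_bounds(y_treat, y_control, n_treat_assigned, n_control_assigned):
    """Lee bounds within one cell, assuming treatment responds at least as often."""
    resp_t = len(y_treat) / n_treat_assigned
    resp_c = len(y_control) / n_control_assigned
    if resp_t < resp_c:
        raise ValueError("differential attrition flips sign in this cell")
    n_trim = round((resp_t - resp_c) / resp_t * len(y_treat))
    ordered = sorted(y_treat)
    lower = mean(ordered[: len(ordered) - n_trim]) - mean(y_control)
    upper = mean(ordered[n_trim:]) - mean(y_control)
    return lower, upper


def tightened_bounds(cells):
    """Average cell-specific bounds, weighting by each cell's share of respondents."""
    total = sum(len(c["y_treat"]) + len(c["y_control"]) for c in cells.values())
    lower = upper = 0.0
    for c in cells.values():
        weight = (len(c["y_treat"]) + len(c["y_control"])) / total
        lo, hi = cell_bounds(c["y_treat"], c["y_control"], c["n_treat"], c["n_control"])
        lower += weight * lo
        upper += weight * hi
    return lower, upper


# Hypothetical example: differential attrition only among men.
cells = {
    "men": {"y_treat": list(range(11, 21)), "y_control": list(range(11, 19)),
            "n_treat": 10, "n_control": 10},
    "women": {"y_treat": list(range(1, 9)), "y_control": list(range(1, 9)),
              "n_treat": 10, "n_control": 10},
}
lower_t, upper_t = tightened_bounds(cells)
print(lower_t, upper_t)
```

In this toy example all of the trimming happens within the men's cell, so low-earning women are never trimmed to form the upper bound, which is what narrows the interval relative to pooling everyone.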

However, several issues arise here. The first is what to do with continuous covariates (e.g. age). Lee recommends discretizing into a small number of categories. The second is that you can only include a few covariates, since otherwise the cell sizes start getting very small, and it becomes hard to trim the right proportion within each cell. Third, with smaller samples, differential attrition may no longer all be in the same direction within each small cell – e.g. attrition may be higher for control than treatment on average, but once you look at women aged 25-29 with tertiary education, you might find slightly higher attrition in the sample for treatment than control. The leebounds command allows for this tightening on covariates, but notes estimation can fail if a large number of covariates are used.

Vira Semenova has a working paper on generalized Lee bounds which deals with these issues. She relaxes the assumption that monotonicity has to be in the same direction for all subjects, instead allowing it to differ depending on pre-treatment covariates. She also allows for more covariates and continuous covariates, using a post-Lasso procedure to regularize them. Revisiting Lee’s job training example, she shows this gives tighter bounds in that case. She has an R package (also called leebounds). This seems a promising approach, but I have not tried it out yet, and am not sure how well some of the asymptotic-based results will hold on the small samples we have in some development experiments.

A second, more practical question that I have received from colleagues is what to do when the regression model they are estimating has covariates in it, strata fixed effects, and perhaps is also from a clustered RCT, or uses multiple rounds of a panel, etc – since the leebounds command does not allow for these. My solution has always been to manually program the trimming for my regressions. For example, in the replication files for my business plan competition paper, I do this for the appendix on robustness to attrition. This way you also get to think about whether you want to trim conditionally or unconditionally. E.g. does it seem more likely that it is the highest wage earners conditional on their age, gender, educational status and region who can’t be surveyed, or just the highest wage earners overall?
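As a rough illustration of the conditional versus unconditional choice, the Python sketch below flags which treated respondents to drop before re-running whatever regression you normally estimate on the remaining sample. It is not taken from the replication files mentioned above; the function name `trim_flags`, the `region` variable, and the data are hypothetical, and applying the same trimming share within every group is an assumption made for simplicity.

```python
def trim_flags(rows, trim_share, drop_top, by=None):
    """Return the ids of treated respondents to drop.

    rows: list of dicts with 'id', 'treat', 'y', and optionally the `by` key.
    by=None trims unconditionally; otherwise trims within each value of `by`.
    """
    groups = {}
    for r in rows:
        if r["treat"] == 1:
            groups.setdefault(r[by] if by else "all", []).append(r)

    dropped = set()
    for members in groups.values():
        # Put the observations to be trimmed first
        ranked = sorted(members, key=lambda r: r["y"], reverse=drop_top)
        n_trim = round(trim_share * len(ranked))
        dropped.update(r["id"] for r in ranked[:n_trim])
    return dropped


# Hypothetical data: region B earns far more than region A.
rows = [{"id": f"A{i}", "treat": 1, "y": i, "region": "A"} for i in range(1, 6)]
rows += [{"id": f"B{i}", "treat": 1, "y": 100 + i, "region": "B"} for i in range(1, 6)]
rows += [{"id": "C1", "treat": 0, "y": 50, "region": "A"}]  # controls are never trimmed

unconditional = trim_flags(rows, trim_share=0.2, drop_top=True)
conditional = trim_flags(rows, trim_share=0.2, drop_top=True, by="region")
print(sorted(unconditional), sorted(conditional))
```

Unconditional trimming removes the two highest earners overall (both in region B), whereas conditional trimming removes the top earner within each region, so the two choices can produce quite different trimmed samples.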

2. What to do about ties in the outcome or binary outcomes?

The example in Lee’s paper has an outcome of wages, which is continuous. It is then easy to think about trimming the top or bottom 10%. But many of the outcomes we consider are binary, or might be truncated/have ties. The question is then how to handle trimming in such cases.

For example, in the first round follow-up of my business plan competition paper, one of the outcomes of interest was business survival. The control group had 11.8 percentage points higher attrition than the treatment group, which equated to me needing to trim 33 firms from the treatment group sample. But then this raised two issues: i) for the lower bound, I would want to trim firms in the treatment group that had the high value (open). But this was most of the firms. So then I set a seed and randomly chose 33 of the open treatment firms to trim; ii) for the upper bound, I would want to trim treatment firms with the lowest value of the outcome (being closed). But only 11 of the treated firms were closed down. So then I trimmed these 11, plus randomly chose another 22 of the open firms to trim.
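The tie-breaking procedure just described can be sketched as follows in Python. The counts of 33 firms to trim and 11 closed firms come from the example above; the total of 300 treated respondents, the seed, and the function name `trim_binary` are hypothetical choices for illustration.

```python
import random


def trim_binary(outcomes, n_trim, drop_high, seed):
    """Return the indices kept after trimming n_trim units from a 0/1 outcome list.

    Units with the extreme value are dropped first; ties are broken at random
    with a fixed seed so the trimmed sample is reproducible.
    """
    rng = random.Random(seed)
    extreme = 1 if drop_high else 0
    extreme_idx = [i for i, y in enumerate(outcomes) if y == extreme]
    other_idx = [i for i, y in enumerate(outcomes) if y != extreme]
    if n_trim <= len(extreme_idx):
        dropped = set(rng.sample(extreme_idx, n_trim))
    else:
        # Not enough extreme-valued units: drop them all, then fill up at random
        extra = rng.sample(other_idx, n_trim - len(extreme_idx))
        dropped = set(extreme_idx) | set(extra)
    return [i for i in range(len(outcomes)) if i not in dropped]


# Hypothetical treated sample: 289 open (1) and 11 closed (0) firms.
survived = [1] * 289 + [0] * 11
kept_lower = trim_binary(survived, 33, drop_high=True, seed=123)   # drop 33 open firms
kept_upper = trim_binary(survived, 33, drop_high=False, seed=123)  # drop 11 closed + 22 open

rate_lower = sum(survived[i] for i in kept_lower) / len(kept_lower)
rate_upper = sum(survived[i] for i in kept_upper) / len(kept_upper)
print(rate_lower, rate_upper)
```

For a simple difference in means, the trimmed survival rate does not depend on which of the tied firms are dropped. The random choice, and hence the seed, only matters once the trimmed sample is used in a regression with covariates or fixed effects.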

The documentation for leebounds notes that for continuous outcomes with ties, it chooses a fraction of those with the tied value to trim. But I’m not sure how it determines which ones, and this could pose a risk for reproducibility unless you set a seed. I’m not sure what it does for binary outcomes.

3. Do I need to do Lee bounds if I can’t reject that the attrition rates are equal for treatment and control?

I had a colleague write to me to say “we show in the paper that attrition is uncorrelated with treatment status, so that makes Lee bounds a moot point, but a referee is still insisting we show them. What is the point if attrition is not selective, and can you even calculate them?”

What matters for being able to calculate Lee bounds is the magnitude of the difference in attrition rates between treatment and control groups, not whether this difference is statistically significant. So, for example, you might find a 3 percentage point difference in attrition rates, which is not significant. But this could still potentially make a difference to results if you assume that these additional 3 percentage points were all the highest earners, and you can still calculate Lee bounds. It is only when you have identical response rates for treatment and control that the bounds will collapse into a single point. But if differential attrition is not very high, then the bounds could be quite narrow in most cases.

A second point to note on this is that Lee bounds is not the only check one would like to see for robustness to attrition. I’ve seen papers that report something like “response rates were 32% for treatment, and 31% for control, and this difference is not statistically significant”. But even if they were to do Lee bounds for the 1 percentage point difference, we would still likely be very worried about what sort of selection is happening with the more than two-thirds of subjects that attrited. Here looking at whether the attritors and non-attritors are similar on baseline variables, whether there is balance between treatment and control among non-attritors, and using contextual understanding and perhaps some supplementary data to say something about the attritors would be needed to assure the reader.
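A minimal Python sketch of the first two checks is below, using a hypothetical baseline variable and made-up data. It only computes differences in means; in practice you would also want standard errors or a joint test across several baseline variables.

```python
from statistics import mean


def attrition_checks(rows, var):
    """rows: dicts with 'treat' (0/1), 'responded' (0/1) and a baseline variable `var`."""
    def avg(cond):
        return mean(r[var] for r in rows if cond(r))

    return {
        # Are attritors different from respondents at baseline?
        "attritors_minus_respondents":
            avg(lambda r: r["responded"] == 0) - avg(lambda r: r["responded"] == 1),
        # Is there still balance between arms among those who responded?
        "treat_minus_control_among_respondents":
            avg(lambda r: r["responded"] == 1 and r["treat"] == 1)
            - avg(lambda r: r["responded"] == 1 and r["treat"] == 0),
    }


# Hypothetical data with a baseline profit variable.
rows = [
    {"treat": 1, "responded": 1, "baseline_profit": 10},
    {"treat": 1, "responded": 1, "baseline_profit": 20},
    {"treat": 1, "responded": 0, "baseline_profit": 40},
    {"treat": 0, "responded": 1, "baseline_profit": 12},
    {"treat": 0, "responded": 1, "baseline_profit": 18},
    {"treat": 0, "responded": 0, "baseline_profit": 50},
]
checks = attrition_checks(rows, "baseline_profit")
print(checks)
```

In this toy example the respondents are balanced across arms, but the attritors had much higher baseline profits, which is exactly the kind of pattern that a small treatment-control gap in response rates would not reveal.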

4. What, if anything, changes if I’m interested in the LATE?

Another question I received was whether you can apply Lee bounds when you do not have perfect compliance in an RCT, and are interested in the LATE as well as the ITT. The ITT estimation can proceed as above. Two potential complications might arise when estimating a LATE: i) you may not know the true compliance rate in the population, and instead have to estimate it from a survey that includes attrition. E.g. you offer training to job seekers, and those who take up training are more likely to be found (since you have more tracking information on them, and because they feel grateful). But if you don’t know from administrative data what proportion of those offered took up the training, your estimate of the compliance rate in treatment and control will itself be subject to potential bias because of attrition. If we think that the LATE = ITT/(p1-p2) where p1 = take-up rate in treatment, and p2 = take-up rate in control, then attrition and differential attrition could affect our estimates of p1 and p2. So then you might have to consider bounds on these as well; ii) be careful how the trimming of outcomes affects the proportion of compliers and non-compliers – e.g. if all the lowest earnings are from the non-compliers in the treatment group, and all the highest earnings are from compliers, then the upper and lower bound ITTs will be estimated using a different proportion of compliers from each other – and an IV regression which uses the take-up rate in the sample as an instrument for treatment will then be varying not only attrition, but also (p1-p2).
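Point ii) can be illustrated with a small Python sketch that computes ITT/(p1-p2) on each trimmed sample. The data, the number of observations trimmed, and the function name `wald_after_trimming` are all hypothetical, and take-up is estimated from respondents only, which is itself one of the concerns raised in point i).

```python
from statistics import mean


def wald_after_trimming(treated, control, n_trim, drop_top):
    """treated/control: lists of (took_up, earnings) pairs for respondents.

    Trims n_trim treated respondents by earnings, then returns the ITT,
    the difference in take-up rates (p1 - p2), and their ratio.
    """
    ordered = sorted(treated, key=lambda r: r[1])
    kept = ordered[: len(ordered) - n_trim] if drop_top else ordered[n_trim:]
    itt = mean(e for _, e in kept) - mean(e for _, e in control)
    p1 = mean(t for t, _ in kept)
    p2 = mean(t for t, _ in control)
    return itt, p1 - p2, itt / (p1 - p2)


# Hypothetical data: in the treatment group, those who took up training earn the most.
treated = [(0, 10), (0, 20), (0, 30), (0, 40), (0, 50),
           (1, 60), (1, 70), (1, 80), (1, 90), (1, 100)]
control = [(0, e) for e in (10, 20, 30, 40, 50, 60, 70, 80)]

lower = wald_after_trimming(treated, control, n_trim=2, drop_top=True)
upper = wald_after_trimming(treated, control, n_trim=2, drop_top=False)
print(lower, upper)
```

Trimming the top drops two people who took up training, so the estimated first stage falls to 0.375, while trimming the bottom drops two who did not, raising it to 0.625. The two bounds are therefore scaled by different denominators as well as having different numerators.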

5. How should I interpret results if my Lee bounds incorporate an opposite signed treatment effect?

I had someone write to me who was looking at the impact of a welfare program on whether someone was unhoused. The standard treatment effect estimate was negative, meaning that the program reduced the likelihood of being unhoused. Not surprisingly given the population, attrition was an issue, and the Lee upper bound was for a positive treatment effect (i.e. for the program to have actually increased the likelihood of being unhoused). The question was then whether this change in sign would then mean we could not conclude anything about the program.

Here it helps to think about how reasonable and likely the extreme assumptions being made to form the Lee bounds are in a particular context. E.g. in this case, perhaps to get the upper bound you need to drop 10% of the treatment group, who all had housing. That is, you have to assume that the additional controls who attrited all had housing. But if we know that in this context it is much easier to find and interview people with housing than without housing, and that only 15% of the control group have housing, it seems unlikely that all of the control group who couldn’t be found would have had housing. So you can then discuss the bounds, and say if we were in this extreme situation, this is what the bound says, but here are all the reasons why the lower bound in fact seems more plausible.

As an example of writing about this plausibility of which bound may be more informative, here is our discussion of these bounds in my paper with Suresh and Chris on returns to capital in Sri Lankan microenterprises: “To construct the Lee (2005) bounds we trim the distribution of profits for the group assigned to treatment by the difference in attrition rates between the two groups as a proportion of the retention rate of the group assigned to treatment. In our application, this requires trimming the upper or lower 5.2% of the real profits distribution for the group assigned to treatment. Doing this then gives a lower bound for the treatment effect of 404 LKR and an upper bound of 754 LKR, compared to the treatment effect of 541 in column (2), Table III. Similarly, the bounds for the return to capital of 5.3% estimated in column (4), Table IV, are 2.6% and 6.7%. The lower bounds occur only if it is the most profitable control firms that attrit. However, a panel regression predicting attrition as a function of the previous period’s profit finds no significant effect of having high profits on attrition, and that having the previous period’s profit in the bottom 10% lowers the probability of staying in the sample by five percentage points (p = .054). Attrition of the least profitable firms from the control sample would lead us to understate the returns, making the upper bounds more relevant.”

More resources:

See Berk’s post for ways of dealing with attrition and why he isn’t such a fan of Lee bounds, my post on the Behagel et al. approach for sharpening Lee bounds using how much effort it takes to reach people, Florence and John’s sandbox for playing around with sensitivity to different attrition assumptions, and my recent post on whether you should exert special survey effort on just trying to close the treatment-control gap in attrition.

Additional notes from online comments

This post received several helpful online suggestions and links to further resources.

· Jon Roth suggested that instead of randomly breaking ties with a discrete outcome, you should instead just use weights to re-weight the observations appropriately.

· Vitor Possebom also noted with binary outcomes there are closed form solutions that combine Horowitz and Manski trimming bounds with monotone assumptions. He also noted that Kline and Santos (2013) gives additional examples of sensitivity analysis that can be done, and Imai (2008) provides another example of how to deal with truncation by death by making additional stochastic dominance assumptions.

· Cyrus Samii notes he has a complementary paper to Semenova’s approach, that uses random forests to sharpen bounds with covariates.

David McKenzie

Lead Economist, Development Research Group, World Bank
