The simplest way to analyze a randomized experiment is to just compare the means of the treatment and control groups, which can be done in a regression framework by estimating:
Y = a + b*Treatment + e
However, in practice researchers often wish to add control variables X to this regression, for two main reasons: (1) to improve statistical power by including controls that are strong predictors of the outcome Y (the most important such variable is often the baseline value of the outcome, leading to an Ancova regression); and (2) to account for the possibility that, despite randomized assignment, Treatment might still be correlated with the error term e because of an unlucky random draw (especially in small samples) or because of attrition. Once one decides to include such control variables, the question becomes how to choose which ones, a choice that is often ad hoc and can give rise to concerns about researcher degrees of freedom and possible p-hacking.
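For concreteness, here is a minimal Stata sketch of both regressions, under the assumption of an outcome y, a treatment dummy treat, a baseline outcome y0, and a randomization strata variable strata (all variable names are hypothetical):

* Simple comparison of means between treatment and control
regress y treat, robust

* Ancova: also control for the baseline outcome and randomization strata
regress y treat y0 i.strata, robust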
The post-double-selection Lasso (PDS Lasso) method of Belloni et al. (2014) has become a popular approach for selecting which control variables to include. In a new working paper, Jacobus Cilliers, Nour Elashmawy, and I look at how this method performs in practice, and draw out lessons and advice for empirical researchers.
A recap of the PDS Lasso method
The method was initially designed for observational studies, and allows the number of potential controls p to be high-dimensional, even exceeding the sample size n. It follows a three-step procedure for selecting the set of controls:
Step 1: Select controls that predict the outcome, through a Lasso regression of Y on X, without controlling for treatment. Lasso uses a tuning parameter λ, which controls how much the inclusion of additional covariates is penalized.
Step 2: Select controls that predict the treatment, through a Lasso regression of Treatment on X.
Step 3: Estimate the treatment effect by regressing Y on Treatment and on the union of the sets of controls selected in the first two steps, alongside a possible third set of variables (called the amelioration set) that researchers want to force the model to include. Inference can then proceed using standard heteroskedasticity-robust standard errors.
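To make these steps concrete, here is a minimal sketch of the procedure using Stata's built-in lasso command, assuming Stata 16 or later, where lasso stores the names of the selected variables in e(allvars_sel). The variable names (outcome y, treatment treat, candidate controls x1-x200) are hypothetical:

* Step 1: Lasso of the outcome on the candidate controls, plug-in penalty
lasso linear y x1-x200, selection(plugin)
local sel_y `e(allvars_sel)'

* Step 2: Lasso of the treatment on the candidate controls
lasso linear treat x1-x200, selection(plugin)
local sel_d `e(allvars_sel)'

* Step 3: OLS of the outcome on treatment plus the union of the two
* selected sets, with heteroskedasticity-robust standard errors
local union : list sel_y | sel_d
regress y treat `union', robust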
This can be implemented in Stata with the pdslasso program, or the built-in dsregress command (we discuss in the paper why they can give slightly different results).
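For example, with the same hypothetical variable names, and forcing in the baseline outcome y0 as the amelioration set, the calls would look something like the following (see each command's help file for the full syntax):

* user-written pdslasso: pnotpen() leaves y0 unpenalized so it is always included
pdslasso y treat (y0 x1-x200), pnotpen(y0) robust

* built-in dsregress: variables in the inner parentheses are always included
dsregress y treat, controls((y0) x1-x200)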
What do we do?
We replicate and re-analyze 780 treatment estimates from 18 field experiments published in the AER, AEJ Applied, and JDE that use PDS Lasso, comparing Ancova estimates to PDS Lasso estimates, and to versions of PDS Lasso that differ in how the tuning parameter λ is selected. These experiments cover a wide range of different outcomes, attrition rates, and sample sizes, including both clustered and non-clustered experiments. This provides us with a lot of detail on both how authors are using this approach and what the effects of using it are. We supplement this with some simulations.
How much difference does PDS Lasso make in practice?
The median sample size is 1,913 observations, with a median of 206 clusters. Researchers are often throwing a whole slew of variables into PDS Lasso, but we find that in practice very few get selected. The mean (median) number of potential control variables researchers use PDS Lasso to decide amongst is 405 (182). However, Figure 1 shows the distribution of how many variables are selected in practice in the two steps.
· In step 1, the mean number of variables selected is only 2.7, and the median is only 1.
· In step 2, no variables are selected in over half the cases, and when variables are selected, it is typically only one or two variables.
The result is that, out of the hundreds of variables researchers are feeding into PDS Lasso, the mean number selected as controls is only 3.6, the median only 2, and the 75th percentile only 5.
Given that very few control variables are usually chosen, it is not surprising that we find this method makes only very small differences to the magnitude of treatment effects or to their precision:
· The mean change in the treatment effect estimate is 0.05 S.D., and the median is 0.01 S.D.
· The standard errors from PDS Lasso are, at the median, 99.2% of the size of those from Ancova. Moreover, in about 25% of cases, using PDS Lasso actually results in larger standard errors than just using Ancova.
The upshot is that researchers should not count on PDS Lasso to deliver much in the way of power gains in their experiments, and so our optimistic grant proposals and registered reports, which tend to claim we can squeeze out more power by including control variables, should be tempered.
We do find PDS Lasso makes more of a difference when attrition rates are higher and when there is no lagged dependent variable available to control for, but even then the changes are very small in magnitude in the majority of cases. This seems to reflect that many of our economic outcomes are very hard to predict (or at least to predict beyond the one or two variables a researcher would likely add anyway or stratify on), and that most attrition in practice stems from idiosyncratic reasons that are not correlated with the outcome.
What did we learn about practical implementation issues?
· Choosing the tuning parameter λ: Belloni et al. derive a “plug-in” penalty parameter based on asymptotic theory, but this can under-select variables in small samples. We examine the alternative of using cross-validation to select λ. This choice does lead to substantially more controls being selected, but it risks overfitting and can sometimes result in much larger standard errors. We recommend researchers stick with the default plug-in parameter.
· How many and which covariates should be included in the control set? As noted, researchers tend to throw the kitchen sink at it, thinking that PDS Lasso can then choose from among several hundred possible variables. But starting with a very large set of variables makes it more likely that none get selected, since the penalty parameter increases with the number of potential controls. Instead, we recommend that researchers include the lagged outcome and randomization strata in the amelioration set (to ensure they are included), and then be much more judicious in choosing the set of variables they feed in. This echoes a point I made a few weeks ago about how many variables to include in a balance test.
· Watch out for missing values: It is important to ensure that all the potential control variables have no missing values, by dummying out missing values and including these dummies as additional controls as needed (see the Stata sketch after this list). We found that 8 of the 18 papers we examined had inadvertently run PDS Lasso on a smaller sample than their OLS sample because of this issue. Researchers should then be comfortable with, for example, PDS Lasso selecting only the dummy for missing age, even if it does not select age itself as a control.
· Partial out fixed effects rather than having PDS Lasso select a subset of them: The approximate sparsity assumption needed for Lasso can be problematic with categorical variables, and results can be quite sensitive to the normalization (which category is used as the base case). We therefore do not recommend having PDS Lasso select a subset of the fixed effects without substantial researcher oversight and domain knowledge on how to prune and combine categories.
· Another common error arises when dealing with treatment interactions: When estimating a model with a treatment and a treatment interaction, ensure the interacting variable is included in the amelioration set. Multiple papers made coding syntax errors that led to the interacting variable being inadvertently modeled as if it were an additional treatment.
· Watch out for the lack of a degrees-of-freedom adjustment in the pdslasso command: If one runs PDS Lasso and no control variables are selected, the reported standard errors can be slightly smaller than those from simply regressing the outcome on treatment, due to the lack of a degrees-of-freedom adjustment. This is easily solved by using PDS Lasso to select the controls and then separately running a regression with the selected controls, as in the sketch below.
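To make several of these recommendations concrete, here is a minimal Stata sketch. All variable names (y, treat, female, y0, x1-x200) are hypothetical, and the e(selected) macro at the end is our reading of where pdslasso stores the selected controls; check the help file of your installed version.

* Dummy out missing values so PDS Lasso runs on the same sample as OLS
foreach v of varlist x1-x200 {
    gen byte `v'_miss = missing(`v')
    replace `v' = 0 if `v'_miss == 1
}

* Cross-validated lambda (not our recommendation) is available in dsregress
* via the selection() option; the plug-in default is selection(plugin)
dsregress y treat, controls((y0) x1-x200 *_miss) selection(cv)

* Treatment interaction: force the interacting variable (here female) into
* the amelioration set via pnotpen(), so it is not treated as an extra treatment
gen treat_female = treat * female
pdslasso y treat treat_female (female y0 x1-x200 *_miss), pnotpen(female y0) robust

* Re-run OLS on the selected controls to restore the degrees-of-freedom
* adjustment, using a list union to avoid repeating the forced-in variables
local forced female y0
local sel `e(selected)'
local ctrls : list forced | sel
regress y treat treat_female `ctrls', robust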
The paper goes into a lot more detail on these and other issues (such as dealing with multiple treatments), and we hope it will be useful for experimental researchers. Our bottom line: we see PDS Lasso as a useful robustness check for many field experiments, providing a less ad hoc way of selecting additional control variables on top of any lagged outcome and randomization strata. Most of the time it should make very little difference to the estimated coefficients and standard errors. When it does make a sizeable change in the coefficients, this can provide a useful warning to researchers that they can no longer rely simply on random assignment to justify their results.