Your go-to regression specification is biased: here’s the simple way to fix it



Today, I am writing about something many of you already know. You’ve probably been hearing about it for 5-10 years. But, you still ignore it. Well, now that the evidence against it has mounted enough and the fix is simple enough, I am here to urge you to tweak your regression specifications in your program evaluations.

A new paper by Gibbons, Serrato, and Urbancic in the Journal of Econometric Methods (a) shows that OLS with Fixed Effects is not a consistent estimator in the presence of heterogeneous effects; (b) develops estimators to deal with the problem; and (c) shows that the bias can matter substantively in some cases. You can download commands for R and Stata here if you wish. The problem in a nutshell is that the FE specifications are not fully flexible – allowing the intercepts of the conditional expectations to vary but not the slopes.

The implication, however, is simple and we have covered it here at least twice in the past five years. When you have covariate-adjusted regressions, including your block fixed effects, etc., demean (or center) your covariates and fully interact them with your treatment indicator. Then, the coefficient on T is still the average treatment effect, which is now unbiased and consistent even in the presence of heterogeneity (along these covariates). Winston Lin wrote about this in a guest post almost six years ago and David blogged about this when he discussed Imbens and Rubin’s book two years ago (see point 4 here). Both Lin (2013) and Imbens & Rubin (2015) prefer this specification for covariate-adjusted regressions. If you did not believe them or you thought this is not important enough, then maybe this paper will put you over the edge…
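To make the recipe concrete, here is a minimal sketch in Python on simulated data. It is not the authors' code and does not use their R/Stata commands; the two-block design, the treatment probabilities (0.5 and 0.1), and the block-specific effects (1 and 3) are all made-up assumptions chosen so that the true average treatment effect is 2. It compares the usual fixed-effects specification with the centered-and-interacted one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: two equal-sized blocks with different treatment
# probabilities AND different treatment effects (all numbers are made up).
n_per_block = 2000
block = np.repeat([0, 1], n_per_block)             # block (stratum) dummy
p_treat = np.where(block == 1, 0.1, 0.5)           # treatment variance differs by block
treat = (rng.random(block.size) < p_treat).astype(float)
effect = np.where(block == 1, 3.0, 1.0)            # heterogeneous effects; ATE = 2
y = 0.5 * block + effect * treat + rng.normal(size=block.size)


def ols(X, y):
    """Return OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones_like(y)

# (1) Go-to specification: treatment plus block fixed effect.
X_fe = np.column_stack([ones, treat, block])
b_fe = ols(X_fe, y)[1]

# (2) Center the covariate and fully interact it with treatment.
block_c = block - block.mean()
X_int = np.column_stack([ones, treat, block_c, treat * block_c])
b_int = ols(X_int, y)[1]

print(f"FE specification, coefficient on T:         {b_fe:.2f}")
print(f"Interacted specification, coefficient on T: {b_int:.2f}")
```

In this setup the interacted coefficient should land close to 2, while the fixed-effects coefficient is pulled toward the effect in the block with more variance in treatment. If the treatment probability were the same in both blocks, the two specifications would agree much more closely.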

A couple of caveats: First, this will be less useful when you have a ton of fixed effects. For example, you may have pair-randomized the treatment assignments and won’t be able to establish heterogeneity in treatment effects from pairs. The advice on this usually is that you should have blocks (strata) for random assignment that are small, but not pairs. With lots of FE, you may still run into degrees of freedom issues with small sample sizes, however.

Second, Gibbons, Serrato, and Urbancic emphasize the fact that the adjustments make a difference. However, a quick look at their Appendix D.3, Table 6 suggests that the correct specification makes a big difference in only one of the eight papers they examined. In all the others, the results are what the original authors would have called “robust” to different ways of handling this issue. My personal experience is also that this has not made a difference to findings in my own experiments thus far. The Karlan and Zinman (2008) paper is a perfect example for them: the variance in treatment is much larger in one sub-group AND there is effect heterogeneity along that dimension.

So, there you have it. Just add a few lines to your code to center covariates and create interactions and you’re set. No reason not to do it…



Berk Ozler

Lead Economist, Development Research Group, World Bank

Join the Conversation

February 12, 2018

At the risk of revealing my ignorance, could you expand on why you would need the adjustment in the case of pairwise randomization? My understanding from Prop 1 of their paper is that the problem arises when there's heterogeneous variance in treatment after conditioning on the controls. In the case of pairwise randomization (and other stratified randomization schemes with constant treatment propensities) it seems like treatment variance should be constant by construction.

Berk Ozler
February 12, 2018

No, you're exactly right: that's related to what I meant when I said "...[you] won’t be able to establish heterogeneity in treatment effects from pairs." But, I should have been clearer...

David Reinstein
February 14, 2018

Thanks for the post. To be clear, are you suggesting that "demean[ing] (or center) your covariates and fully interact them with your treatment indicator" itself will yield a coefficient representing the ATE? Is there a simple proof of that somewhere? If that is all that needs doing, what is the need for the more involved procedure (and R/Stata code) that Gibbons et al provide? Sorry if I'm being dense here.

Berk Ozler
February 14, 2018

You can look at Lin's paper that is linked, but, yes, fairly straightforward. When you demean, the coefficient estimate on the uninteracted T will be evaluated at the mean of all variables, rather than for a specific left-out sub-group had you not centered the covariates.
On the R/Stata code, the authors have two estimators they propose and I was referring to only one of them. It's possible that the other estimator, the RWE, requires a little more assistance from the authors...