
Curves in all the wrong places: Gelman and Imbens on why not to use higher-order polynomials in RD

David McKenzie
A good regression-discontinuity can be a beautiful thing, as Dave Evans illustrates in a previous post. The typical RD consists of controlling for a smooth function of the forcing variable (i.e. the score that has a cut-off where people on one side of the cut-off get the treatment, and those on the other side do not), and then looking for a discontinuity in the outcome of interest at this cut-off. A key practical problem is then how exactly to control for the forcing variable.

There are two common approaches to controlling for the forcing variable. The first is to control for a high-order (third, fourth, or higher) polynomial of the forcing variable, fitted using all of the data. The second is to use local linear or local quadratic regression, fitted only within a neighbourhood of the cut-off.
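
To make the two approaches concrete, here is a minimal sketch in Python with NumPy. The function names, the simulated data, and the bandwidth of 0.25 are all made up for illustration, and the local version uses a simple rectangular window rather than any particular kernel or data-driven bandwidth choice.

```python
import numpy as np

def rd_global_poly(x, y, cutoff=0.0, degree=3):
    """RD estimate from separate degree-`degree` polynomials on each side, using all data."""
    xc = x - cutoff                      # centre the forcing variable at the cut-off
    right = xc >= 0                      # assumption: treated units are at or above the cut-off
    # np.polyfit returns coefficients highest power first, so the last one is
    # the intercept, i.e. the fitted value exactly at the cut-off.
    fit_right = np.polyfit(xc[right], y[right], degree)
    fit_left = np.polyfit(xc[~right], y[~right], degree)
    return fit_right[-1] - fit_left[-1]

def rd_local_linear(x, y, cutoff=0.0, bandwidth=0.25):
    """RD estimate from straight lines fitted only to points within `bandwidth` of the cut-off."""
    near = np.abs(x - cutoff) <= bandwidth
    return rd_global_poly(x[near], y[near], cutoff, degree=1)

# Simulated example (hypothetical data): a true jump of 2.0 at x = 0.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 2000)
y = 0.5 * x + 2.0 * (x >= 0) + rng.normal(0.0, 0.3, x.size)

est_poly = rd_global_poly(x, y, degree=3)
est_local = rd_local_linear(x, y, bandwidth=0.25)
print(f"global cubic: {est_poly:.2f}, local linear: {est_local:.2f}")
```

In this friendly example the underlying relationship really is linear, so both approaches should recover something close to the true jump. The interesting cases are the ones where the functional form is unknown.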

A new NBER working paper by Andrew Gelman and Guido Imbens makes a strong argument not to use the higher-order polynomial approach. The paper is extremely clear, making its points in only 14 pages, which is a pleasant change from the typical paper. Their case rests on three reasons.

Why are high order polynomials problematic?

Reason 1: They can give huge weight to points that are far away from the discontinuity. The RD estimate is essentially a difference between a weighted average of outcomes for treated observations on one side of the discontinuity and a weighted average of outcomes for control observations on the other side of the discontinuity. Fitting a high order polynomial can mean this weighted average is driven by observations that are far away from the threshold.
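
The weighted-average point can be checked directly, because the fitted value at the cut-off from a least-squares polynomial is a linear combination of the outcomes, with weights that depend only on the forcing variable. The sketch below computes those weights for one side of the cut-off. The evenly spaced grid on [0, 1] and the helper name `boundary_weights` are assumptions for illustration, and it does not reproduce the specific weights reported in the paper.

```python
import numpy as np

def boundary_weights(xc, degree):
    """Weights w such that the polynomial fit's value at xc = 0 equals w @ y."""
    # Design matrix with columns 1, xc, xc^2, ..., xc^degree.
    X = np.vander(xc, degree + 1, increasing=True)
    # OLS coefficients are pinv(X) @ y; row 0 of pinv(X) gives the intercept.
    return np.linalg.pinv(X)[0]

# Hypothetical forcing variable on one side of the cut-off, already centred at 0.
xc = np.linspace(0.0, 1.0, 201)

w_linear = boundary_weights(xc, degree=1)
w_quintic = boundary_weights(xc, degree=5)

# Weight placed on the single observation furthest from the cut-off.
print("weight on furthest point, linear :", w_linear[-1])
print("weight on furthest point, quintic:", w_quintic[-1])
```

The weights always sum to one, but nothing forces them to be positive or to shrink with distance from the cut-off. On this grid the quintic puts a larger absolute weight on the furthest observation than the linear fit does.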

Reason 2: Estimates can be highly sensitive to the degree of the polynomial fitted. Partly as a result of the large weights being put on different observations, the estimated impact can jump around a lot depending on what order polynomial is fitted.

Andrew Gelman gave a great example about a year ago on his blog, commenting on a study in PNAS that claimed that China’s coal-burning was reducing lifespan by 5 years for half a billion people. Here is the key figure:

You can see that a cubic is fitted, which results in a statistically significant estimate of -5.5 years. With a linear fit the estimate is -1.6 years and with a quadratic -1.3 years (neither significant), while with a quartic or quintic it is back to -5.4 to -5.6 years and significant.

Reason 3: Confidence intervals can be too narrow with high-order polynomials, leading to rejections of the null much more often than should be the case. That is, there is a bias towards finding a significant effect even if one doesn’t exist. To show this, the authors use some data where there is no discontinuity and simulate tests to detect a discontinuity. The table below shows their results. A test at the 5% level rejects the null of no treatment effect 21% of the time with a cubic, and still 11-14% of the time with a quartic or quintic. The bottom of the table shows that local linear and local quadratic regressions do better, with rejection rates close to the nominal test size of 5%.
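
A placebo exercise of this general kind is easy to sketch on simulated data. The version below is only an illustration of the idea, not the authors' procedure: the data-generating process, the `tanh` mean function, the sample sizes, and the use of classical (homoskedastic) standard errors with a 1.96 critical value are all assumptions. The rejection rates it produces depend entirely on those choices and should not be read as a reproduction of the paper's table.

```python
import numpy as np

def placebo_rejection_rate(degree, mean_fn, n_sims=400, n=300, noise_sd=0.5, seed=0):
    """Share of simulations in which a 5%-level test 'finds' a jump that is not there."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.uniform(-1.0, 1.0, n)
        # mean_fn must be continuous at 0, so the true discontinuity is zero.
        y = mean_fn(x) + rng.normal(0.0, noise_sd, n)
        d = (x >= 0).astype(float)                       # placebo "treatment" dummy
        powers = np.column_stack([x**k for k in range(1, degree + 1)])
        # Polynomial in x, allowed to differ on each side of the cut-off.
        X = np.column_stack([np.ones(n), d, powers, d[:, None] * powers])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])        # classical error variance
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t_stat = beta[1] / np.sqrt(cov[1, 1])            # t-statistic on the jump
        rejections += int(abs(t_stat) > 1.96)
    return rejections / n_sims

# Hypothetical smooth relationship with no jump at zero.
rates = {deg: placebo_rejection_rate(deg, np.tanh) for deg in (1, 3, 5)}
for deg, rate in rates.items():
    print(f"global polynomial of degree {deg}: rejects {rate:.1%} of the time")
```

When the fitted polynomial matches the true relationship (for example a straight-line `mean_fn` with `degree=1`), the rejection rate should sit near 5%. Departures from 5% under other mean functions show how the test's size depends on functional form.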

Now, as Lee and Lemieux note, there are also bias issues to consider with local linear regression, so one should not rely on just one particular specification: results that are robust to a range of specifications are more convincing. A practical issue is how much data you have close to the threshold. With lots of data, ignoring the data further away by only doing local regression seems preferable, but with a smaller sample there may be more of a need to use as much of the data as possible, at the cost of making the results more dependent on functional form.
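
One simple way to act on this is to report the estimate across a grid of specifications rather than a single one. The sketch below does that on simulated data. The helper `rd_estimate`, the data, and the particular degrees and bandwidths are hypothetical choices, and the rectangular window stands in for whatever kernel and bandwidth selector one would actually use.

```python
import numpy as np

def rd_estimate(x, y, degree, bandwidth=np.inf, cutoff=0.0):
    """Jump at the cut-off from separate polynomial fits on each side within `bandwidth`."""
    xc = x - cutoff
    keep = np.abs(xc) <= bandwidth
    xc, y = xc[keep], y[keep]
    right = xc >= 0
    # Last polyfit coefficient is the intercept, i.e. the fitted value at the cut-off.
    return np.polyfit(xc[right], y[right], degree)[-1] - np.polyfit(xc[~right], y[~right], degree)[-1]

# Hypothetical data: smooth nonlinear trend plus a true jump of 1.0 at x = 0.
rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, 1000)
y = np.sin(2 * x) + 1.0 * (x >= 0) + rng.normal(0.0, 0.5, x.size)

# Global polynomials of increasing order, using all the data.
global_fits = {deg: rd_estimate(x, y, deg) for deg in (1, 2, 3, 4, 5)}
# Local linear and local quadratic fits over several bandwidths.
local_fits = {(deg, h): rd_estimate(x, y, deg, bandwidth=h)
              for deg in (1, 2) for h in (0.1, 0.2, 0.4)}

for deg, est in global_fits.items():
    print(f"global, degree {deg}: {est:.2f}")
for (deg, h), est in local_fits.items():
    print(f"local, degree {deg}, bandwidth {h}: {est:.2f}")
```

If the estimates move around a lot across rows, that is a sign the result is being driven by functional form rather than by the data near the threshold.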
 
