# Curves in all the wrong places: Gelman and Imbens on why not to use higher-order polynomials in RD

In a regression discontinuity (RD) design, there are two common approaches to controlling for the forcing variable. The first is to control for a high-order (third, fourth, or higher) polynomial of it. The second is to use local linear or local quadratic regression only within a neighbourhood of the cutoff.

A new NBER working paper by Andrew Gelman and Guido Imbens makes a strong argument not to use the higher-order polynomial approach. The paper is extremely clear, making its points in only 14 pages, which is a pleasant change from the typical paper. Their case rests on three reasons.

**Why are high order polynomials problematic?**

**Reason 1: They can give huge weight to points that are far away from the discontinuity.** The RD estimate is essentially a difference between a weighted average of outcomes for treated observations on one side of the discontinuity and a weighted average of outcomes for control observations on the other side of the discontinuity. Fitting a high-order polynomial can mean this weighted average is driven by observations that are far away from the threshold.
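To see where these weights come from, here is a minimal sketch (my own illustration, not code from the paper). For an OLS polynomial fit on one side of the cutoff, the fitted value at the cutoff is a linear combination of the outcomes, and the weights depend only on the forcing variable. The uniform draw of `x`, the cutoff at 0, and the function name `boundary_weights` are assumptions made for the example.

```python
import numpy as np

def boundary_weights(x, degree):
    """Weights w such that the polynomial fit's prediction at x = 0 equals w @ y."""
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    e1 = np.zeros(degree + 1)
    e1[0] = 1.0  # picks out the intercept, i.e. the fitted value at x = 0
    # w = X (X'X)^{-1} e1; solve the linear system rather than inverting
    return X @ np.linalg.solve(X.T @ X, e1)

# Hypothetical forcing variable on the treated side, cutoff normalised to 0
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, size=200))

for degree in (1, 2, 5):
    w = boundary_weights(x, degree)
    far = x > 0.5  # observations in the half of the range furthest from the cutoff
    share_far = np.abs(w[far]).sum() / np.abs(w).sum()
    print(f"degree {degree}: weights sum to {w.sum():.3f}, "
          f"share of absolute weight beyond 0.5 = {share_far:.2f}")
```

The weights always sum to one, but nothing forces them to be positive or to shrink with distance from the cutoff, so it is worth inspecting them directly for whatever polynomial order you are considering.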

**Reason 2: Estimates can be highly sensitive to the degree of the polynomial fitted.** Partly as a result of the large weights being put on different observations, the estimated impact can jump around a lot depending on what order polynomial is fitted.

Andrew Gelman gave a great example about a year ago on his blog, commenting on a study in PNAS that claimed that China’s coal-burning was reducing lifespan by 5 years for half a billion people. Here is the key figure:

You can see that a cubic is fitted, which results in a statistically significant estimate of -5.5 years. With a linear fit the estimate is -1.6 years and with a quadratic -1.3 years (neither significant); with a quartic or quintic, it is back to -5.4 to -5.6 years and significant.
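This kind of sensitivity check is easy to run on your own data. The sketch below uses simulated data, not the data from the PNAS study: the outcome function, the true jump of 0.5, and the helper name `rd_jump` are all assumptions for illustration. It fits a separate polynomial on each side of the cutoff and reports the estimated jump for each degree.

```python
import numpy as np

def rd_jump(x, y, degree, cutoff=0.0):
    """Estimated jump at the cutoff from separate polynomial fits on each side."""
    xc = x - cutoff
    right = xc >= 0
    # np.polyfit returns the highest power first, so the last
    # coefficient is the fitted value at xc = 0
    at_cutoff_right = np.polyfit(xc[right], y[right], degree)[-1]
    at_cutoff_left = np.polyfit(xc[~right], y[~right], degree)[-1]
    return at_cutoff_right - at_cutoff_left

# Simulated example (hypothetical data-generating process)
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=300)
y = np.sin(2.0 * x) + 0.5 * (x >= 0) + rng.normal(scale=0.5, size=300)

for degree in range(1, 6):
    print(f"degree {degree}: estimated jump = {rd_jump(x, y, degree):.2f}")
```

If the estimates move around a lot as the degree changes, that is a warning sign that functional form, rather than the data near the threshold, is driving the result.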

**Reason 3: Confidence intervals can be too narrow with high-order polynomials, leading to rejections of the null much more often than should be the case.** That is, there is a bias towards finding a significant effect even if one doesn’t exist. To show this, the authors use some data where there is no discontinuity and simulate tests to detect a discontinuity. The table below shows their results. A test at the 5% level rejects the null of no treatment effect 21% of the time with a cubic, and still 11-14% of the time with a quartic or quintic. The bottom of the table shows that local linear and local quadratic regressions do better, with rejection rates close to the nominal test size of 5%.
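You can run a rough version of this placebo exercise yourself. The sketch below is not the authors' design: it uses a simulated smooth outcome with no jump, conventional homoskedastic OLS standard errors, and a fixed bandwidth of 0.25 for the local linear fit, all of which are assumptions chosen for illustration. It counts how often a 5% test rejects the (true) null of no discontinuity.

```python
import numpy as np

def jump_tstat(x, y, degree):
    """t-statistic on the jump at x = 0, using side-specific polynomials in x."""
    d = (x >= 0).astype(float)
    powers = np.vander(x, degree + 1, increasing=True)  # 1, x, ..., x^degree
    # Regressors: jump dummy, polynomial, and dummy-by-polynomial interactions
    X = np.column_stack([d, powers, d[:, None] * powers[:, 1:]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # homoskedastic variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0] / np.sqrt(cov[0, 0])

def rejection_rate(degree, bandwidth=None, n=500, reps=500, seed=0):
    """Share of simulated samples in which |t| > 1.96 although the true jump is zero."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.uniform(-1.0, 1.0, size=n)
        y = np.sin(3.0 * x) + rng.normal(scale=0.5, size=n)  # smooth, no jump
        if bandwidth is not None:  # local regression: keep data near the cutoff
            keep = np.abs(x) <= bandwidth
            x, y = x[keep], y[keep]
        rejections += int(abs(jump_tstat(x, y, degree)) > 1.96)
    return rejections / reps

for degree in (1, 3, 5):
    print(f"global polynomial, degree {degree}: {rejection_rate(degree):.3f}")
print(f"local linear, bandwidth 0.25: {rejection_rate(1, bandwidth=0.25):.3f}")
```

The rejection rates you get will depend heavily on the assumed outcome function, sample size, and bandwidth, so treat this as a template for checking a specification rather than as a replication of the paper's table.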

Now, as Lee and Lemieux note, there are also bias issues to consider with local linear regression, so one should not rely on just one particular specification: results that are robust to a range of specifications are more convincing. In practice, the issue is how much data you have close to the threshold. With lots of data, ignoring the observations further away and doing only local regression seems preferable; with a smaller sample, there may be more of a need to use as much of the data as possible, at the cost of making the results more dependent on functional form.
