Ready, Set, Bunch!

Motivation

With the growing use of administrative data in economics, it is commonly observed that individuals or firms tend to locate at key policy thresholds — this behavior is referred to as “bunching”. Examples include firms reporting earnings just below thresholds above which taxes or costly regulations kick in, or individuals working hours just below thresholds above which they would be classified as full-time. 

Originally, bunching was seen as a technical complication: researchers were interested in modeling firm or individual behavior in response to taxes, and complicated threshold rules required carefully modeling both the tax schedule that firms or individuals faced and their behavioral responses (Kleven, 2016). For example, the impact of a change in taxes on earnings is a common parameter of interest, but when taxes only change above or below certain thresholds, bunching at those thresholds makes naive estimates of responses challenging to interpret. Kleven (2016) provides an in-depth discussion of this history, along with a broader review of bunching approaches.

In contrast, more recent work (Saez, 2010) observed that the degree of bunching itself was informative about the responsiveness of earnings to a change in taxes (something we will discuss more in a future blog post). This observation has led to a deluge of recent work that estimates the amount of bunching at policy thresholds, and uses these estimates to recover deep economic parameters that are informative about individuals’ and firms’ responsiveness to changes in taxes.

Despite the growing use of bunching as a tool for estimation in applied microeconomics research, it is rarely covered in introductory courses on applied econometrics. Here, we attempt to provide a basic guide for the bunching novice (e.g., the majority of this blog’s authors) to bunching as an econometric tool.

 


 

Bunching as RD with manipulation

Bunching and RD

To begin, we lay out the standard problem bunching addresses — there is a policy threshold (e.g., a change in taxes) in some continuous running variable (e.g., earnings), and individuals sort from above the threshold to below the threshold (e.g., individuals reduce their earnings to fall just below the threshold). We are interested in estimating the fraction of individuals who reduce their earnings from above the threshold to below the threshold.

As noted by Kleven (2016), regression discontinuity (RD) is a close cousin of bunching estimators. In regression discontinuity, we maintain the assumption that there is no such “manipulation” as described above. Bunching relaxes this assumption: instead, we estimate the fraction of manipulators by estimating what the density of individuals would have been without manipulation, that is, the “manipulation-free counterfactual”. With both the observed and manipulation-free counterfactual distributions of individuals estimated, it may be possible to compare the two distributions to recover the fraction of individuals who manipulated. This contrasts with RD, where the observed and manipulation-free counterfactual distributions are the same by assumption. We revisit this in a few paragraphs, when we discuss what can be identified by bunching estimators.

Two assumptions are therefore required to implement these two steps (estimating the manipulation-free counterfactual, then comparing it to the observed distribution). First, it must be possible to recover the number of manipulators from estimates of the observed and manipulation-free counterfactual distributions of individuals. This requires the assumption that manipulation is one-sided and bounded: individuals only move from above the threshold to below the threshold, and they do not move to or from points far away from the threshold. This is similar to the monotonicity assumption in instrumental variables (see past discussions of monotonicity assumptions on this blog here and here): if individuals manipulated in both directions, we could not hope to count the number of individuals who manipulate. Second, it must be possible to estimate the manipulation-free counterfactual distribution of individuals. This requires the assumption above that manipulation is bounded, and also that the counterfactual distribution of individuals is “well-behaved”. If this holds, then the distribution of individuals close to the threshold can be estimated by fitting a distribution using only the individuals who are sufficiently far from the threshold that they do not manipulate in response to it. So, how strong are these assumptions?

 

  • Manipulation within a window. The first assumption, that manipulation is one-sided and bounded within a window of manipulation, is often weaker than it sounds. In many cases, this assumption is economically motivated, as changes in taxes or policy at a threshold may only incentivize individuals to shift in one direction. For example, a sudden increase in the marginal tax rate at a threshold will not cause any individuals to shift from below the threshold to above the threshold. Similarly, bounds on where manipulation occurs can be motivated by bounds on elasticities, something we will discuss more in a future blog post. Alternatively, in practice it is often clear from visual inspection that almost all manipulation occurs within a small window around the threshold.

 

  • Regularity. The second assumption, that the distribution of individuals is “well-behaved”, may be fairly strong. In practice this often takes the form of assuming the manipulation-free counterfactual distribution of individuals is a polynomial of finite degree and smooth across the threshold. When this holds, the polynomial can be estimated using the individuals outside the window of manipulation, so the manipulation-free counterfactual distribution inside the window can be extrapolated from the observed distribution outside it. This regularity assumption is crucial: Blomquist et al. (2019) note that absent this assumption, additional sources of variation are necessary in order to estimate the fraction of individuals who manipulate. It may also be testable, as it implies estimates of bunching should be robust to more flexible estimation approaches.
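To make the first assumption concrete, here is a small simulated sketch in Python. Everything in it is hypothetical and chosen only for illustration: the threshold of 100, the window of 10, the 50% manipulation share, and the lognormal counterfactual. Because the data are simulated, the manipulation-free counterfactual is known, which is never the case in a real application.

```python
import numpy as np

rng = np.random.default_rng(0)

THRESHOLD = 100.0  # hypothetical policy threshold
WINDOW = 10.0      # hypothetical bound on how far manipulators move
SHARE = 0.5        # hypothetical share of exposed individuals who manipulate

# Manipulation-free counterfactual: a smooth distribution of the running variable.
counterfactual = rng.lognormal(mean=np.log(100.0), sigma=0.3, size=50_000)

# One-sided, bounded manipulation: only individuals just above the threshold
# move, and they all land just below it.
exposed = (counterfactual > THRESHOLD) & (counterfactual <= THRESHOLD + WINDOW)
moves = exposed & (rng.random(counterfactual.size) < SHARE)
observed = counterfactual.copy()
observed[moves] = THRESHOLD - rng.uniform(0.0, 1.0, size=moves.sum())


def count_between(x, lo, hi):
    """Number of individuals with lo < x <= hi."""
    return int(((x > lo) & (x <= hi)).sum())


# Excess mass below the threshold and missing mass above it, within the window.
excess_below = (count_between(observed, THRESHOLD - WINDOW, THRESHOLD)
                - count_between(counterfactual, THRESHOLD - WINDOW, THRESHOLD))
missing_above = (count_between(counterfactual, THRESHOLD, THRESHOLD + WINDOW)
                 - count_between(observed, THRESHOLD, THRESHOLD + WINDOW))
n_manipulators = int(moves.sum())

print(excess_below, missing_above, n_manipulators)
```

Under one-sided, bounded manipulation the three printed numbers coincide: comparing the two distributions within the window counts the manipulators exactly. If individuals could also move from below to above, the two flows would partially cancel and the comparison would no longer count them.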

 

Figure 2: Bunching at a kink in Saez (2010) by the self-employed, but not by wage earners

 

Bunching in practice

In practice, bunching is typically implemented as a two-step procedure. The first step is to identify the interval over which the running variable displays excess density. When all the excess mass is concentrated precisely at the threshold, this is easily done. However, the bunching mass is often spread around the threshold, as individuals cannot precisely choose the running variable. In this case, researchers need to determine a window over which to measure the excess mass. Early bunching papers did so visually, since in many empirical settings the excess mass area is evident. There are, however, data-driven methods for choosing the window (e.g., Bosch et al., 2020).

This takes us to the second step: estimating the manipulation-free counterfactual. The typical assumption is that the counterfactual distribution is well approximated by a flexible polynomial, fitted over the manipulation-free part of the distribution and excluding the bunching window. The extrapolation of this polynomial to the excluded window provides the estimate of where individuals would have located in that interval in the absence of manipulation. The researcher can then compare the manipulation-free counterfactual to the observed distribution to measure the excess mass below the threshold and the missing mass above it.
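The second step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular paper: the function name bunching_estimate, the bin width, the degree-7 polynomial, the window of 95 to 110 and the simulated lognormal data are all hypothetical choices. It also skips refinements used in some applications, such as adjusting the counterfactual so that it integrates to the observed number of individuals.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def bunching_estimate(x, threshold, window, support, bin_width=1.0, degree=7):
    """Fit a polynomial to binned counts outside `window`, extrapolate it into
    the window, and return (excess mass below, missing mass above)."""
    lo, hi = support
    edges = np.arange(lo, hi + bin_width / 2, bin_width)
    counts, _ = np.histogram(x, bins=edges)
    centres = (edges[:-1] + edges[1:]) / 2
    z = (centres - threshold) / (hi - lo)  # rescaled for numerical stability
    in_window = (centres >= window[0]) & (centres < window[1])

    # Counterfactual: polynomial fitted only on bins outside the window.
    coefs = P.polyfit(z[~in_window], counts[~in_window], degree)
    counterfactual = P.polyval(z, coefs)

    below = in_window & (centres < threshold)
    above = in_window & (centres > threshold)
    excess = float((counts[below] - counterfactual[below]).sum())
    missing = float((counterfactual[above] - counts[above]).sum())
    return excess, missing


# Hypothetical data: half of those in (100, 110] move to just below 100.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=np.log(100.0), sigma=0.3, size=200_000)
exposed = (x > 100.0) & (x <= 110.0)
moves = exposed & (rng.random(x.size) < 0.5)
x[moves] = 100.0 - rng.uniform(0.1, 1.0, size=moves.sum())
n_manipulators = int(moves.sum())

excess, missing = bunching_estimate(
    x, threshold=100.0, window=(95.0, 110.0), support=(50.0, 150.0)
)
print(f"true manipulators: {n_manipulators}")
print(f"excess mass below: {excess:,.0f}  missing mass above: {missing:,.0f}")
```

In this simulation both the excess mass and the missing mass should land close to the true number of manipulators. With real data the truth is unknown, which is why the choice of window and polynomial order deserves scrutiny.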

How credible is this extrapolation? One critique is that bunching is an “unassumed” parametric method, and that the assumptions made to recover the manipulation-free counterfactual should be clearly spelled out. Relatedly, for robustness, applied bunching papers typically present estimates that vary the window over which manipulation is allowed and the order of the polynomial used to fit the counterfactual. Moreover, in many settings, researchers have a good prior on the shape of the manipulation-free distribution they are trying to approximate: for example, the distribution of high-income earners is commonly Pareto, and that of firm size approximately log-normal.
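A robustness check of this kind might look like the sketch below, which re-estimates the excess mass over a small grid of windows and polynomial orders. The binned counts are simulated from a hypothetical quadratic counterfactual with Poisson noise, and the function name excess_mass, the grid values and the 50% manipulation share are all illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def excess_mass(centres, counts, threshold, window, degree):
    """Excess mass below the threshold, relative to a polynomial counterfactual
    fitted on bins outside `window`."""
    z = (centres - threshold) / (centres.max() - centres.min())
    in_window = (centres >= window[0]) & (centres < window[1])
    coefs = P.polyfit(z[~in_window], counts[~in_window], degree)
    counterfactual = P.polyval(z, coefs)
    below = in_window & (centres < threshold)
    return float((counts[below] - counterfactual[below]).sum())


# Hypothetical binned data: smooth counts plus noise, then half of each bin in
# (100, 110) is moved to the bin just below the threshold.
rng = np.random.default_rng(1)
threshold = 100.0
centres = np.arange(50.5, 150.0, 1.0)
d = centres - threshold
counts = rng.poisson(2000.0 - 5.0 * d - 0.1 * d**2).astype(float)
above = (centres > threshold) & (centres < threshold + 10.0)
moved = np.floor(0.5 * counts[above])
counts[above] -= moved
counts[np.searchsorted(centres, threshold) - 1] += moved.sum()
true_excess = float(moved.sum())

# Vary the excluded window and the polynomial order.
windows = [(97.0, 110.0), (95.0, 110.0), (93.0, 112.0)]
degrees = [3, 4, 5]
estimates = {
    (w, k): excess_mass(centres, counts, threshold, w, k)
    for w in windows
    for k in degrees
}
for (w, k), est in estimates.items():
    print(f"window={w}, degree={k}: excess mass = {est:,.0f}")
print(f"true excess mass: {true_excess:,.0f}")
```

Here the estimates should move little across the grid because the simulated counterfactual really is a low-order polynomial. Large swings across specifications in a real application would be a warning sign about the regularity assumption.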

Lastly, to conduct inference, Kleven (2016) notes that standard errors are commonly calculated following Chetty et al. (2011), by bootstrap resampling of residuals from the estimation of the counts of individuals within bins. Because sample sizes in many bunching applications are large enough that sampling error may be small, this approach to inference may also partially account for misspecification error (e.g., getting the order of the polynomial wrong).
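The sketch below shows one way a residual bootstrap of this kind could be implemented. It is only loosely in the spirit of Chetty et al. (2011) and is not a replication of their procedure: as a simplifying assumption, it resamples residuals only for bins outside the window and holds the observed counts inside the window fixed, so it captures uncertainty in the fitted counterfactual only. The function names, the simulated counts and the degree-3 polynomial are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fit_counterfactual(z, counts, in_window, degree):
    """Polynomial fitted on bins outside the window, evaluated on all bins."""
    coefs = P.polyfit(z[~in_window], counts[~in_window], degree)
    return P.polyval(z, coefs)


def excess_with_bootstrap_se(z, counts, in_window, below, degree, n_boot, rng):
    fitted = fit_counterfactual(z, counts, in_window, degree)
    estimate = float((counts[below] - fitted[below]).sum())
    residuals = (counts - fitted)[~in_window]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        # New counts outside the window: fitted values plus resampled residuals.
        resampled = counts.copy()
        resampled[~in_window] = fitted[~in_window] + rng.choice(
            residuals, size=residuals.size, replace=True
        )
        refit = fit_counterfactual(z, resampled, in_window, degree)
        draws[b] = (resampled[below] - refit[below]).sum()
    return estimate, float(draws.std(ddof=1))


# Hypothetical binned data with bunching just below a threshold of 100.
rng = np.random.default_rng(0)
threshold = 100.0
centres = np.arange(50.5, 150.0, 1.0)
counts = rng.poisson(2000.0 - 5.0 * (centres - threshold)).astype(float)
above = (centres > threshold) & (centres < threshold + 10.0)
moved = np.floor(0.5 * counts[above])
counts[above] -= moved
counts[np.searchsorted(centres, threshold) - 1] += moved.sum()
true_excess = float(moved.sum())

z = (centres - threshold) / 100.0
in_window = (centres >= 95.0) & (centres < 110.0)
below = in_window & (centres < threshold)
estimate, se = excess_with_bootstrap_se(
    z, counts, in_window, below, degree=3, n_boot=500, rng=rng
)
print(f"excess mass = {estimate:,.0f} (bootstrap s.e. = {se:,.0f})")
```

Since the residuals include any systematic gap between the polynomial and the true counterfactual, resampling them carries some misspecification error into the standard error.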

 

What can bunching estimate?

In many applications of bunching, only the fraction of individuals who manipulate is of interest, as this fraction can often be used to recover important elasticities. However, in general, estimates of both the manipulation-free counterfactual (that is, what distributions would have been without manipulation) and observed distributions can be used to estimate other interesting parameters.

First, as mentioned above, the manipulation-free counterfactual distribution is what we assume that we observe for RD, as in RD we assume there is no manipulation. Therefore, anything that can be estimated in RD can also be estimated with bunching, including the average impact of moving from just below to just above the threshold for individuals at the threshold. Although this is not of interest in many applications (e.g., when marginal but not average taxes change at a threshold, there should be no discrete change in outcomes across the threshold absent manipulation), in other applications it is first order (e.g., test score thresholds for school choice). This is closely related to donut regression discontinuity designs (see discussion of these in Dowd, 2020), in which RD is implemented while excluding observations close to the threshold (commonly because of concerns about measurement error).
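A donut RD can be sketched as follows. The example is entirely simulated and all of its ingredients are hypothetical: a true jump of 2 at a threshold of 100, an unobserved characteristic that shifts outcomes, and manipulators with high values of that characteristic who move from just above to just below the threshold. The function donut_rd fits a separate line on each side and is a simplification of what applied work would use (for instance, it does no kernel weighting or bandwidth selection).

```python
import numpy as np


def donut_rd(x, y, threshold, donut, bandwidth):
    """Jump in fitted outcomes at the threshold from separate linear fits on
    each side, dropping observations within `donut` of the threshold."""
    d = x - threshold
    left = (d < -donut) & (d >= -bandwidth)
    right = (d >= donut) & (d <= bandwidth)
    # np.polyfit returns [slope, intercept]; the intercept is the fit at d = 0.
    intercept_left = np.polyfit(d[left], y[left], 1)[-1]
    intercept_right = np.polyfit(d[right], y[right], 1)[-1]
    return float(intercept_right - intercept_left)


rng = np.random.default_rng(2)
n = 50_000
threshold, true_jump = 100.0, 2.0

x = rng.uniform(50.0, 150.0, size=n)  # manipulation-free running variable
ability = rng.normal(0.0, 1.0, size=n)  # unobserved, shifts outcomes

# Manipulation: high-ability individuals in (100, 105] move to just below 100.
moves = (x > threshold) & (x <= threshold + 5.0) & (ability > 0.0)
x_obs = x.copy()
x_obs[moves] = threshold - rng.uniform(0.01, 1.0, size=moves.sum())

# Outcomes depend on the observed side of the threshold.
y = (1.0 + 0.02 * (x_obs - threshold) + true_jump * (x_obs >= threshold)
     + ability + rng.normal(0.0, 1.0, size=n))

naive = donut_rd(x_obs, y, threshold, donut=0.0, bandwidth=30.0)
jump = donut_rd(x_obs, y, threshold, donut=5.0, bandwidth=30.0)
print(f"naive RD: {naive:.2f}  donut RD: {jump:.2f}  truth: {true_jump:.2f}")
```

In this simulation the naive RD understates the jump, because manipulation leaves high-ability individuals just below the threshold and low-ability individuals just above it. The donut estimate relies on extrapolating across the excluded region, which is the same kind of regularity assumption bunching estimators make.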

Second, manipulation-free counterfactual distributions of individuals interacted with covariates or outcomes allow estimation of the average characteristics of manipulators or the impacts on manipulators of manipulation, respectively (Diamond & Persson, 2016). To give two examples — Diamond & Persson (2016) apply this to estimate the impacts of passing high school through teachers’ manipulation of students’ test scores above the passing threshold, while Londoño-Vélez & Ávila-Mahecha (2020) apply a similar approach to estimate the average offshore holdings of individuals who manipulate their reported wealth in response to wealth taxes.

In a future post we will return to how different bunching estimators can be interpreted economically.

Authors

Pierre Bachas

Economist, Development Research Group

John Loeser

Young Professional (Economist), Development Impact Evaluation