Whether to probit or to probe it: in defense of the Linear Probability Model
Last week David linked to a virtual discussion between Dave Giles and Steffen Pischke on the merits and demerits of the Linear Probability Model (LPM). Here are some of the original posts, first with Dave Giles castigating users of the LPM (posts 1 and 2), and then Pischke explaining his counterview. I am very sympathetic to what Pischke writes. I think graduate econometric training has instilled a knee-jerk preference for a non-linear response model such as a probit or logit. In fact the entire discussion brought back an extended exchange with a recalcitrant referee of one of my own papers that highlights the common resistance to the LPM.
The paper in question looks at how infant mortality covaries with aggregate economic shocks in African countries. The main outcome is death within the first 12 months of life, hence a binary variable. My co-author, Norbert Schady, and I decided to model the relationship with an LPM. While the limitations of the LPM are well known, we felt they were not particularly relevant in this setting.
For background, let's review the most pressing shortcomings of the LPM vis-à-vis index models for binary response such as probit or logit:

1. LPM estimates are not constrained to the unit interval.
2. With a binary response the error term is necessarily heteroskedastic, so conventional OLS standard errors are invalid.

Now there are ways to address each concern, or at least to assess its relevance to the question at hand. Equally important, it is not clear that a particular index model performs any better (more on this below).
To address the second concern, we use heteroskedasticity-consistent robust standard error estimates. This approach, as used in this paper by Josh Angrist and others, is a common response to the potential problem. So strike the second concern off the list.
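To make this concrete, here is a minimal sketch (not our actual code) of fitting an LPM with heteroskedasticity-robust standard errors in Python's statsmodels; the data and the variable names (shock, infant_death) are simulated placeholders, not our paper's data.

```python
# Minimal LPM sketch on simulated data: OLS on a binary outcome with
# heteroskedasticity-consistent (HC1) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 1000
shock = rng.normal(size=n)                                    # aggregate economic shock (placeholder)
infant_death = rng.binomial(1, 0.1 + 0.02 * (shock > 0), n)   # binary outcome (placeholder)

X = sm.add_constant(shock)
lpm = sm.OLS(infant_death, X).fit(cov_type="HC1")             # robust covariance fixes concern 2
print(lpm.summary())
```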
The first concern, though, is regarded as more serious: when predicted probabilities fall outside the unit interval, the LPM estimate can be biased and inconsistent. Horrace and Oaxaca show that the potential bias of the LPM increases as the proportion of predicted probabilities falling outside the unit interval increases. Conversely, if no (or very few) predicted probabilities lie outside the unit interval, then the LPM is expected to be unbiased and consistent (or largely so). Horrace and Oaxaca even suggest a trimming rule that excludes observations whose predicted probability lies outside the unit interval in order to reduce possible finite-sample bias. This is an interesting suggestion that deserves further inquiry.
So in general the LPM can be biased and inconsistent, but less so the greater the proportion of predicted probabilities that fall between 0 and 1. In our paper, it turns out that the predicted probabilities of infant mortality from the main specification all lie in the interval (.038, .206). That is, no predicted probability lies outside the unit interval, and so it appears, à la Horrace and Oaxaca, that our main estimate is unbiased and consistent. But this didn't set the referee at ease.
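Both the range check and the Horrace-Oaxaca trimming rule are easy to carry out. A hedged sketch, reusing the simulated placeholder setup from the snippet above:

```python
# Check how many LPM fitted probabilities fall outside [0, 1], then apply a
# Horrace-Oaxaca style trim: re-estimate after dropping offending observations.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 1000
shock = rng.normal(size=n)
infant_death = rng.binomial(1, 0.1 + 0.02 * (shock > 0), n)

X = sm.add_constant(shock)
lpm = sm.OLS(infant_death, X).fit(cov_type="HC1")

p_hat = lpm.fittedvalues
inside = (p_hat >= 0) & (p_hat <= 1)
print(f"fitted range: [{p_hat.min():.3f}, {p_hat.max():.3f}]")
print(f"outside unit interval: {(~inside).sum()} of {n}")

# Trimmed re-estimate: keep only observations with in-range predictions
trimmed = sm.OLS(infant_death[inside], X[inside]).fit(cov_type="HC1")
print(trimmed.params)
```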
Nor was the referee relieved by the fact that the LPM is increasingly seen as a suitable alternative to the probit or logit. For example, Wooldridge writes on p. 455 of the 2002 edition of his well-known textbook:
…If the main purpose is to estimate the partial effect of [the independent variable] on the response probability, averaged across the distribution of [the independent variable], then the fact that some predicted values are outside the unit interval may not be very important.
Another result from our paper: the LPM predicted probabilities are nearly identical to the predicted probabilities from a probit model. (It's always good practice to check the robustness of results to model specification.) We found the correlation between the two predicted probability vectors to be .9998. But even this concordance wasn't good enough for our referee, who still insisted on a probit specification in the main tables.
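This robustness check is straightforward to replicate in outline. A sketch on the same simulated placeholder data (the .9998 figure is, of course, specific to our sample):

```python
# Compare LPM fitted probabilities with probit predicted probabilities.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 1000
shock = rng.normal(size=n)
infant_death = rng.binomial(1, 0.1 + 0.02 * (shock > 0), n)

X = sm.add_constant(shock)
p_lpm = sm.OLS(infant_death, X).fit().fittedvalues
p_probit = sm.Probit(infant_death, X).fit(disp=0).predict(X)

# Correlation is typically near 1 when effects are modest and probabilities
# stay away from the boundaries.
print(np.corrcoef(p_lpm, p_probit)[0, 1])
```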
What is behind this insistence? I believe our referee was stuck on a non-linear binary response model simply because that is the "correct" approach we are taught in graduate econometrics. Yet this insistence strikes me as very odd. After all, a binary response model such as a probit or logit makes some pretty strong (and convenient) assumptions about the error term in the stipulated underlying structural model: the probit assumes the latent error is normally distributed, the logit that it follows a logistic distribution. How do we know these assumptions are the correct ones? And if the assumption is wrong, the resulting bias can presumably be just as significant.
The bottom line is that probit or logit models themselves are not without interpretive difficulties and it is far from clear that these models should always be preferred. As Pischke succinctly states:
The LPM won’t give the true marginal effects from the right nonlinear model. But then, the same is true for the “wrong” nonlinear model! The fact that we have a probit, a logit, and the LPM is just a statement to the fact that we don’t know what the “right” model is. Hence, there is a lot to be said for sticking to a linear regression function as compared to a fairly arbitrary choice of a non-linear one! Nonlinearity per se is a red herring.
So here’s a call to keep the LPM – it’s convenient, computationally tractable, and may have less bias than index model alternatives. In many settings we will never know. Of course as good practice we should explore result robustness to model choice. Hopefully, as in the case of our paper, specification choice just won’t matter for the bottom line.
I completely agree with your viewpoint. I love the studies that attempt to show the superior performance of probit by starting with a normal cdf for the error term. Great blog by the way. (And as an aside, go blue and go cardinal).
My limited understanding of the problem was that it depends quite a bit on the parameter values. For instance, the CDF of the normal is quite linear over the middle portion of the range, but clearly non-linear at the extremes. Would you expect the two estimates to be different if the likelihood of treatment is very small or very large? In the case where you need to instrument, and the treatment and outcome variables are both binary (so you have a choice between bivariate probit or linear IV), we found that the parameter values affected the choice, and for certain parameter values the linear model offered little guidance for hypothesis testing. The paper is here:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1792259
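To see the point about the normal CDF numerically: its slope (the pdf) is roughly constant around the middle of the range, which is where a line fits well, but it collapses in the tails, which is where LPM and probit should diverge. A quick illustrative check:

```python
# Slope of the normal CDF at various probability levels: near P=0.5 the
# slope is ~0.40 and roughly constant, so a linear approximation works;
# at P=0.02 or P=0.98 it drops to ~0.05, where linearity breaks down.
from scipy.stats import norm

for p in [0.02, 0.10, 0.50, 0.90, 0.98]:
    index = norm.ppf(p)  # latent index value that yields probability p
    print(f"P={p:.2f}  slope of CDF (pdf) = {norm.pdf(index):.3f}")
```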
You are linking to a paper by my colleagues Misha Lokshin and Jishnu Das, along with Richard Chiburis. I definitely need to review that paper more carefully, as I am sure it has a lot of insight. On a quick read it looks as though they model the joint error terms as bivariate normal. What I don't know is how sensitive the results are to this assumption. But clearly in that world a probit is the way to go.
I agree completely, and I think the selective insistence on logit/probit in the case of binary outcomes borders on absurd. We work with outcomes that are really categorical all the time, especially schooling, and nobody seems to seriously think we should throw the multinomial logit at all of them.
I wrote a blog post pointing this out in response to the original Pischke-Giles back-and-forth: http://nonparibus.wordpress.com/2012/07/13/the-linear-education-model/
More on this... in the bivariate probit versus linear IV paper, we looked at departures from normality of the joint distribution of errors in terms of excess skewness and excess kurtosis. Such departures bias the bivariate probit results, and the bias can be quite severe, with little guidance on what to expect.
Couple of points on this.
First, a score test along the lines suggested by Murphy (Economics Letters) performs fairly well in detecting mis-specification in the bivariate probit model, so this may be a useful addition to the arsenal.
Second, we were motivated by practical setups that researchers may face. For instance, if you have a problem where the likelihood of treatment is very low or high (say 10% are treated---as in the returns to private schooling literature in the U.S.), the linear IV generates confidence intervals that are so wide that hypothesis testing is basically impossible until you hit around 10,000 observations--or even more. To cite from the paper:
"For instance, when the treatment probability is 0.1, for all ranges of the outcome probabilities and even with sample sizes greater than 10,000 observations, the confidence intervals of the IV estimate remain too large for any meaningful hypothesis testing; in contrast, BP confidence intervals are much smaller. Therefore researchers should expect IV and BP coefficients to differ substantially when treatment probabilities are low or when sample sizes are below 5000; linear IV estimates are particularly uninformative for hypothesis testing when treatment probabilities are low."
In this case, you may want to work with the root mean square error, and there is going to be some trade-off between precision and potential bias due to the mis-specification of the error terms.
I think the take-away for us, at least, was that you want to use a linear IV when the structure of the parameters allows you to do so, but you don't necessarily want to throw away parametric assumptions on the error structure for all problems; particularly in cases such as the ones highlighted above, where the coverage rate of the linear IV estimator does not permit meaningful hypothesis tests.
In other words, probit AFTER you probe it.....
I definitely need to read your paper closely... you should blog it here!! You already have a title...