A crowd-sourced checklist of the top 10 little things that drive us crazy with regression output



I’ve had a couple of colleagues with new research assistants mention some of the common issues they were seeing with output they were receiving, with one suggesting we need a “grumpy economists” checklist of things we frequently find ourselves asking or complaining about. Of course this isn’t specific to research assistants – many of the same issues come up when working with co-authors, refereeing papers, or even getting back to output your past-self produced. So, in the interests of being constructive and saving us all a little time going forward, I asked my colleagues in the World Bank’s research group and DIME group and crowd-sourced the following common issues/checklist:

1.       Why is the sample size changing across columns? There are two variants of this.

a.       The sample size changes when more control variables are added, so we cannot tell whether changes in the coefficients come from adding the controls or from changing the sample.  Checklist: do some of the controls have missing values? If so, rather than dropping those observations, add a dummy variable for “missing control” and replace the missing values with the control median (but be very clear about this in your notes lest someone start interpreting this control coefficient).

b.       The sample size is very different for one outcome than for the other outcomes. Was this outcome collected using a different source? Is attrition in this outcome correlated with treatment or otherwise likely to cause bias? Is there a coding problem where a skip pattern applied before this question was answered? Checklist: check why the sample size is so different, check if missingness is correlated with key regressors or treatment, and accompany the output with a note that explains this.
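As a rough illustration of the fix in 1(a), here is a minimal Python sketch using only the standard library. The function name `impute_with_missing_flag` and the `hh_size` data are made up for this example; in practice you would do the same thing in whatever software produces your regressions.

```python
from statistics import median

def impute_with_missing_flag(values):
    """Return (filled, missing_flag) for one control variable.

    Missing entries (None) are replaced by the median of the observed
    values; missing_flag is 1 where the original value was missing.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    missing_flag = [1 if v is None else 0 for v in values]
    return filled, missing_flag

# Hypothetical control: household size, with two non-responses.
hh_size = [4, None, 6, 3, None, 5]
hh_size_filled, hh_size_missing = impute_with_missing_flag(hh_size)

# Both series go into the regression, so no observation is dropped
# and the sample size stays the same across columns.
print(hh_size_filled)
print(hh_size_missing)
```

Both the filled control and the missing dummy enter the regression together, and the table notes should say so.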

2.       What does the magnitude of that coefficient mean? Checklist: show the sample mean, and make clear the scale. This is particularly important if you are using index measures (are they scaled in standard deviations or something else?), or an outcome measured on a scale such as 1-25. This is also why, if you estimate probit or logit models, you should show marginal effects, not the raw coefficients. If you are showing interaction effects, it is sometimes useful to provide sample means for the different subgroups too. And make sure we can see what the change is in absolute terms too if possible (how many percentage points or dollars), not just a large percentage change on a small base. Finally, think about the scale of your variables and how many decimals it makes sense to show for the coefficient: round sensibly and don’t report artificial precision (e.g. don’t report a coefficient of 14.957 with a standard error of 5.632, or even worse, a significant effect of 0.000***).
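One possible rounding convention, sketched in Python: choose the number of decimals so that the standard error shows two significant figures, and print the coefficient to the same number of decimals. This rule and the helper name `format_estimate` are assumptions for illustration, not something the checklist prescribes.

```python
import math

def format_estimate(coef, se, sig_figs=2):
    """Format coef and se with the same number of decimals.

    Decimals are chosen so the standard error shows `sig_figs`
    significant figures (an illustrative convention, not a standard).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    decimals = max(0, sig_figs - 1 - math.floor(math.log10(se)))
    return f"{coef:.{decimals}f}", f"{se:.{decimals}f}"

# The example from the text: 14.957 (5.632) carries artificial precision.
print(format_estimate(14.957, 5.632))      # ('15.0', '5.6')

# A tiny coefficient that would otherwise print as 0.000.
print(format_estimate(0.00042, 0.00015))   # ('0.00042', '0.00015')
```

In the second case, rescaling the variable (say, reporting per 1,000 units) is usually easier to read than showing five decimals.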

3.       Dummy variables should be coded 0-1: many of us have seen variables like gender included, with a mean of 1.4. Checklist: make sure the dummy variables are coded as 0 and 1, and give them a name (e.g. female) that corresponds to the 1 category.
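A minimal Python sketch of the recode, assuming the raw survey variable uses 1 = male and 2 = female (that coding, and the names below, are hypothetical; check your own codebook):

```python
# Assumed raw coding from a hypothetical questionnaire: 1 = male, 2 = female.
FEMALE_FROM_CODE = {1: 0, 2: 1}

def to_female_dummy(gender_codes):
    """Map raw gender codes to a 0/1 dummy named for the 1 category.

    Missing values (None) stay missing; unexpected codes raise KeyError
    rather than being silently recoded.
    """
    return [None if g is None else FEMALE_FROM_CODE[g] for g in gender_codes]

gender = [1, 2, 1, 2, 1]          # mean of 1.4, which is hard to read
female = to_female_dummy(gender)  # mean of 0.4, i.e. 40 percent female

print(sum(gender) / len(gender))
print(sum(female) / len(female))
```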

4.       Are those two coefficients statistically different? Several versions of this are common:

a.       Comparing coefficients in two different columns/regressions and saying they are different, perhaps because one is statistically significant and the other is not, or simply because they differ in magnitude. A particular bugbear I have here is when people estimate the treatment effect in column 1 for the sample of women, find a significant effect, estimate it in column 2 for the sample of men, find an insignificant effect, and conclude the program worked for women and not men.

b.       Comparing coefficients on multiple treatments in the same regression, and concluding one worked better than another based on magnitudes or on one being significant and not the other.

Checklist: include p-values for testing equality of effects.
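To see why "significant for women, insignificant for men" does not imply the effects differ, here is a small Python sketch with made-up numbers. It assumes the two estimates come from independent samples and uses a large-sample normal approximation; with clustered data or coefficients from the same regression you would need the covariance between the estimates, and in practice you would use your statistical package's own test command or a pooled regression with an interaction term.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

def compare_effects(b1, se1, b2, se2):
    """Test b1 == b2 for estimates from independent samples (an assumption)."""
    diff = b1 - b2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return diff, se_diff, two_sided_p(diff / se_diff)

# Hypothetical estimates: treatment effect for women and for men.
b_women, se_women = 0.20, 0.08
b_men, se_men = 0.10, 0.09

p_women = two_sided_p(b_women / se_women)   # below 0.05
p_men = two_sided_p(b_men / se_men)         # above 0.05
diff, se_diff, p_equal = compare_effects(b_women, se_women, b_men, se_men)

print(round(p_women, 3), round(p_men, 3), round(p_equal, 3))
```

With these numbers the effect for women is significant at the 5 percent level and the effect for men is not, yet the test of equality is nowhere near rejecting.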

5.       What the heck is s2q1 or expert_know? We should be able to read tables without having to look back at questionnaires or the text to understand what variables mean. Checklist: give variables understandable names, and in the notes to tables, explain what variables are to make the tables as self-contained as possible. If the outcome is an index or composite outcome, explain in the notes what goes into it.

6.       Does that coefficient make sense to interpret given all the fixed effects in there? When there are lots of fixed effects and a variable collinear with some of those fixed effects, then Stata will arbitrarily drop a fixed effect, but then the individual variable may not be interpretable, and you should suppress showing it in the table. One example is showing the constant in a regression with lots of fixed effects – it seldom shows anything useful. Another example a colleague gave is including a household variable in a regression that also has household fixed effects. Checklist: don’t have tables show variables that aren’t easily interpretable given the fixed effects.

7.       What have you done with those standard errors? It should be clear whether standard errors are clustered or otherwise adjusted. Checklist: make clear how standard errors are calculated, and whether they differ from one column to the next for some reason.

8.       Is an increase in this outcome good or bad, and are components of an index all going in the same direction? This comes up often with measures like mental health or empowerment, or measures based on aggregating a bunch of Likert-scale questions together. Checklist: make clear (either in the column heading or notes to the table) whether higher values of the outcome are good or bad (e.g. does a higher mental health index mean more or less depression?). When constructing an index, before aggregating, make sure that a higher score means the same thing for all components: often some items will be reverse-coded to make sure respondents are paying attention, so you need to recode before aggregating.
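A minimal sketch of recoding before aggregating, assuming 1-5 Likert items and a simple average as the index. The item names and the choice of which item is reverse-coded are hypothetical.

```python
def reverse_code(score, low=1, high=5):
    """Flip a Likert score so that low becomes high and vice versa."""
    if not low <= score <= high:
        raise ValueError(f"score {score} outside {low}-{high}")
    return low + high - score

# Hypothetical items: higher = more distress, except "felt_hopeful".
REVERSED_ITEMS = {"felt_hopeful"}

def distress_index(responses):
    """Average the items after aligning them so higher = more distress."""
    aligned = [
        reverse_code(score) if item in REVERSED_ITEMS else score
        for item, score in responses.items()
    ]
    return sum(aligned) / len(aligned)

respondent = {"felt_sad": 4, "felt_anxious": 5, "felt_hopeful": 1}

naive = sum(respondent.values()) / len(respondent)  # mixes directions
print(naive, distress_index(respondent))
```

The naive average understates this respondent's distress because the low "hopeful" score pulls it down when it should push it up.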

9.       Is that main effect significant for the subgroup you’ve included an interaction for? In a regression like Y = a + b*Treat + c*Male*Treat + d*Male + e, b will show whether the treatment had a significant effect for women, c whether the effect for males was different from that for females, but we would also like to know whether the effect was significant for males. Checklist: add a p-value for the test b+c=0.

10.   Is that a conditional or unconditional outcome? Weighted or unweighted? Winsorized, truncated, logged, inverse hyperbolic sined, or something else? And what did you do about zeros? This is where the notes to the table can be important, as well as column headings. And for exploratory work/robustness, it is often good to show several choices here so we can see how sensitive results are to these choices. Checklist: again, the notes to the table should make it clear what is going on without having to read anything else. So use these notes to make clear how outcomes are being transformed and any key issues.
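To make a couple of these choices concrete, here is a small Python sketch of top-winsorizing and the inverse hyperbolic sine on a made-up profits variable with zeros and one outlier. The nearest-rank percentile rule used here is just one convention (an assumption for this example); software packages differ, which is one more reason to say exactly what you did in the table notes.

```python
import math

def winsorize_top(values, pct=99):
    """Cap values above the pct-th percentile (nearest-rank definition)."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, for integer pct.
    rank = (pct * len(ordered) + 99) // 100
    cap = ordered[max(rank, 1) - 1]
    return [min(v, cap) for v in values]

def ihs(x):
    """Inverse hyperbolic sine: defined at zero, close to log(2x) for large x."""
    return math.asinh(x)

# Hypothetical monthly profits, including zeros and one extreme value.
profits = [0, 0, 10, 20, 30, 40, 50, 60, 70, 1000]

profits_w90 = winsorize_top(profits, pct=90)  # outlier capped at 70
profits_ihs = [ihs(v) for v in profits]       # zeros stay at 0.0

print(profits_w90)
print([round(v, 2) for v in profits_ihs])
```

A plain log would drop or mishandle the two zeros, so showing the winsorized and transformed versions side by side lets readers see how sensitive the results are.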


Thanks to all the colleagues who provided these suggestions, and please share in the comments any other key pet peeves you frequently experience.


David McKenzie

Lead Economist, Development Research Group, World Bank

Join the Conversation

Yi Ning
October 03, 2022

Thank you, I think I've encountered almost all of these issues!
Some other (personal) experiences:
- combining separate datasets with different respondents (e.g., asking all household members, or just household heads?)
- different levels of weights (e.g., region vs country vs household weights vs population weights)
- whether sampling is representative at disaggregated levels (e.g., subnational, or by group). Checklist: check data coverage

Jason Kerwin
October 04, 2022

I’m gonna throw in a vote for simply hiding all the controls in your regression table. If they aren’t variables of direct interest then they typically can’t be interpreted as consistent estimates of causal effects. Just don’t show them.