How Significant is a 1 Percent Difference in Growth Under Debt?

Jamus Lim

By now, the econoblogosphere has heavily skewered the Reinhart-Rogoff debt and growth study, following Herndon, Ash, and Pollin's devastating critique of their methodology. Reinhart and Rogoff have themselves responded to the firestorm of criticism and, to their credit, have openly admitted that there was indeed an error in their computations. It's also worth laying aside the fundamental causality issues, since these have been repeatedly raised by others as a major problem, and are essentially something that the authors have stressed themselves (although in their policy dialog they have been far bolder in drawing prescriptive implications based on the correlation).

In any case, one of the more robust defenses of their results does deserve some closer scrutiny. In particular, Reinhart and Rogoff have argued that the revised results, along with the UMass trio's own, continue to point to substantially lower average growth rates among economies that exceed the 90 percent debt/GDP ratio: in their own words, "[i]t is utterly misleading to speak of a 1% growth differential that lasts 10-25 years as small."

But this again misrepresents the strength of their results. We can only confidently speak of the one percent difference as being significant if the difference is, indeed, statistically significant. And, at least at the fairly standard threshold of 95 percent confidence, there is reason to believe that any such difference may be yet another chimera. While access to Reinhart and Rogoff's original data is elusive (making it impossible to definitively verify this claim), the widely-circulated Excel snapshot provides us with some data to work with (with the added plus that standard errors of the mean calculated using this approach also use the average-of-averages weighting scheme they employ in their paper). Attempting to replicate the key figure shows that, while the mean for the observations in the greater-than-90-percent bin is certainly lower than those of the other bins (see figure below), the confidence intervals of all three bins above the 30 percent debt/GDP threshold also substantially overlap. On this (admittedly crude) basis, then, any claim that a 1 percent growth differential over a decade compounds is simply overstating the case made by the data.

Source: World Bank staff calculations, from Reinhart-Rogoff data fragment.
Note: Means for each bin are simple averages of all by-country observations within each respective bin, weighted equally by country. Observations for the >90% bin include updated data for New Zealand for all years, averaging 2.58%. 95 percent confidence intervals computed from standard errors for available observations within each mean.

Postscript: As a point of clarification, it should be noted that confidence intervals can overlap and still show a significant difference between means; see, in particular, this paper (PDF). Confidence interval bars can in fact overlap even at conventional significance levels (of, say, p = 0.05), although not substantially. For the purposes of eyeballing comparisons, standard error bars are perhaps more informative, with the broad rule of thumb that standard error ranges should be separated by about half the width of the bars before differences are significant (so that overlaps would certainly indicate the absence of significance). And as can be seen in this more stringent case (see figure below), there is still overlap between the 30-60 and 60-90 percent bins, and between the 30-60 and >90 percent bins.

Source: World Bank staff calculations, from Reinhart-Rogoff data fragment.
Note: Means for each bin are simple averages of all by-country observations within each respective bin, weighted equally by country. Observations for the >90% bin include updated data for New Zealand for all years, averaging 2.58%. Standard errors computed for all available observations within each mean.
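The three eyeball criteria discussed in the postscript (overlapping standard error bars, overlapping 95 percent confidence intervals, and a formal test of the difference) can be compared in a minimal sketch. It assumes two independent groups and a large-sample normal approximation; the means and standard errors below are hypothetical values chosen for illustration, not figures from the Reinhart-Rogoff data.

```python
from math import sqrt
from statistics import NormalDist

# Two-sided 95 percent critical value under the normal approximation (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def compare(mean_a, se_a, mean_b, se_b):
    """Return (SE bars overlap, 95% CIs overlap, difference significant at 5%).

    Assumes the two group means are independent and approximately normal.
    """
    gap = abs(mean_a - mean_b)
    se_bars_overlap = gap < se_a + se_b            # +/- 1 SE bars touch
    ci_overlap = gap < Z95 * (se_a + se_b)         # +/- 1.96 SE bars touch
    significant = gap > Z95 * sqrt(se_a**2 + se_b**2)  # z-test on the difference
    return se_bars_overlap, ci_overlap, significant


# Hypothetical groups with equal standard errors of 1 and increasing gaps.
for gap in (1.5, 2.5, 3.0, 4.5):
    print(gap, compare(0.0, 1.0, gap, 1.0))
```

Under these assumptions, a gap of 3.0 gives overlapping confidence intervals together with a significant difference, while a gap of 1.5 gives overlapping standard error bars and no significance. With small samples the critical value would come from a t distribution and be somewhat larger than 1.96.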



Submitted by Anonymous on
Maybe you guys are missing some point... [EDITORIAL NOTE: THIS COMMENT WAS ORIGINALLY POSTED ANONYMOUSLY IN PORTUGUESE, AND REPLICATED IN ENGLISH HERE TO EASE DISCUSSION.]

Teach the World Bank statistics, or Cult of statistical significance IV. by Carlos Cinelli

Consider two random samples of 10 observations each, drawn from normal distributions with different means and the same unknown variance. To use a concrete example, I simulated the two samples in R, one from a normal with mean 5 and standard deviation 1 and the other from a normal with mean 2 and standard deviation 1. The samples resulted in the following statistics:

Sample 1
Sample mean: 5
Sample standard deviation: 0.8
95% confidence interval: 3.4 to 6.6

Sample 2
Sample mean: 2.6
Sample standard deviation: 0.7
95% confidence interval: 2.6 to 4.0

Note that the confidence intervals intersect: the lower limit of sample 1 is 3.4 and the upper limit of sample 2 is 4.0. Does this mean that the difference between the sample means is not statistically significant at 5%? Far from it! A t-test for the difference between the two means gives a statistically significant result. Even if you did not know that the variances were equal, the Welch t-test gives a 95% confidence interval for the difference between the means of 1.8 to 3.2.

Now imagine that these data were GDP growth, i.e., one group has sample average growth of 5% and the other 2.6%. If you compared the confidence intervals, you might be inclined to say that the two groups do not have "different" growth, when in fact a proper classical test of the difference between means indicates a difference of between 1.8 and 3.2 percentage points, which is very important in terms of economic growth!

But does this error actually happen? Yes, at the World Bank. On Econbrowser, covering the Reinhart and Rogoff controversy, Chinn posted this graph relating average growth to the public debt-to-GDP ratio. The bars are the means and the black lines represent the 95% confidence intervals.

[Figure: debtgdpgrowth.png]

Note that although the average growth of countries with high debt (over 90% of GDP) is much lower than the averages of the others, the confidence intervals intersect. This led the World Bank staff blog to say that "[...] the confidence intervals of all three bins above the 30 percent debt/GDP threshold also overlap. On this (admittedly crude) basis, then, any claim that a 1 percent growth differential over a decade compounds is simply overstating the case made by the data."

This is wrong: the simple fact that the 95% confidence intervals cross does not mean anything, even if you thought that statistical significance was the relevant point here. As we saw in the very simple example above, the confidence intervals can intersect and yet the difference is "statistically significant" and, more importantly, economically relevant! Aware of the error, the authors added a postscript with a caveat and reduced the interval to one standard deviation instead of two...

Despite the joking title, this was not "stupidity" on the part of the World Bank. One problem I have encountered when discussing this is that, in general, people think that only "dumb" people make this kind of mistake, or that only crappy "journals" publish things like this. That is a big mistake... misunderstandings about confidence intervals, statistical significance, and p-values are pervasive in the social sciences, including in applied work in the best journals and by the best researchers. Be it in the USA, Brazil, or Germany, this happens a lot, and it is something we have to change.
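The Welch comparison described in this comment can be approximated from the summary statistics it quotes (means 5 and 2.6, standard deviations 0.8 and 0.7, 10 observations each). The sketch below is not the commenter's R code: it is a standard-library Python reconstruction in which the helper names (t_pdf, t_cdf, t_quantile, welch) are hypothetical and the t distribution is evaluated by simple numerical integration. Because it starts from rounded summaries, it should not be expected to reproduce the quoted 1.8 to 3.2 interval exactly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF of Student's t via Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile of Student's t for p >= 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Welch t-test and confidence interval for mean1 - mean2 from summaries."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    diff = mean1 - mean2
    t_stat = diff / se
    p_value = 2 * (1 - t_cdf(abs(t_stat), df))
    margin = t_quantile(1 - (1 - level) / 2, df) * se
    return t_stat, df, p_value, (diff - margin, diff + margin)


# Summary statistics as quoted in the comment (rounded values).
t_stat, df, p_value, ci = welch(5.0, 0.8, 10, 2.6, 0.7, 10)
print(f"t = {t_stat:.2f}, df = {df:.1f}, p = {p_value:.2g}")
print(f"95% CI for the difference: {ci[0]:.2f} to {ci[1]:.2f}")
```

From the rounded inputs the interval for the difference comes out at roughly 1.7 to 3.1, well clear of zero, which is consistent with the comment's point that the two means differ significantly.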

Submitted by Jamus Lim on
I agree with your comment that overlapping error bars generated from confidence intervals do not, in and of themselves, necessarily imply statistically insignificant differences. This is precisely why, in the postscript, I had provided the same figure with error bars generated from standard errors, whose overlap definitively indicates the absence of statistical significance. And as clarified in the postscript, there is no statistically significant difference between the 30/60 and 60/90 bins, or between the 60/90 and >90 bins, which can be formally verified with two-tailed t-tests (which for the record yield T = 0.83, p = 0.25 and 1.18, p = 0.25, respectively). This is the point of the post: that R-R have overstated their case of how >90 percent debt/GDP ratio actually represents some kind of (statistically significant) threshold (based on their data). Of course, there is always a tradeoff between clarity and ease of interpretation in conveying statistical results in the form of a graph, and I regret that I was not clearer in my original post.

Hi, Jamus These are not adequate comparisons. See, first, the pure "statistical significance" of the differences is not the important thing here, this is an estimation problem, not a test problem. But, even if statistical significance were the most important thing, both comparisons are wrong. For example, do you think that growth rates and debts between countries are independent? If they aren't, then the relationship between individual CI's and differences CI's can be very odd, not only in the way mentioned in the paper you have linked. So, all these "eyeballing" tests are misleading. Best Regards! Carlos
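The point about non-independence can be illustrated with a small synthetic example. The sketch below uses made-up paired data (not the Reinhart-Rogoff observations) in which two series move closely together: treated as independent samples their means look indistinguishable, while a paired analysis of the same numbers finds a clear difference. Dependence can also work in the opposite direction, so this shows only one of the ways the relationship between individual intervals and the interval for the difference can break down.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired observations: b tracks a closely, shifted up by about 0.5.
a = [float(x) for x in range(1, 11)]
b = [x + 0.5 + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(a)]
n = len(a)

# Treating the two series as independent samples (Welch-style standard error).
se_indep = sqrt(stdev(a) ** 2 / n + stdev(b) ** 2 / n)
t_indep = (mean(b) - mean(a)) / se_indep

# Treating them as paired: analyse the within-pair differences directly.
diffs = [y - x for x, y in zip(a, b)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))

# Approximate two-sided 5% critical values: 2.10 (18 df) and 2.26 (9 df).
print(f"independent-samples t = {t_indep:.2f}")
print(f"paired t = {t_paired:.2f}")
```

In this constructed case the independent-samples statistic falls far below its critical value while the paired statistic far exceeds its own, even though the underlying numbers are identical.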

Submitted by Jamus Lim on
Hi Carlos, Thanks for your comment. As I understand it, you believe that the comparison made in the post is incorrect, because the relationship between growth and debt is endogenous. This is entirely true, and is why, in the first paragraph of the post, I pointed out the causality problem, which poses a problem for inference. But since the non-independence of growth and debt (as you call it) is an issue that has already been repeatedly discussed (even by Reinhart and Rogoff themselves), I wanted to focus on an alternative aspect of the R-R paper, that of statistical inference. The point of the post, which hopefully came across, is that the difference in means (a fairly standard statistical test) turns out not to be statistically significant between the 30/60 and 60/90 bins, nor between the 60/90 and >90 bins. The graph was chosen to replicate the graph in R-R's original paper, with the data edits mentioned in the footnote to the table. So if the comparison rubs you the wrong way, it is an issue with how R-R make their comparisons (and inferences).
