The Impact of Economic Blogs - Part I: Dissemination (aka check out these cool graphs!)
There is a proliferation of economics blogs, with increasing numbers of famous and not-so-famous economists devoting a significant amount of time to writing blog entries, and in some cases, attracting large numbers of readers. Yet little is known about the impact of this new medium. Together we are writing a paper to try and measure various impacts of economics blogs and thought we’d share the results over a few blog posts – and hopefully get useful comments to improve the paper at the same time.
Question 1: “Do blogs lead to increased dissemination of research papers?”
We examine this question by using abstract view and download statistics from Research Papers in Economics (RePEc), matched to the dates that blogs link to these papers. A few graphs dramatically illustrate the potential of blogs to draw attention to research papers.
Example 1: Freakonomics blogs about a paper:
Example 2: Marginal Revolution blogs about a paper:
Example 3: Chris Blattman blogs about a paper:
Ok, so there seems to be something there. To formally test for the impact of different blogs on abstract views and downloads we put together a database of 94 papers linked to on 6 blogs: Aid Watch (before it ended), Chris Blattman, Economix (New York Times), Marginal Revolution, Freakonomics, and Paul Krugman. We define t=0 in the month in which the blog entry occurred, t=-1 in the month before, t=+1 in the month after, etc. Then we estimate the impact of blog s linking to a paper i via the following regression:
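One plausible form of this event-study specification, reconstructed from the description that follows (the symbols here are our notation, not necessarily the paper's exact one):

```latex
\mathrm{views}_{it} = \alpha_i
  + \sum_{s} \left( \gamma_s\, B_{i s, t+1} + \beta_{0 s}\, B_{i s t} + \beta_{1 s}\, B_{i s, t-1} \right)
  + \varepsilon_{it}
```

where $B_{ist}$ is an indicator that blog $s$ links to paper $i$ in month $t$, $\alpha_i$ is a paper fixed effect, $\beta_{0s}$ and $\beta_{1s}$ capture the same-month spike and its spillover into the next month, $\gamma_s$ is the lead term, and for robustness a paper-specific linear trend $\delta_i t$ can be added.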
This controls for paper-specific fixed effects, and looks for a spike in views in the month the paper is blogged about, tests whether this continues into the next month, and also includes a lead term to rule out reverse causation whereby a paper gets a lot of downloads for some other reason, leading people to blog about it. For robustness we also include paper-specific linear time trends.
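The event-study logic can be sketched on synthetic data. This is a minimal illustration of the within (fixed-effects) estimator with lead, contemporaneous, and lag dummies, not the paper's actual Stata code; all numbers (spike sizes, noise, panel dimensions) are made up, and the paper-specific time trends are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
n_papers, n_months = 94, 24
blog_month = rng.integers(6, 18, n_papers)  # month each paper is blogged about

# Panel of abstract views: paper fixed effect + noise + a spike when blogged.
alpha = rng.normal(50, 10, n_papers)        # paper-specific baseline (fixed effect)
views = alpha[:, None] + rng.normal(0, 2, (n_papers, n_months))

rows = []
for i in range(n_papers):
    for t in range(n_months):
        lead = 1.0 if t == blog_month[i] - 1 else 0.0
        now  = 1.0 if t == blog_month[i] else 0.0
        lag  = 1.0 if t == blog_month[i] + 1 else 0.0
        views[i, t] += 300 * now + 60 * lag  # "true" effects: spike, then spillover
        rows.append((i, views[i, t], lead, now, lag))

ids = np.array([r[0] for r in rows])
y = np.array([r[1] for r in rows])
X = np.array([r[2:] for r in rows])

# Within transformation: demean y and X by paper to absorb the fixed effects.
def demean(a, ids):
    out = a.astype(float).copy()
    for i in np.unique(ids):
        m = ids == i
        out[m] -= out[m].mean(axis=0)
    return out

beta, *_ = np.linalg.lstsq(demean(X, ids), demean(y, ids), rcond=None)
print(dict(zip(["lead", "month_0", "month_+1"], beta.round(1))))
```

The estimated lead coefficient should be near zero while the month-0 and month-+1 coefficients recover the built-in spike and spillover, mirroring the identification argument above.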
Results can be summarized as follows:
· Blogging about a paper causes a large increase in the number of abstract views and downloads in the same month: an average impact of an extra 70-95 abstract views in the case of Aid Watch and Blattman, 135 for Economix, 300 for Marginal Revolution, and 450-470 for Freakonomics and Krugman. [see regression table here]
· These increases are massive compared to the typical abstract views and downloads these papers get – one blog post on Freakonomics is equivalent to 3 years of abstract views! However, only a minority of readers click through – we estimate that 1-2% of readers of the more popular blogs click on the links to view the abstracts, and around 4% on a blog like Chris Blattman's, which likely has a more specialized (research-focused) readership.
· There is some spillover of reads into the next month (not everyone reads a blog post the day it is produced), and no evidence that abstract views and downloads lead blog posts.
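The click-through arithmetic in the bullets above can be illustrated with a back-of-envelope calculation. The extra-views figures are the averages reported in the post; the per-post readership numbers are purely hypothetical assumptions chosen to show how rates in the reported 1-4% range arise:

```python
# Extra abstract views per post (averages reported in the post).
extra_views = {"Freakonomics": 460, "Chris Blattman": 80}
# HYPOTHETICAL per-post readerships, for illustration only.
assumed_readers = {"Freakonomics": 30_000, "Chris Blattman": 2_000}

# Implied click-through rate: extra views as a share of assumed readers.
ctr = {blog: 100 * extra_views[blog] / assumed_readers[blog] for blog in extra_views}
for blog, rate in ctr.items():
    print(f"{blog}: {rate:.1f}% implied click-through")
```

Under these assumed readerships, the large-audience blog shows a click-through rate near 1.5% while the specialized blog shows 4%, consistent with the pattern described above.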
A more formal write-up of this section of the paper and the table of results can be found here. We’d love to hear any comments and will incorporate them as appropriate into the paper we’re writing.
If you want to play around with the data yourself and vet our analysis, here is the Stata data set and do file. Let us know if you think we should be trying other specifications or you can’t replicate our results, etc.
Look out for our next installment Thursday, when we ask if blogs increase influence.
Fantastic idea, execution, and writeup. I'm reminded, though, of Berk Ozler's post on here a few weeks ago in which he talked about how the internet has changed matters related to publishing academic papers. In some ways, the information presented here should be considered alongside Berk's points. Traditionally, scientific knowledge was distributed in two ways: through peer-reviewed publications and through journals & conferences. Peer review was like a cost-discriminating signal from which a reader could update her beliefs about the paper's content; the journals and conferences spread the information. What you show in this blog post, and what Berk discussed earlier, go hand in hand to suggest that the internet – primarily through the creation of scientific blogging networks, particularly ones in which a few blogs have considerable reach and readership – has unbundled those two historical functions. We no longer depend on journals and conferences to distribute "scientific knowledge"; instead we rely on working paper series like the NBER's, which are distributed on the internet, and on bloggers like MR. But another question should examine what effect the internet has had on the first role of journals, which is peer review, and which was Berk's point. Big coverage of a working paper does not mean peer review, but it does mean something is being distributed. The positive part of that is interesting, but the normative part more so.
I look forward to the impact study, where I suspect you'll talk about this. Fantastic and provocative work.
David and Berk - Fascinating post, thanks for the insights.
It would be interesting to see whether, in the medium term, this also has an impact on actual citations of those papers in academic journals (any ideas on how to account for the "treatment" effect?).
Moreover, based on the examples you reported, I noted different ratios between the number of abstract views and the number of downloads. A mention in Freakonomics yields roughly one download for every ten abstract views; one in Chris Blattman, one download for every five abstract views. I did not run any further analysis on the data, but that may support the idea that they have different audiences (Freakonomics having a more generalist one).
David/Berk -- This is a terrific study and wonderfully presented. I also commend you for sharing the data in stata and the do files, something we at the Center for Global Development have just committed ourselves to do routinely (see the announcement of our new Data and Code Transparency Policy http://blogs.cgdev.org/globaldevelopment/2011/08/cgds-new-data-code-tra… )
Your study confirms the intuition that has guided our Web-based policy engagement around CGD research since I arrived here seven years ago. Indeed, I think CGD was among the first of the think tanks and academic institutions to do this in a systematic manner. Frustration at being unable to try this in the World Bank is one of the key reasons I left, so I'm glad to see that this topic is now the subject of such serious research!
That said, I hope your study or further work will offer some surprises or tips on how those of us exploring the relationship between online engagement, research, and policy change can raise our game. I look forward to the next installment!
PS: Your blog is to be commended for a clean, easy-to-use interface. I especially like the clarity of the information concerning the behavior of the comments field. I hope you will consider adding a Twitter button. I'll tweet the study, but others who might have done so may find the higher transaction cost of not having a Twitter button just enough reason to refrain.
Thanks for the suggestion on analyzing citations, which is something we've heard from other people this morning as well. The citation analysis is a little harder because the counterfactual is harder to establish without the time trend in citations. Looking at comparable papers in the same issue, adding author fixed effects, etc. are some ideas, but they all may end up being more suggestive than conclusive due to possible selection bias. We show that papers trending up are no more likely to be blogged about, but if the elite bloggers are good at discerning good papers that will be highly cited (along with the people who will do the citing), then the treatment effect will be hard to identify.
Do you know of any sources where we could get citations by month or year, so that one could at least look at papers that have been around for a while before being blogged about and compare the before and after citations?
Hi Lawrence,
Thanks for the comments. On our front page, under every story, there is a share button that includes, Facebook, Twitter, and email. Any suggestions you have to make these even easier or more prominent would be welcome.
Berk.
Hi Scott,
Thanks for the thoughtful comments. Fortunately for your question, there is an app for that. Glenn Ellison has an article titled, "Is peer review in decline?" in the latest issue of Economic Inquiry: http://onlinelibrary.wiley.com/doi/10.1111/j.1465-7295.2010.00261.x/pdf
Our work here has less to say about the role of internet on peer review...
Berk.
Berk and David - I don't know if this is remotely tractable given the heterogeneity in articles by author and subject, but you might be able to do something: what if you also looked at the impact of blog coverage on publication-relevant outcomes, such as time to publication and the quality of the journal? Say that MR links to a working paper, and you track down when and where the paper was published and compare that to some counterfactual where it had not been blogged about (say, the same author's other working papers, groups of papers that appeared at the same time but were not blogged, or possibly some other effort to account for quality).
For instance, you could imagine a paper that gets massive coverage through blogs and never gets published because now journal editors and referees have a heightened awareness of the author, the potential problems with the paper, and so on. You could imagine another situation where coverage increased information about the paper's strengths and weaknesses, and allowed the paper to improve through revisions and move to an even better journal.
Hi Berk - in Google Scholar, after clicking "Cited by" below the paper of interest, you can play with the advanced search and get the number of citations per year. Web of Knowledge has a similar function.
However, the results are quite different. I tried with the Landry et al. paper: Google reported 113 citations, while Web of Knowledge reported only 38. I think the latter looks only at papers published in academic journals.
That's a very interesting study, because we don't yet have much data about the impact of these new technologies... it was a pleasure to help you collect this data. I hope we can work together more, get better results, and do more analysis on it.
1. You can't really attribute the article's traffic to one blog when other blogs are referring to the article at the same time. If Freakonomics and MR both cite it, both get credit.
In the advertising world, what you do to measure the effectiveness of a web item is to look for n-1. That is, what was the site the viewer was on immediately before they went to the abstract, in this case.
If you don't do that, you get this problem, which, by the way, is made worse when you use monthly rather than daily hits: I could make a very large showing of hits "caused" by my website if I observed that more traffic was going to the abstract in the preceding week. Now I publish the link on my website – you can't say, in a network situation, what moved the traffic: the preceding traffic or my posting. That's why you need n-1 data, and that's why advertisers don't pay for the posting, but only for the click-through.
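The "n-1" (last-referrer) attribution the commenter describes could be implemented on raw server logs roughly as follows. This is a sketch with made-up log entries, not RePEc's actual data (which, as noted later in the thread, only ships monthly aggregates):

```python
from collections import Counter

# Hypothetical click log for one abstract page: (page, referrer) pairs.
# In a real setting these would come from the server's referrer logs.
log = [
    ("abstract/123", "marginalrevolution.com"),
    ("abstract/123", "freakonomics.com"),
    ("abstract/123", "marginalrevolution.com"),
    ("abstract/123", None),  # direct visit, no referrer recorded
    ("abstract/123", "google.com"),
]

# Last-click attribution: credit each abstract view to the site the
# visitor was on immediately before (the "n-1" site).
referrers = Counter(ref for page, ref in log if ref is not None)
print(referrers.most_common(1))
```

With per-visit referrer data like this, each view is credited to exactly one source, which sidesteps the multiple-blogs-citing-at-once ambiguity raised in point 1.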
2. You might want to look at some of the papers at MIT on weak links, the roles of aggregators, etc. in social networks, and also at some of the papers published by Google.
3. Finally, look up the new book Networks, Crowds and Markets by Easley and Kleinberg and you will see the real-world methodology for this.
Thanks for the comments -- we'll see how easy it would be, if at all, to dig up n-1 data.
On (1), RePEc does not publish daily abstract views and downloads, only monthly. We don't think these bloggers can see the actual daily views/downloads data to guide the timing of their posts (and even if they could, we don't think they would: there's not much in it for them unless they were hoping for big ad dollars, which is doubtful, since we're discussing mainly scholar bloggers in this paper). It is entirely possible, however, that they see correlates, such as other bloggers talking about the same paper, and this could be a problem to the extent that we did not code for it. However, as explained in our paper, if a paper was blogged by multiple sites, it is coded as such in the data. Finally, we don't see any lead effects, meaning that the previous month's views/downloads have zero predictive power for the likelihood of being blogged about and linked to by one of these bloggers, while there are lag effects, meaning that some of the impact continues into the next month. Still, you're right that some unobservable variable, such as a paper going on the seminar circuit, being presented at large conferences, being linked to from a media outlet, or a link from a smaller blog that we missed, could lead to the pattern we present.
Still, we actually DO have n-1 data from our own blog for two of the blogs we analyze (MR and Chris Blattman, who linked to a few posts we put up over the past 4 months), and we can clearly see that they are among our top referrers, with their referrals coming mostly within minutes to 24 hours after they link. Perhaps we should include some of this as supporting evidence alongside our monthly event-study analysis. I don't think we can really get n-1 data from RePEc.
On (2) and (3), thanks we'll check these out -- Easley was a professor of mine in grad school...
Berk.
As a follow-up on Berk's comments, we've chosen to focus on situations where the concerns you have seem second-order. The issues you mention seem important when looking at newly released papers, where there are likely to be a lot of people downloading the paper and where several blogs may link to it at once. This is much less of an issue for papers that have been out for months or years, where it is the idiosyncratic ideas of a particular blogger that cause them to link back to an older paper.
Second, your example of the advertising world does not seem to me a valid methodology, since it ignores substitution effects. What I mean by this is that a blog might link to a paper you were going to read anyway, so the net impact is zero, even if all the n-1 sites happen to be the blog. The event study analysis we employ does, in principle, get around this issue by saying the counterfactual for an older paper is what usual download behavior is in the months in the run up to and the period shortly after it was blogged.
Berk, if you have the n-1 data for MR and Blattman, you can measure MR. But I still don't think this addresses the problem of an aggregator vs. an initiator: if I make it my habit to search a number of websites that link to a new article, and then link the article on my website, I get the credit, not everyone else.
I think what you need to do is identify how many websites referred to the article over time and the traffic generated on each date, and recognize that crowds act in a reinforcing way: if I see two people link to and discuss an article, I am more likely to look at it. Think of this as a release of information through a network, with links of varying strength between nodes and some links reinforcing or amplifying the recommendations of others.
Here is something you might want to do or look for. There is a woman at MIT who did her Ph.D. a few years ago on internet website linking and traffic, density of networks, and measurement. She gave a presentation to the graduate marketing department where I occasionally serve as an adjunct. I would search MIT articles and Ph.D. dissertations to find it. Sorry I can't be more help in identifying her, and I would like to keep my privacy. She also looked at which websites link to other websites and what network that creates or identifies. Sites which spanned two different groups (i.e., both liberal and conservative audiences, for example) were more powerful than ones which engaged a single constituency.
Re:
"your example of the advertising world does not seem to me a valid methodology, since it ignores substitution effects. What I mean by this is that a blog might link to a paper you were going to read anyway, so the net impact is zero,"
If that is true--that you would have read the paper anyway--then the net impact of any blog is zero also if you would have read the material anyway.
I always listen attentively when an economist calls something "second order" as a way of dismissing a problem.
As to another area of research, you might want to compare blogs which refer to papers but have no comment sections with blogs that do have comment sections. People who engage in discussion do not like looking stupid (although they manage to look stupid anyway), and are therefore more likely to read the article or abstract. A blog which merely lists or summarizes is in fact no different from a journal like BE Press, which pushes the index to you. I think you will find that the number of comments correlates with the number of downloads.
Many fields have a couple of levels of blog activity. Some blogs, e.g., Larry Solum's Legal Theory Blog or the Law Profs blog network, or Dienekes' Anthropology blog, consist predominantly of reviews of scholarly journal articles in a field, often merely cherry-picking abstracts with minimal additional comment. Some online media styled as magazines rather than blogs (e.g. Science Daily and Science News), sometimes also having a print version, routinely cite academic journal articles as primary sources and digest the results.
These blogs and media outlets tend to have a fairly small, fairly regular and elite readership including subject area specialist bloggers with larger readerships such as those mentioned in the papers.
So, often, the low-profile abstract-digesting source will lead to follow-up blogging by a higher-traffic blog with a more general audience, and it is the secondary blog post, rather than the initial one, that drives the traffic to the abstract.
Blog links of any kind to an abstract may also heighten its relevance in search engines.
http://www.jmir.org/2011/4/e123/ may be of interest (on the relationship between Twitter mentions and citations).