How the Scientific Community Reacts to Newly Submitted Preprints: Article Downloads, Twitter Mentions and Citations.

Abstract

This article was published as "How the Scientific Community Reacts to Newly Submitted Preprints: Article Downloads, Twitter Mentions and Citations" by Xin Shuai, Alberto Pepe, and Johan Bollen. PLoS ONE 7(11): e47523. doi:10.1371/journal.pone.0047523. Open Access Article.

We analyze the online response to the preprint publication of a cohort of 4,606 scientific articles submitted to the preprint database arXiv.org between October 2010 and May 2011. We study three forms of responses to these preprints: downloads on the arXiv.org site, mentions on the social media site Twitter, and early citations in the scholarly record. We perform two analyses. First, we analyze the delay and time span of article downloads and Twitter mentions following submission, to understand the temporal configuration of these reactions and whether one precedes or follows the other. Second, we run regression and correlation tests to investigate the relationship between Twitter mentions, arXiv downloads and article citations. We find that Twitter mentions and arXiv downloads of scholarly articles follow two distinct temporal patterns of activity, with Twitter mentions having shorter delays and narrower time spans than arXiv downloads. We also find that the volume of Twitter mentions is statistically correlated with arXiv downloads and early citations just months after the publication of a preprint, with a possible bias that favors highly mentioned articles.

Introduction

The view from the “ivory tower” is that scholars make rational, expert decisions on what to publish, what to read and what to cite. In fact, the use of citation statistics to assess scholarly impact is to a large degree premised on the very notion that citation data represent an explicit, objective expression of impact by expert authors (Rubin 2010). Yet scholarship is increasingly conducted online, and social media are a growing part of the online scholarly ecology. As a result, the citation behavior of scholars may be affected by their increasing use of social media. Practices and considerations that go beyond traditional notions of scholarly impact may thus influence what scholars cite.

Recent efforts have investigated the effect of social media environments on scholarly practice. For example, some research has looked at how scientists use the microblogging platform Twitter during conferences by analyzing tweets containing conference hashtags (Letierce 2010, Weller 2011). Other research has explored the ways in which scholars use Twitter and related platforms to cite scientific articles (Priem 2010, Weller 2011a). More recent work has shown that Twitter mentions of articles predict future citations (Eysenbach 2011). This article falls within, and extends, these lines of research by examining the temporal relations between quantitative measures of readership, Twitter mentions, and subsequent citations for a cohort of scientific preprints.

We study how the scientific community and the public at large respond to a cohort of preprints that were submitted to the arXiv database (http://arxiv.org), a service managed by Cornell University Library, which has become the premier preprint publishing platform in physics, computer science, astronomy, and related domains. We examine the relations between three types of responses to the submission of these preprints: the number of Twitter posts (tweets) that specifically mention them, downloads of them from the arXiv.org web site, and the number of early citations that the 70 most Twitter-mentioned preprints in our cohort received after their submission. In each case, we measure the total volume of responses, as well as the delay and span of their temporal distribution. We perform a comparative analysis of how these indicators are related to each other, both in magnitude and time.
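To make the delay and span measurements concrete, the following is a minimal Python sketch for a single preprint. The definitions used here are assumptions chosen for illustration, since this section does not define the two quantities: delay is taken as the number of days from submission to the day with the most events, and span as the number of days between the first and last event. The function name `delay_and_span` and all dates are hypothetical and are not drawn from the study's data.

```python
from collections import Counter
from datetime import date


def delay_and_span(submitted, event_dates):
    """Return (delay, span) in days for one preprint.

    Assumed definitions (illustrative, not taken from the paper):
      delay = days from submission to the day with the most events
      span  = days from the first event to the last event
    """
    if not event_dates:
        return None, None
    per_day = Counter(event_dates)
    # Pick the busiest day; break ties in favour of the earliest day.
    peak_day = min(per_day, key=lambda d: (-per_day[d], d))
    delay = (peak_day - submitted).days
    span = (max(event_dates) - min(event_dates)).days
    return delay, span


# Hypothetical, made-up event dates for a single preprint.
submitted = date(2010, 10, 4)
tweets = [date(2010, 10, 5), date(2010, 10, 5), date(2010, 10, 6)]
downloads = [
    date(2010, 10, 5),
    date(2010, 10, 9),
    date(2010, 10, 9),
    date(2010, 11, 2),
]

tweet_delay, tweet_span = delay_and_span(submitted, tweets)
download_delay, download_span = delay_and_span(submitted, downloads)
print("tweets:", tweet_delay, tweet_span)
print("downloads:", download_delay, download_span)
```

Applying the same function to both event types for every preprint in a cohort would yield two distributions of delay and span that can then be compared.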

Our results indicate that download and social media responses follow distinct temporal patterns. Moreover, we observe a statistically significant correlation between social media mentions and both download and citation counts. These results are highly relevant to recent investigations of scholarly impact based on social media data