Domain-level descriptive statistics

Descriptive statistics for the datasets analyzed in this article are presented in Figure 4. The first row of plots in Figure 4 displays the arXiv subject domains of (a) downloaded and (b) Twitter-mentioned papers, by percentage. A full list of the subject domain abbreviations used in these plots is available in the Materials section, Table 2. We observe a broad distribution of subject domains for both downloads and mentions, with most papers downloaded and mentioned on Twitter relating to Physics, in particular Astrophysics and High Energy Physics, and to Mathematics. The second row of plots in Figure 4 displays the temporal distributions of (c) downloads and (d) Twitter mentions; the dotted line in both panels is a smoothed trend obtained by fitting a third-order polynomial. As shown in Figure 4(c), download counts of articles increase over time. This may be partly caused by a cumulative effect: papers that were published earlier have had more time to accumulate reads than papers that were published later. Figure 4(d), however, shows that the total number of tweets that mention arXiv papers decreases over time.
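The polynomial smoothing used for the dotted trend lines can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of a third-order polynomial to equally spaced counts; the function name `cubic_trend` and the sample values are hypothetical and are not taken from the datasets analyzed here.

```python
import numpy as np

def cubic_trend(counts):
    """Return a smoothed trend from a least-squares third-order polynomial fit.

    `counts` is a sequence of per-period totals (e.g. downloads or tweets),
    assumed to be equally spaced in time.
    """
    counts = np.asarray(counts, dtype=float)
    t = np.arange(counts.size)               # time index 0, 1, 2, ...
    coeffs = np.polyfit(t, counts, deg=3)    # highest-order coefficient first
    return np.polyval(coeffs, t)             # fitted value at each time step

# Hypothetical per-period counts, for illustration only.
example_counts = [12, 15, 14, 20, 26, 25, 31, 38, 36, 45]
trend = cubic_trend(example_counts)
```

Plotting `trend` as a dotted line over the raw counts gives a smoothed curve of the kind described above. A low polynomial order keeps the curve from following period-to-period noise.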

To better understand how Twitter mentions vary across domains, Figure 5 shows the Complementary Cumulative Distribution Functions (CCDF) of Twitter mentions for all articles in the five most frequently observed subject domains. We find that within each domain a few papers receive relatively many mentions, whereas the majority receive very few. The distribution of mentions is thus strongly skewed towards low values in every domain. Note that we rely on the so-called Twitter Gardenhose, a random sample of about 10% of all daily tweets, and may thus underestimate the absolute number of Twitter mentions by roughly a factor of 10. (Refer to the Materials section for more details.)
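For readers unfamiliar with the CCDF, the following minimal sketch shows one common way to compute an empirical CCDF, here defined as the fraction of articles with at least x mentions. The function name `empirical_ccdf` and the sample counts are hypothetical; this is an illustration of the general technique, not the code used to produce Figure 5.

```python
import numpy as np

def empirical_ccdf(mentions):
    """Return (x, p) where p[i] is the fraction of articles with >= x[i] mentions."""
    values = np.sort(np.asarray(mentions))
    # For each distinct value, the index of its first occurrence in the sorted
    # array equals the number of articles with strictly fewer mentions.
    x, first_idx = np.unique(values, return_index=True)
    p = 1.0 - first_idx / values.size
    return x, p

# Hypothetical per-article mention counts for a single domain.
sample_mentions = [0, 0, 0, 1, 1, 2, 5, 40]
x, p = empirical_ccdf(sample_mentions)
```

Plotted on log-log axes, a skewed distribution of this kind appears as a curve that drops quickly at low mention counts and has a long tail at high counts. Note that zero values must be excluded or shifted before using a logarithmic x-axis.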