Alberto Pepe edited regression.tex  about 11 years ago

Commit id: a0dcf31db51f97a4eb452a7db84f1d1e28af12a6

From Table 3, we observe that the publication period $P$ is certainly a non-negligible factor in predicting the citation counts $C$, but also that Twitter mentions $T$ show equally significant correlations. Moreover, Twitter mentions seem to be the most significant predictor of citations, compared to arXiv downloads and time since publication. This is not the case for arXiv downloads which, when accounting for Twitter mentions and time since publication, do not exhibit a statistically significant relationship to early citations. In Figure 7 we show the bivariate scatterplots between Twitter mentions, arXiv downloads and citations, together with the corresponding Pearson's correlation coefficients. Figures 7(b) and 7(c) again show that Twitter mentions are more strongly correlated with citations than arXiv downloads are, which matches the results of our multivariate linear regression analysis. In addition, Twitter mentions are also positively correlated with arXiv downloads, as shown in Figure 7(a), suggesting that the Twitter attention received by an article can be used to estimate its usage data, but usage, in turn, does not seem to be correlated with early citations. Given the rather small sample size and the unequally distributed scatter, we performed a delete-1 observation jackknife on the Pearson's correlation coefficient between Twitter mentions and early citations (N=70). This yields a modified correlation value of 0.430 vs. the original value of 0.4516, indicating that the observed correlation is rather robust. However, dropping the two most frequently tweeted articles does reduce the correlation to 0.258 (p=0.016), implying that the observed correlation is strongest when articles frequently mentioned on Twitter are included, matching the results reported by \cite{Eysenbach2011}.
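
The two robustness checks described above (the delete-1 jackknife and the removal of the most frequently tweeted articles) can be sketched as follows. This is a minimal illustration, not the analysis code used for the paper: the function names are hypothetical, the data are synthetic and do not reproduce the reported values, and NumPy is assumed to be available. The text does not state whether the reported jackknife value is the mean of the leave-one-out correlations or the bias-corrected estimate, so the sketch returns both.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def jackknife_pearson(x, y):
    """Delete-1 jackknife of Pearson's r (hypothetical helper).

    Returns the full-sample r, the mean of the leave-one-out values,
    the bias-corrected jackknife estimate, and the jackknife standard error.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r_full = pearson_r(x, y)
    # Row i of this mask keeps every observation except observation i.
    keep = ~np.eye(n, dtype=bool)
    r_loo = np.array([pearson_r(x[m], y[m]) for m in keep])
    r_mean = float(r_loo.mean())
    r_jack = n * r_full - (n - 1) * r_mean
    se = float(np.sqrt((n - 1) / n * np.sum((r_loo - r_mean) ** 2)))
    return r_full, r_mean, r_jack, se


def drop_top_k(x, y, k):
    """Remove the k observations with the largest x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    keep = order[: len(order) - k]
    return x[keep], y[keep]


# Synthetic data for illustration only; these are NOT the paper's measurements.
rng = np.random.default_rng(0)
tweets = rng.poisson(3.0, size=70).astype(float)
citations = 0.5 * tweets + rng.normal(0.0, 2.0, size=70)

r_full, r_mean, r_jack, se = jackknife_pearson(tweets, citations)
print(f"full-sample r = {r_full:.3f}, leave-one-out mean = {r_mean:.3f}")
print(f"bias-corrected jackknife r = {r_jack:.3f}, jackknife SE = {se:.3f}")

# Second check: drop the two most frequently tweeted articles and recompute r.
t_trim, c_trim = drop_top_k(tweets, citations, 2)
print(f"r without the top two = {pearson_r(t_trim, c_trim):.3f}")
```

A full-sample value close to the leave-one-out mean indicates that no single article drives the correlation, whereas a large drop after trimming the top articles points to a dependence on a few highly tweeted items.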