Alberto Pepe deleted file In_order_to_detect.tex  about 11 years ago

Commit id: dd8d3e614d4b58bd9a6b8293f5c0ae342285957a


To detect these four types of Twitter mentions, we first expand all shortened URLs in our crawled public tweets. We select the 16 most popular URL shortening services, including bit.ly, tinyurl.com, and ow.ly, and expand the shortened URLs in our collection of tweets using their respective APIs. In total, we resolved 98,377,880 short URLs, most of which were generated by the following URL shorteners: bit.ly (61.3\%), t.co (15.2\%), fb.me (6.5\%), tinyurl.com (6.1\%), and ow.ly (4.4\%). (We acknowledge that this procedure will not identify every Twitter mention of a given arXiv.org paper, but we expect it to capture most of them.) From the resulting set, we retain all tweets that contain the term `arXiv' and at least one URL. Next, we associate tweets with arXiv papers by extracting the arXiv identifiers (substrings matching the pattern `dddd.dddd', where each d is a digit) of the papers mentioned in those tweets. (Note that for the third and fourth types of Twitter mention, the arXiv ID does not appear explicitly in the tweet itself and must instead be extracted from the web pages that the tweet links to.)
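The filtering and identifier-extraction steps can be illustrated with a minimal sketch. It is not the pipeline used in this study. It assumes that short URLs have already been expanded (the shorteners' APIs are not specified here, so that step is omitted), treats tweets as plain strings, lists only the five shorteners named above, and uses hypothetical helper names throughout. It also assumes that the `arXiv' filter is case-insensitive.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical subset: only the five shorteners named in the text.
# The full list of 16 services is not given, so it is not reproduced here.
SHORTENER_HOSTS = {"bit.ly", "t.co", "fb.me", "tinyurl.com", "ow.ly"}

# Loose matcher for URLs in tweet text (an assumption, not Twitter's tokenizer).
URL_RE = re.compile(r"https?://\S+")

# Old-style arXiv identifier `dddd.dddd': four digits, a dot, four digits.
# The digit lookarounds reject longer digit runs but still allow a version
# suffix such as "v2" directly after the identifier.
ARXIV_ID_RE = re.compile(r"(?<!\d)(\d{4}\.\d{4})(?!\d)")


def is_short_url(url):
    """True if the URL's host is one of the listed shortener services."""
    host = (urlparse(url).hostname or "").lower()
    return host in SHORTENER_HOSTS


def shortener_shares(short_urls):
    """Percentage of short URLs per host, rounded to one decimal place."""
    counts = Counter((urlparse(u).hostname or "").lower() for u in short_urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {host: round(100 * n / total, 1) for host, n in counts.most_common()}


def candidate_tweets(tweets):
    """Keep tweets that mention 'arXiv' and contain at least one URL.

    Case-insensitive matching is an assumption; it also catches expanded
    links such as arxiv.org/abs/... written in lower case.
    """
    return [t for t in tweets if "arxiv" in t.lower() and URL_RE.search(t)]


def extract_arxiv_ids(text):
    """Distinct dddd.dddd identifiers in text, in order of first appearance."""
    return list(dict.fromkeys(ARXIV_ID_RE.findall(text)))
```

For example, a tweet whose expanded text contains the link http://arxiv.org/abs/1403.1234v2 passes the filter, and the extractor returns the identifier 1403.1234, while a tweet that mentions arXiv but carries no URL is dropped. For the third and fourth mention types, the same extractor would be applied to the HTML of the linked page rather than to the tweet text; fetching that page is not shown.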