The arXiv of the future will not look like the arXiv
  • bowerbird intelligentleman



Abstract

The arXiv is the most popular preprint repository in the world. Since its inception in 1991, the arXiv has allowed researchers to freely share publication-ready articles prior to formal peer review. The growth and popularity of the arXiv emerged from new technologies that made document creation and dissemination easy, and from cultural practices in which collaboration and data sharing were dominant. The arXiv occupies a unique place in the history of research communication and of the Web itself; however, it has arguably changed very little since its creation. Here we look at the strengths and weaknesses of the arXiv in an effort to identify what improvements can be made using technologies that were not previously available. Based on this, we argue that a modern arXiv might in fact not look at all like the arXiv of today.

Introduction

The arXiv, pronounced "archive", is the most popular preprint repository in the world. Started in 1991 by physicist Paul Ginsparg, the arXiv allows researchers to freely share publication-ready articles prior to formal peer review and publication. Today, the arXiv publishes over 10,000 articles each month in fields including high-energy physics, computer science, quantitative biology, statistics, and quantitative finance (see Fig. \ref{104668}). The early success of the arXiv stems from new technological advances paired with a well-developed culture of collaboration and sharing. Indeed, before the arXiv even existed, physicists were already sharing recently finished manuscripts, first by mail and later by email. To understand the success of the arXiv, it is important to understand its history. Below we give a brief history of the technologies, services, and cultural norms that predate the arXiv and were integral to its early and continued success.

The history of the arXiv

Prior to the arXiv, "the photocopy machine was a prime component of the distribution system" \cite{2011arXiv1108.2700G} and preprints were exchanged only with personal contacts and/or mailing lists \cite{Elizalde_2017}. Institutional repositories, such as the SPIRES-HEP database (Stanford Physics Information REtrieval System - High Energy Physics) at the Stanford Linear Accelerator Center (SLAC) and the Document Server at CERN, acted only as bibliographic services, helping scientists keep track of publication information. While SPIRES greatly improved the flow of metadata, retrieving the full manuscript remained hard. A new typesetting system would soon emerge and change this.
TeX, pronounced "tech", was developed by Donald Knuth in the late 1970s as a way for researchers to write and typeset articles programmatically. Soon after the introduction of TeX, Leslie Lamport set a standard for TeX formatting, called LaTeX, which made it very easy for researchers to professionally typeset their documents on their own. This system made sharing papers easier and cheaper than ever before. Until then, many, if not most, researchers had relied upon secretaries or typists to type up their work, which then had to be photocopied and sent by mail to a handful of other researchers. TeX allowed researchers to write their documents in a lightweight plain-text format that could be emailed, downloaded, and compiled without the need for physical mail.
Researchers began to exchange emails containing preprints and quickly hit their strict disk space allocation limits \cite{Ginsparg_2011}. To address this problem, an automated email server, initially called xxx.lanl.gov, was set up on 14 August 1991. This service allowed researchers to request preprints automatically via email as needed. It would soon become one of the world's first web servers and, renamed arXiv in 1998, still serves today as one of the most open and efficient forms of research communication in the world.
The arXiv was a leader in utilizing new technology when it was launched; however, it has arguably changed very little since its inception, despite a wealth of new technologies now available. Here we look at the strengths and weaknesses of the arXiv in an effort to identify what improvements can be made using new technologies and tools. We propose that a modern arXiv might in fact not look at all like the arXiv of today, a development that will likely occur with or without the arXiv.