Publication cycle: A study of the Public Library of Science (PLoS)


Publications are the driving force of present-day academia. However, publishing is a tedious process that can take a considerable amount of time. Previous research has scarcely investigated whether parts of the publication cycle (i.e., the review and production processes) can be predicted from metadata available for all research papers. This study investigated the predictive value of such metadata using three predictors: (i) the number of authors, (ii) the length of the manuscript, and (iii) the presence of competing interests. Additionally, the models examine how the publication cycle has changed over the years. Results indicate that review times for PLoS journals have doubled over the last decade and are currently estimated at 150 to 250 days on average. Production times, however, have remained highly stable over the same period, at an estimated mean of 50 days. Given a year-specific mean, neither review nor production times can be predicted by the included metadata.

Keywords: publishing, peer review, PLoS, metadata.

Science communication is primarily based on publishing research results in research papers. Anecdotally, authors feel that the publication cycle takes too long (Himmelstein 2015). The main question is whether any factors predict the time taken from submission to publication; a better understanding of this publication lag could provide solace when a delay feels substantial. This paper models publication times for the Public Library of Science (PLoS) journals using metadata available for research papers. The PLoS journals include PLoS Medicine, PLoS Biology, PLoS ONE, PLoS Pathogens, PLoS Genetics, PLoS Computational Biology, PLoS Neglected Tropical Diseases, and PLoS Clinical Trials (which was later merged into PLoS Medicine).

Previous research indicated that statistically nonsignificant results take longer to be published (JA 1998), that review times have decreased (Lyman 2013), and that the number of figures or tables does not predict publication time (Lee 2013). Other research into the academic publication cycle has focused on rejection rates of submitted manuscripts or on the types of decisions made after peer review (Rosenkrantz 2015). These studies primarily relied on samples of research papers from journals, but with the rise of APIs and scrapers for mining the literature (Smith-Unna 2014), such sampling is becoming unnecessary. In this paper, I analyze the entire population of PLoS research articles and separately model review time (i.e., time from submission to acceptance) and production time (i.e., time from acceptance to publication) in order to investigate whether publication time can be predicted from paper metadata.
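To make the two outcome measures concrete, the following is a minimal Python sketch, assuming each article record carries its submission, acceptance, and publication dates. The `Article` class and its field names are hypothetical and do not reflect the actual PLoS metadata schema; grouping articles by publication year to obtain a year-specific mean is likewise an assumption made for illustration, not necessarily the grouping used in the analysis.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Article:
    # Hypothetical field names; real PLoS metadata fields may differ.
    received: date   # submission date
    accepted: date   # acceptance date
    published: date  # publication date


def review_days(a: Article) -> int:
    """Review time: days from submission to acceptance."""
    return (a.accepted - a.received).days


def production_days(a: Article) -> int:
    """Production time: days from acceptance to publication."""
    return (a.published - a.accepted).days


def mean_review_by_year(articles: list[Article]) -> dict[int, float]:
    """Mean review time per year (grouping by publication year is an assumption)."""
    by_year: dict[int, list[int]] = defaultdict(list)
    for a in articles:
        by_year[a.published.year].append(review_days(a))
    return {year: mean(days) for year, days in sorted(by_year.items())}


# Illustrative, made-up records (not PLoS data).
articles = [
    Article(date(2013, 1, 10), date(2013, 5, 1), date(2013, 6, 20)),
    Article(date(2013, 3, 1), date(2013, 6, 1), date(2013, 7, 15)),
]

print(review_days(articles[0]))        # 111
print(production_days(articles[0]))    # 50
print(mean_review_by_year(articles))   # {2013: 101.5}
```

Keeping review and production time as separate functions mirrors the split between the two outcomes; a year-specific mean like the one above is the baseline against which any metadata predictor would need to add explanatory value.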


Article level data was co