Publication cycle: A study of the Public Library of Science (PLOS)

Abstract

Publications are the driving force in present-day academia. However, publishing is a tedious process that can take a considerable amount of time. Previous research has barely investigated whether parts of the publication cycle (i.e., the review and production processes) can be predicted from metadata available for all research papers. This study investigated the predictive value of three metadata predictors: (i) the number of authors, (ii) the length of the manuscript, and (iii) the presence of competing interests. Additionally, the models inspect changes in the publication cycle over the years. Model results indicate that review and production times cannot be predicted by the included metadata of research papers. Results also indicate that review times for PLoS journals have doubled over the last decade and are currently estimated at 150 to 250 days on average. Production times, however, have remained highly stable over the last decade, at an estimated mean of around 50 days. Taken together, these analyses indicate that, given a year-specific mean, metadata do not predict review and production times.

Science communication is primarily based on publishing research results in research papers. Anecdotally, authors feel that the publication cycle takes too long (Himmelstein 2015). A better understanding of the publication lag could provide solace when authors feel substantially delayed; the main question is whether any factors predict the time taken from submission to publication. This paper tries to model publication times for the Public Library of Science (PLoS) journals with metadata available for research papers. The PLoS journals include PLoS Medicine, PLoS Biology, PLoS ONE, PLoS Pathogens, PLoS Genetics, PLoS Computational Biology, PLoS Neglected Tropical Diseases, and PLoS Clinical Trials (which was later merged into PLoS Medicine).

Previous research indicated that statistically nonsignificant results take longer to be published (JA 1998), that review times have decreased (Lyman 2013), and that the number of figures or tables does not predict publication time (Lee 2013). Other research into the academic publication cycle has focused on rejection rates of submitted manuscripts or the types of decisions made after the peer-review process (Rosenkrantz 2015). These studies primarily relied on sampling research papers from journals, but with the rise of APIs and scrapers to mine the literature (Smith-Unna 2014) such sampling is becoming redundant. In this paper, I analyze the entire population of PLoS research articles and separately model review time (i.e., time from submission through acceptance) and production time (i.e., time from acceptance through publication) in order to investigate whether publication time can be predicted with paper metadata.

Method

Article-level data were collected from all PLoS journal research papers with v0.5 of the aureplacedverbatimaa package (Chamberlain 2015) in aureplacedverbatimaaa v3.2.0 (Team 2015). The dataset was collected on July 4, 2015 and is available via aureplacedverbatimaaaa . Research papers were excluded if they lacked a journal name or any of the publication dates (i.e., submitted, accepted, and published), or if their publication dates were problematic. Problematic publication dates include being published before accepted, accepted before submitted, or accepted at the same time as submitted.
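The exclusion rules above can be sketched as a small filter. This is a minimal illustration in Python, not the code used for the study; the `Paper` record and the `is_included` function are hypothetical names, and dates are assumed to be available at day resolution.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Paper:
    # Hypothetical record layout; the field names are assumptions.
    journal: Optional[str]
    submitted: Optional[date]
    accepted: Optional[date]
    published: Optional[date]


def is_included(p: Paper) -> bool:
    """Return True if the paper passes the exclusion rules described in the text."""
    # Exclude papers without a journal name.
    if not p.journal:
        return False
    # Exclude papers missing any of the three publication dates.
    if p.submitted is None or p.accepted is None or p.published is None:
        return False
    # Problematic: published before accepted.
    if p.published < p.accepted:
        return False
    # Problematic: accepted before, or at the same time as, submitted.
    if p.accepted <= p.submitted:
        return False
    return True
```

Note that a paper published on the day of acceptance is retained under this reading, since only publication before acceptance is listed as problematic.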

The full publication cycle was split into the review process and the production process. The full publication cycle is the number of days between submission and publication, the review process is the number of days between submission and acceptance, and the production process is the number of days between acceptance and publication. The number of days for each element of the publication cycle was modeled with a Poisson regression model. A Poisson regression model is a generalized linear model for count variables that assumes equal mean and variance (i.e., dispersion $$=1$$). The data showed overdispersion (i.e., dispersion $$>1$$), so quasi-likelihood estimation was used to correct for the violated dispersion assumption.
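As an illustration of these definitions, the following Python sketch computes the three durations for one paper and a crude overdispersion check. The function names and the example values are made up for illustration. The dispersion shown is the Pearson statistic divided by the residual degrees of freedom for an intercept-only Poisson model, which reduces to the sample variance divided by the sample mean; the models in this paper include predictors, so their dispersion estimates would differ.

```python
from datetime import date
from statistics import mean, variance


def cycle_days(submitted: date, accepted: date, published: date) -> dict:
    """Split the publication cycle into review and production days."""
    review = (accepted - submitted).days
    production = (published - accepted).days
    return {"review": review, "production": production, "full": review + production}


def dispersion_intercept_only(counts) -> float:
    """Variance-to-mean ratio: about 1 under a Poisson model, above 1 if overdispersed."""
    return variance(counts) / mean(counts)


# Made-up dates, purely for illustration.
example = cycle_days(date(2014, 1, 1), date(2014, 4, 11), date(2014, 5, 21))
# example == {"review": 100, "production": 40, "full": 140}

# Made-up review times in days; a ratio well above 1 signals overdispersion.
ratio = dispersion_intercept_only([90, 100, 110, 140, 160])
```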

Model predictors were year of publication, presence of competing interests, number of pages, and number of authors. The reasoning behind these predictors was as follows. Competing interests could increase publication time when editors dispute them and subsequently ask the authors to explain. Number of pages could increase publication time because longer manuscripts take more time to review, yield longer reviews, and require more production effort. Number of authors could influence the time it takes for authors to reach consensus on the response letter and on other potential edits during the publication process. Squared predictors were included for number of pages and number of authors because scatterplots showed non-linear relations with review and production days. Additionally, the number of authors and the number of pages were mean centred to provide meaningful intercept estimates.
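Written out, the model described above might take the following form. This is a sketch under assumptions not stated in the text: a log link (the canonical choice for Poisson regression), year entering as a year-specific term $$\gamma_t$$, squared terms computed on the centred predictors, and hypothetical symbols $$c_i$$ (indicator for competing interests), $$p_i$$ (number of pages), and $$a_i$$ (number of authors) for paper $$i$$ published in year $$t$$.

$$\log \mathrm{E}[Y_i] = \beta_0 + \gamma_t + \beta_1 c_i + \beta_2 (p_i - \bar{p}) + \beta_3 (p_i - \bar{p})^2 + \beta_4 (a_i - \bar{a}) + \beta_5 (a_i - \bar{a})^2, \qquad \mathrm{Var}(Y_i) = \phi \, \mathrm{E}[Y_i]$$

Here $$Y_i$$ is the number of review or production days and $$\phi$$ is the dispersion parameter, with $$\phi > 1$$ indicating overdispersion. Because pages and authors are mean centred, $$\exp(\beta_0 + \gamma_t)$$ is the expected number of days in year $$t$$ for a paper of average length with an average number of authors and no competing interests.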

Because the data cover the entire population of PLoS research papers, no inferential statistical tests were applied. Moreover, note that PLoS Clinical Trials started in 2006 and was merged into PLoS Medicine in 2007, which is why other years are not included in the estimates for this journal.

Descriptive results

The collected dataset includes information on 140,674 research papers. Across all journals, the median publication cycle is 152 days, most of which is taken up by the review process (median 111 days) rather than the production process (median 38 days). Table \ref{tab:tab1} specifies these numbers per journal and indicates that PLoS ONE has the fastest review process, whereas PLoS Medicine has the longest (median difference = 69 days). PLoS Clinical Trials had the longest production process, compared to PLoS ONE (median difference = 16 days). S1 Figure includes plots of observed median review and production times per journal.

\label{tab:tab1}

Journal                       # Articles   Publication time (days)   Review time (days)   Production time (days)
ONE                           122,398      147                       107                  36
Clinical Trials               44           180.5                     125                  52
Genetics                      4,741        182                       131                  50
Neglected Tropical Diseases   2,999        183                       133                  45
Pathogens                     3,992        183                       139.5                43
Biology                       2,015        190                       141                  46
Computational Biology