Publication cycle: A study of the Public Library of Science (PLOS)

Abstract

Publications are the driving force in present-day academia. However, publishing is a tedious process that can take a considerable amount of time. Previous research has barely investigated whether parts of the publication cycle (i.e., the review and production process) can be predicted from metadata available for all research papers. This study investigated the predictive value of metadata with three predictors: (i) the number of authors, (ii) the length of the manuscript, and (iii) the presence of competing interests. Additionally, the models inspect changes in the publication cycle throughout the years. Model results indicate that review and production times cannot be predicted from the included metadata of research papers. Results also indicate that review times have doubled over the last decade for PLoS journals and are currently estimated at 150-250 days on average. Production times, however, have remained highly stable over the last decade, at an estimated mean of around 50 days. In sum, these analyses indicate that review and production times cannot be predicted by metadata, given a certain year-specific mean.

Keywords: publishing, peer review, PLoS, metadata.

Science communication is primarily based on publishing research results in research papers. Anecdotally, authors feel that the publication cycle takes too long (Himmelstein 2015). A better understanding of the publication lag could provide solace when publication feels substantially delayed, and the main question is whether the time taken from submission to publication can be predicted. This paper tries to model publication times for the Public Library of Science (PLoS) journals with metadata available for research papers. The PLoS journals include PLoS Medicine, PLoS Biology, PLoS ONE, PLoS Pathogens, PLoS Genetics, PLoS Computational Biology, PLoS Neglected Tropical Diseases, and PLoS Clinical Trials (which was later merged into PLoS Medicine).

Previous research indicated that statistically nonsignificant results take longer to be published (Ioannidis 1998), that review times have decreased (Lyman 2013), and that the number of figures or tables does not predict publication time (Lee 2013). Other research into the academic publication cycle has focused on rejection rates of submitted manuscripts or the types of decisions made after the peer-review process (Rosenkrantz 2015). These studies primarily relied on sampling research papers from journals, but with the rise of APIs and scrapers to mine the literature (Smith-Unna 2014), such sampling is becoming redundant. In this paper, I analyze the entire population of PLoS research articles, splitting the publication cycle into review time (i.e., time from submission to acceptance) and production time (i.e., time from acceptance to publication), in order to investigate whether publication time can be predicted with paper metadata.

Method

Article-level data were collected for all PLoS journal research papers with v0.5 of the rplos package (Chamberlain 2015) in R v3.2.0 (R Core Team 2015). The dataset was collected on July 4, 2015 and is available online. Research papers were excluded if they lacked a journal name or any of the publication dates (i.e., submitted, accepted, and published), or if their publication dates were problematic. Problematic publication dates include being published before being accepted, being accepted before being submitted, or being accepted at the same time as submitted.
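Data collection along these lines can be sketched with rplos. This is a minimal illustration rather than the original collection script; the Solr field names and argument names are assumptions based on the rplos/PLOS search API documentation and may differ in v0.5.

\begin{verbatim}
# Minimal sketch (not the original collection script); field and argument
# names are assumptions based on the rplos/PLOS search API documentation.
library(rplos)

# Retrieve article metadata; `limit` would need to be raised (or the
# query paginated) to cover all ~140,000 research papers.
res <- searchplos(
  q     = "*:*",
  fq    = 'article_type:"Research Article"',
  fl    = "id,journal,received_date,accepted_date,publication_date",
  limit = 1000
)
dat <- res$data

# Parse the three publication dates.
dat$received  <- as.Date(dat$received_date)
dat$accepted  <- as.Date(dat$accepted_date)
dat$published <- as.Date(dat$publication_date)

# Exclude papers lacking a journal name or any of the three dates ...
dat <- dat[complete.cases(dat[, c("journal", "received", "accepted",
                                  "published")]), ]
# ... and papers with problematic date orders: published before accepted,
# accepted before submitted, or accepted at the same time as submitted.
dat <- dat[dat$accepted > dat$received & dat$published >= dat$accepted, ]
\end{verbatim}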

The full publication cycle was split into the review process and the production process. The full publication cycle is the number of days between submission and publication; the review process is the number of days between submission and acceptance; and the production process is the number of days between acceptance and publication. The number of days for each element of the publication cycle was modeled with a Poisson regression model, a generalized linear model for count variables that assumes equal mean and variance (i.e., dispersion \(=1\)). The data showed overdispersion (i.e., dispersion \(>1\)), so quasi-likelihood estimation was used to correct for the violated dispersion assumption.
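As a sketch of this step (continuing the hypothetical `dat` from above), the review time can be computed from the parsed dates and the dispersion assumption checked before switching to quasi-likelihood estimation:

\begin{verbatim}
# Review time in days (submission to acceptance); production time would
# be computed analogously as published - accepted.
dat$review_days <- as.numeric(dat$accepted - dat$received)

# Plain Poisson fit; the dispersion can be estimated as the Pearson
# chi-square statistic divided by the residual degrees of freedom.
fit_pois   <- glm(review_days ~ 1, family = poisson, data = dat)
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) /
  fit_pois$df.residual
dispersion  # values > 1 indicate overdispersion

# Quasi-Poisson: identical coefficients, but standard errors are scaled
# by the estimated dispersion.
fit_quasi <- glm(review_days ~ 1, family = quasipoisson, data = dat)
summary(fit_quasi)$dispersion
\end{verbatim}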

Model predictors were year of publication, presence of competing interests, number of pages, and number of authors. The reasoning behind these predictors was as follows. Competing interests could increase publication time when they are disputed by editors and the authors are subsequently asked to explain. The number of pages could increase publication time through reviews that take longer to complete and are lengthier, as well as through the increased production effort required. The number of authors could influence the time it takes authors to reach consensus on the response letter and other potential edits during the publication process. Squared predictors were included for the number of pages and the number of authors because scatterplots indicated non-linear relations with review and production days. Additionally, the number of authors and the number of pages were mean centred to provide meaningful intercept estimates.
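A sketch of the resulting model specification follows; the column names `n_authors`, `n_pages`, `competing`, and `year` are hypothetical stand-ins for the actual variables.

\begin{verbatim}
# Mean-centre the counts so the intercept refers to an average paper;
# the column names here are hypothetical.
dat$authors_c <- dat$n_authors - mean(dat$n_authors)
dat$pages_c   <- dat$n_pages   - mean(dat$n_pages)

# Quasi-Poisson model with squared terms for the centred predictors and
# year entered as dummy variables (first year = reference category).
fit_review <- glm(
  review_days ~ authors_c + I(authors_c^2) + pages_c + I(pages_c^2) +
    competing + factor(year),
  family = quasipoisson, data = dat
)
summary(fit_review)
\end{verbatim}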

Considering that the data constitute the entire population of PLoS research papers, statistical inference testing is not applied. Moreover, note that PLoS Clinical Trials started in 2006 and was merged into PLoS Medicine in 2007, which is why no other years are included in the estimates for this journal.

Descriptive results

The collected dataset includes information on 140,674 research papers. Across all journals, the median publication cycle is 152 days, the majority of which is taken up by the review process (median 111 days) rather than the production process (median 38 days). Table \ref{tab:tab1} specifies these numbers per journal and indicates that PLoS ONE has the shortest review process, whereas PLoS Medicine has the longest (median difference = 69 days). PLoS Clinical Trials had the longest production process and PLoS ONE the shortest (median difference = 16 days). S1 Figure includes plots of the observed median review and production times per journal.
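Medians of this kind can be tabulated per journal in one call; `publication_days` and `production_days` are assumed to be computed analogously to `review_days` above.

\begin{verbatim}
# Per-journal medians as reported in Table 1 (column names assumed).
aggregate(cbind(publication_days, review_days, production_days) ~ journal,
          data = dat, FUN = median)
\end{verbatim}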

\label{tab:tab1}

Table 1. Descriptive statistics per journal: number of articles and median publication, review, and production times (in days).

Journal | # Articles | Publication time | Review time | Production time
ONE | 122,398 | 147 | 107 | 36
Clinical Trials | 44 | 180.5 | 125 | 52
Genetics | 4,741 | 182 | 131 | 50
Neglected Tropical Diseases | 2,999 | 183 | 133 | 45
Pathogens | 3,992 | 183 | 139.5 | 43
Biology | 2,015 | 190 | 141 | 46
Computational Biology | 3,423 | 199 | 148 | 48
Medicine | 1,062 | 230.5 | 176 | 47
Overall | 140,674 | 152 | 111 | 38

These differences in review and production speed could be a consequence of differences in efficiency or in the strictness of publication criteria. PLoS ONE contains 122,398 papers and is considered a megajournal (i.e., not field-specific or selective in topic). The other journals, on the other hand, are more similar to traditional journals in their criteria for publication (e.g., originality of research). PLoS Medicine, for example, contains 'only' 1,062 papers, indicating a large disparity with PLoS ONE.

Correlations indicate that the total publication cycle is almost perfectly correlated with review time (\(\rho=.976\)). This indicates that approximately \(95\%\) of the variance in the publication cycle is explained by review time, and that the production process seems to be an additive random process that is not predicted by the time taken to get a paper accepted.
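The variance explained follows directly from squaring the correlation:

\[
\rho^2 = .976^2 \approx .953 \approx 95\%.
\]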

Aggregate model results

Poisson model estimates for all journals together indicate that both review and production time are predicted only by year. The coefficients in Table \ref{tab:tab2} indicate negligible predictive effects of the number of authors, the number of pages, and the presence of competing interests (i.e., \(|b| \leq .017\)). The year dummy coefficients indicate that review time has increased, whereas production time has fluctuated around 50 days. Beyond the effect of year, the results indicate that review time is a random process.

\label{tab:tab2}

Table 2. Poisson regression model estimates (quasi-likelihood) for review and production time, on the log scale. Year coefficients are dummies relative to the 2003 reference year.

Predictor | Estimate (review) | Estimate (production)
Intercept | 4.18370 | 4.24677
Authors (centred) | 0.00176 | 0.00582
Authors\(^2\) (centred) | -0.00001 | -0.00001
Pages (centred) | -0.00084 | 0.00012
Pages\(^2\) (centred) | -0.00010 | -0.00011
Competing interests | -0.01713 | 0.00551
2004 | 0.68758 | 0.10155
2005 | 0.74031 | -0.12891
2006 | 0.69579 | -0.10830
2007 | 0.55104 | -0.45996
2008 | 0.62225 | -0.56911
2009 | 0.59525 | -0.56514
2010 | 0.66045 | -0.66266
2011 | 0.65463 | -0.56665
2012 | 0.73687 | -0.47161
2013 | 0.73887 | -0.36991
2014 | 0.77532 | -0.53661
2015 | 0.84229 | -0.29643
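Because the model uses a log link, estimated mean times follow from exponentiating sums of coefficients in Table \ref{tab:tab2}. For example, taking 2003 as the reference year captured by the intercept, the estimated mean review time in 2015 is

\[
\exp(4.18370 + 0.84229) \approx 152 \text{ days},
\]

compared to \(\exp(4.18370) \approx 66\) days in 2003.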

The estimated mean review and production times are depicted in Figure 1. For review time, the estimates increase in a non-linear fashion, with a short decreasing trend between 2006 and 2008. The estimated mean review time has climbed to approximately 150 days since 2003. The estimated mean production time fluctuates around 50 days. The journal-specific model results are described next.

Fig. 1. Mean estimated review (top) and production (bottom) time in days across all PLoS journals, including loess curves.

Journal model results

When the results are specified per journal, the model estimates are similar to the aggregate results described previously. Most journal-specific models included no meaningful effect of the number of authors, the number of pages, or the presence of competing interests on either review or production time. Only for PLoS Clinical Trials and PLoS Biology did the presence of competing interests have a noteworthy effect on review and production time (\(b=.112\) and \(b=.106\), respectively). This indicates that competing interests increase review and production time by a factor of approximately 1.1 for these two journals. All individual coefficients per journal for both review and production time are available in S2 File. Figure 2 plots the mean estimated review and production times for each journal.
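Given the log link, the factor of approximately 1.1 follows from exponentiating these coefficients:

\[
\exp(.112) \approx 1.12 \quad \text{and} \quad \exp(.106) \approx 1.11.
\]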

Fig. 2. Mean estimated review (top) and production (bottom) time in days per PLoS journal, including loess curves (top) and regression lines (bottom).

Substantial variability is observed in the estimated mean review times across journals, but all journals show an increase in the time taken to complete the review process. In accordance with the descriptive statistics given earlier, PLoS Medicine has the longest estimated mean review time, whereas PLoS ONE has the shortest. As of 2015, the review process takes between 150 and 250 days on average and is less variable across journals than in preceding years.

The estimated mean production times are highly consistent across journals and show less fluctuation than the aggregate results. The estimated mean production time is approximately 50 days across journals and across years.

Discussion

The results of this population-level investigation of the PLoS publication cycle indicate that review times have doubled to 150-250 days over the last decade, that production times have remained relatively stable at around 50 days, and that the publication cycle is not substantially predicted by article metadata. The lack of predictive value of the length of a manuscript, the number of authors, or the presence of competing interests indicates that the publication cycle might be more a random than a structured process.

It is noteworthy that, despite the development of new editorial systems, production times for research papers have remained stable over the last decade. Only recently, as of January 1, 2015, has PLoS introduced a new set of manuscript guidelines to improve automation of the production process. Note that the results in this paper show no systematic effect of this, or any previous, adjustment to the production process. The current system might produce such an effect in the (near) future, but it has not yet done so.

The increase in review time is substantial and raises the question of why it has doubled. The increase could be due to any number of factors, ranging from increased difficulty in finding reviewers to authors taking longer to respond to reviewer comments. That review times are not predicted by the included metadata, however, eliminates these paper properties as explanatory factors for the increase. If, for example, the length of manuscripts had increased throughout the decade and this explained the increased review times, the effect of year would disappear after controlling for manuscript length. This clearly was not the case.

In sum, authors are left guessing how long it will take for their paper to be published, and this paper indicates that the duration of the publication cycle might be random in some sense. More specifically, publication time seems to be subject only to trends throughout the years and not to paper-specific characteristics. The trend in the number of review days is particularly strong, and the doubling of review times is concerning.

References

  1. Himmelstein, D. Publication delays at PLOS and 3,475 other journals. Satoshi Village (2015).

  2. Ioannidis, J. A. Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA 279, 281-286 (1998).

  3. Lyman, R. L. A Three-Decade History of the Duration of Peer Review. Journal of Scholarly Publishing 44, 211-220 (2013).

  4. Lee, D. T., LaCombe, J., Chung, C. K., Kattan, A., Lee, G. K. Lag-Time to Publication in Plastic Surgery: Potential Impact on the Timely Practice of Evidence-Based Medicine. Annals of Plastic Surgery 71, 410-414 (2013).

  5. Rosenkrantz, A. B., Harisinghani, M. Metrics for Original Research Articles in the AJR: From First Submission to Final Publication. AJR American Journal of Roentgenology 204, 1152-1156 (2015).

  6. Smith-Unna, R., Murray-Rust, P. The ContentMine Scraping Stack: Literature-scale Content Mining with Community-maintained Collections of Declarative Scrapers. D-Lib Magazine 20 (2014).

  7. Chamberlain, S., Boettiger, C., Ram, K. rplos: Interface to the Search API for the 'PLoS' Journals. R package version 0.5 (2015).

  8. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2015).
