The reproducibility crisis

The rise of computational science has led to unprecedented opportunities in the weather and climate sciences. Ever more powerful computers enable experiments that would have been considered impossible only a decade ago, while new hardware technologies allow data collection in even the most inaccessible places. In order to analyze the vast quantities of data now available to them, modern practitioners (most of whom are not computational experts) use an increasingly diverse set of software tools and packages. Today’s weather or climate scientist is far more likely to be found debugging code written in Python, MATLAB, Interactive Data Language (IDL), NCAR Command Language (NCL) or R than to be poring over satellite images or releasing radiosondes.

This computational revolution is not unique to the weather and climate sciences and has led to something of a reproducibility crisis in published research \citep[e.g.][]{Peng2011}. Most papers do not make the data and code underpinning key findings available, nor do they adequately specify the software packages and libraries used to execute that code. This means it is impossible to replicate and verify most of the computational results presented in journal articles today. By extension (and perhaps even more importantly), it is also impossible for readers to interrogate the data processing methodology. If a reader cannot find out which Python library was used in re-gridding a particular dataset, how can they build upon that re-gridding method and/or apply it in their own context?
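
To make this point concrete, consider the minimal Python sketch below of such a re-gridding step. It is purely illustrative and is not drawn from any particular paper: the xarray library, the synthetic temperature field and the interpolation methods are all assumptions made for the sake of the example.

\begin{verbatim}
# Hypothetical re-gridding sketch (xarray chosen purely for illustration)
import numpy as np
import xarray as xr

# Synthetic 1-degree temperature field on a regular latitude/longitude grid
lat = np.arange(-90, 90.1, 1.0)
lon = np.arange(0, 360, 1.0)
temp = 15 + 10 * np.cos(np.deg2rad(lat))[:, np.newaxis] * np.ones(lon.size)
da = xr.DataArray(temp, coords={"lat": lat, "lon": lon},
                  dims=("lat", "lon"), name="temperature")

# Re-grid to a coarser 2.5-degree grid in two plausible ways
new_lat = np.arange(-90, 90.1, 2.5)
new_lon = np.arange(0, 360, 2.5)
linear = da.interp(lat=new_lat, lon=new_lon, method="linear")
nearest = da.interp(lat=new_lat, lon=new_lon, method="nearest")

# The two choices yield different fields, so the method must be reported
print(float(abs(linear - nearest).max()))
\end{verbatim}

Even in this trivial case the two interpolation choices produce different results, so unless the script (or at the very least the library, its version and the interpolation method) is reported, a reader has no way of reproducing or building on the re-gridded data.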

A movement within the computational science community has arisen in response to this crisis, calling for existing communication standards to be adapted to include the data and code associated with published findings \citep[e.g.][]{Stodden2014}. The movement has also been active in producing best practice recommendations to guide scientists and stakeholders \citep[e.g.][]{Prlic2012,Stodden2012a,Sandve2013,Stodden2014}, and similar calls and guidelines have appeared in numerous editorials and commentaries in recent years \citep[e.g.][]{Barnes2010,Merali2010,Ince2012}. In response to this sustained campaign, there has been a modest but perceptible reaction from funding agencies and academic journals. Agencies like the National Science Foundation now require dataset disclosure and encourage software availability; however, this is not consistently enforced and compliance is largely left to the authors themselves \citep{Stodden2013}. A recent review of journal policies found a trend toward data and code availability, but overall the vast majority of journals have no data or code policy \citep{Stodden2013}.

As in many other computational disciplines, progress on code availability in the weather and climate sciences is lagging behind progress on data availability. The societies behind most of the major journals (the American Meteorological Society, Royal Meteorological Society, American Geophysical Union and European Geosciences Union) all have official data policies \citep[e.g.][]{Mayernik2015}; however, only two of the four indicate that code is included under their broad definition of data or metadata. Where code is included, statements regarding code availability consist of brief, vague suggestions that are not enforced by editors and reviewers. New journals such as Geoscientific Model Development have arisen for documenting work where code/software is the primary output (e.g. the development of a new climate model), but little progress has been made in documenting the computational aspects of research where code is ancillary to the main focus (i.e. where the code is not of sufficient consequence to require a standalone paper devoted to its description). Given that much of the research conducted by weather and climate scientists is based on previously documented datasets and/or models (e.g. a paper might analyze a reanalysis dataset or the output from running a well-known atmospheric model forced with anomalous sea surface temperatures), ancillary code availability (as opposed to data availability or primary code availability) is the component of the reproducibility crisis common to essentially all research today.

While it is tempting to simply decry the slow response of journals and funding agencies in the face of this crisis, the reality is that examples of reproducible weather and climate research upon which to base new communication standards have only just begun to emerge. For instance, the Max Planck Institute for Meteorology (MPI-M) recently enacted a policy \citep{Stevens2015a} that requires all primary data (including ancillary code) to be archived, and papers adhering to that policy are now starting to be published \citep[e.g.][]{Stevens2015}. There are also a limited number of examples from other research disciplines where highly motivated computational scientists have taken a variety of approaches to publishing reproducible results \citep[e.g.][]{Ketcheson2012,Crooks2014,Bremges2015,Schmitt2015}.

In order to stimulate further discussion and progress in this area, this essay describes a procedure for reporting computational results that was employed in a recent Journal of Climate paper \citep[][hereafter referred to as IS2015]{IrvingSimmonds2015}. Building on the aforementioned examples, the procedure was developed to be consistent with recommended computational best practices and to minimize the time burden on authors. For the reasons articulated above, and because data availability has already been addressed in a recent BAMS essay \citep{Mayernik2015a}, the procedure focuses on ancillary code availability. It should provide a starting point for weather and climate scientists looking to publish reproducible research, and it is proposed that the procedure could be adopted as a minimum standard by relevant academic journals and institutions.