Network models to evaluate reproducibility in biomedical research
_or, The Future of Science_

The traditional way to publish scientific work is to write a narrative describing the experiments performed and the conclusions drawn from them. Today, pressure for funding and journal impact factor generates a vicious circle that promotes, at the very least, an increase in the minimum number of relevant findings required for publication and the over-stretching of claims. As a consequence, the problem of reproducibility in science has drawn the attention of the media, including The New York Times and The Wall Street Journal.

This situation has already created several disadvantages for authors, funding bodies, and research institutes:

  1. Only a small fraction of the experimental workflow is actually published, leading to data loss and an underestimation of the labor performed by research fellows;

  2. Pressure to publish novel data decreases reproducibility and increases the amount of unconfirmed work;

  3. New technologies, usually developed at the start of a research project, are only published years later, slowing down scientific progress;

  4. Disconnect between funding and publication: research is often funded after most of the work has already been done using grant money from past proposals;

  5. Research papers become increasingly complex and specialized, forcing the peer-review process into a binary accept-or-reject decision;

  6. Lack of metrics assessing the quality and reproducibility of research.

Here, we envision a solution to break the loop. Consider a platform for storing, peer reviewing and sharing all the scientific work produced within a research institute. We are thinking of a network of science in which the nodes are single experiments, not entire papers. Each experiment should be meaningful, i.e. it must provide a conclusion statement and a short description of the materials and methods (much as it would be presented at a lab meeting). The envisioned platform provides a key feature: the directional linking of published experiments. This forking feature is critical because it builds the network semantics by establishing where the rationale of an experiment comes from (and where it eventually leads). Network semantics matter because they yield a better proxy index for reproducibility, and thus quality, than the current impact factor. Such a reproducibility index can be calculated from node properties according to network theory or ad hoc metrics such as the Fork Factor. Let's call the envisioned platform a Semantic Experiment Network (SEN).
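To make the node-and-link structure concrete, the sketch below models a tiny SEN as a directed graph in Python using the networkx library. Since the Fork Factor is not formally defined here, the code assumes one minimal illustrative definition: the number of experiments that descend, directly or indirectly, from a given node. The graph, the experiment names, and the `fork_factor` function are hypothetical examples for illustration, not part of any existing platform.

```python
import networkx as nx

# Toy Semantic Experiment Network: each node is a single experiment and
# each directed edge points from an experiment to a later experiment
# that forks from it, i.e. builds on its rationale.
sen = nx.DiGraph()
sen.add_edges_from([
    ("exp-A", "exp-B"),  # exp-B forks from exp-A
    ("exp-A", "exp-C"),
    ("exp-B", "exp-D"),
    ("exp-C", "exp-D"),  # exp-D merges the rationale of two branches
])

def fork_factor(graph: nx.DiGraph, node: str) -> int:
    """Hypothetical Fork Factor: the number of experiments that build,
    directly or indirectly, on this node. An experiment whose rationale
    is reused many times is, by this proxy, more reproducible."""
    return len(nx.descendants(graph, node))

for experiment in nx.topological_sort(sen):
    print(experiment, fork_factor(sen, experiment))
# exp-A scores 3 (every other experiment descends from it); exp-D scores 0.
```

Any standard centrality measure from network theory (for example, PageRank via `nx.pagerank`) could equally be explored as a reproducibility proxy; the descendant count above is simply the most direct reading of "forks" as directed links.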

As a complement to the current publishing model, we propose that research centers adopt a local SEN (L-SEN), in which data is accessible only within each institute. Although the pre-print policies of most academic journals allow posting on pre-print servers prior to publication, the intra-institutional boundary may reassure the most skeptical laboratory heads during the transition. A laboratory using an L-SEN to store and share data and code can still decide to assemble its data into a traditional paper. As such, the envisioned transition from traditional to digital publishing of scientific work will be controlled and gradual. Recent digital publishing projects like Authorea, FigShare, DataVerse and Synapse provide an excellent starting point, as they already implement most of the needed features.

As recently shown, funding is the main driving force for change in the scientific discovery workflow. Laboratories in research institutes that adopt an L-SEN can provide funding agencies with new tools to assess scientific performance and can thus attract more resources. It will not take long for funding agencies to realize the advantages of metrics based on the quality of science (see below). At that point, publishing in a SEN will be sustainable for authors, and the platform can go global (G-SEN). This will break the original vicious circle by explicitly adding reproducibility to the equation.