Making data a first-class citizen in research

The Authorea Team

Scientists generate data daily. Although much of it comes from attempts that fail to reach statistical significance, this information is still highly valuable. So-called "negative data" can spur new ideas and research, and it can save other scientists time and money, two dwindling resources in the lives of researchers. Despite the value of negative data, many scientists do not publish such findings (Fanelli 2010).
The reluctance to publish "negative" findings has less to do with doubts about the data's importance and more to do with traditional barriers put up by publishers: cost, time, and editorial bias. Arguably, cost and time are necessary to maintain the quality of the scientific record; however, in some fields preprints have been shown for decades to be a viable way of communicating research quickly and cost-effectively.
It's safe to assume that most of the data generated by researchers is never published. It's virtually impossible to track how much data is generated, since most of it is never saved or archived, so the problem may be even larger than we know. Compounding this is a growing trend among scientists to keep their datasets hidden, a trend that many open access platforms, including Authorea, aim to combat.
Herein lies the problem with scientific publishing today. Encouraging scientists to archive their data is one step toward making open access science a reality, but archiving requires resources. A few such resources exist today: services like Dryad, founded in 2010, or Dat allow scientists to archive their data online, and publishing platforms like Authorea allow researchers to routinely update their findings in a safe and open repository alongside the publication. We think putting the data with the article makes the most sense. Consolidating research output in one place can help prevent data from being lost or becoming inaccessible, while also minimizing the "distance" between the data and the methods used to collect it. In the era of big data, keeping all datasets (including negative data) in one place seems the natural direction, and a way to foster discovery and maximize scientific output.
Links to external data sources break over time: for publications older than 8 years, more than half of the external links to data repositories are broken (Pepe 2014).