There is a growing realization within the global scientific community that the data generated in the course of research is an oft-overlooked asset with considerable residual value to other scientists and engineers, and that often a significant portion of the data is stored but not used. The following are several benefits of increasing access to materials science and engineering data in digital form:
\begin{itemize}
\item Scientific productivity and return on investment in research infrastructure
\item Secondary hypothesis testing
\item Reduction/elimination of paying for data generation multiple times
\item Comparisons with previous studies
\item Integration with previous and future work
\item Reproducing and checking analyses
\item Simplifying and enhancing subsequent systematic reviews and meta-analyses
\item Interdisciplinary research
\item Teaching
\item Increasing academic credit (citations)
\item Access to one’s own data at a future date
\item Convenience and security of cloud storage
\item Validated reference datasets for testing algorithms/computations
\item Meeting funding agency requirements to share data
\item Reducing the potential for duplication of effort
\item Reduction of error and fraud
\end{itemize}
The MRS-TMS “Big Data” survey asked participants to evaluate whether given attributes would act as impediments or motivators to sharing data (Figure \ref{fig:IMPEDE}) \cite{TMSMRS}. The bottom of the graph shows the largest impediments, which are primarily driven by legal considerations. The top of the graph shows that the strongest motivators are the increased attention and credit that researchers may draw for their work.
The impact on research productivity of providing well-calibrated, well-documented archival data products is clearly demonstrated in the case of NASA’s Hubble Space Telescope (HST). Initially, archival data was not used very extensively. The early data suffered from spherical aberration, which reduced sensitivity by a factor of 10 relative to expectations. In the early 1990s there was also something of a stigma attached to using archival data for research: it was seen as somehow not as good or as pure as collecting one’s own data at a telescope. Times have changed, and HST archival data is now used in more than half of all peer-reviewed publications based on HST data, by astronomers not affiliated with the teams who proposed the original observations (see Figure \ref{fig:HST}). There are several reasons for the large increase in archival data use. HST observing time is very difficult to get, with typically a seven-to-one oversubscription ratio in the proposal process. All HST data is routinely pipeline processed, yielding an archive of “science ready” data products. All HST data becomes public after a nominal twelve-month proprietary period. And HST data taken for one purpose can often be used for studies with a substantially different intent. While this high level of re-use may not be achieved for all research experiments, the HST example clearly shows that a substantial improvement in research productivity can be achieved, at a very modest incremental cost, when proper care is taken in designing the data management system.