Benefits of Archiving Materials Science and Engineering Data

There is a growing realization within the global scientific community that the data generated in the course of research is an often-overlooked asset with considerable residual value to other scientists and engineers, and that a significant portion of it is stored but never used again. Increasing access to materials science and engineering data in digital form offers several benefits:

Data Reuse

  • Increased scientific productivity and return on investment in research infrastructure

  • Secondary hypothesis testing

  • Reducing or eliminating the cost of paying to generate the same data multiple times

  • Comparisons with previous studies

  • Integration with previous and future work

  • Reproducing and checking analyses

  • Simplifying and enhancing subsequent systematic reviews and meta-analyses

  • Interdisciplinary research

  • Teaching

Incentives

  • Increasing academic credit (citations)

  • Access to one’s own data at a future date

  • Convenience and security of cloud storage

Other

  • Validated reference datasets for testing algorithms/computations

  • Meeting funding agency requirements to share data

  • Reducing the potential for duplication of effort

  • Reducing error and fraud

The MRS-TMS “Big Data” survey \cite{TMSMRS} asked participants to evaluate whether given attributes would act as impediments or motivators to sharing data (Figure \ref{fig:IMPEDE}). The bottom of the graph shows the largest impediments, which are driven primarily by legal considerations. The top of the graph shows that the strongest motivators are the increased attention and credit researchers may gain for their work.

The impact on research productivity of providing well-calibrated, well-documented archival data products is clearly demonstrated by NASA’s Hubble Space Telescope (HST). Initially, archival data was not used very extensively, partly because the early data suffered from spherical aberration, which reduced sensitivity by a factor of 10 relative to expectations. In the early 1990s there was also a stigma attached to using archival data for research: it was seen as somehow less good or pure than collecting one’s own data at a telescope. Times have changed, and more than half of all peer-reviewed publications based on HST data are now written by astronomers who were not affiliated with the teams that proposed the original observations (see Figure \ref{fig:HST}). There are several reasons for this large increase in archival data use. HST observing time is very difficult to obtain, with a typical oversubscription ratio of seven to one in the proposal process. All HST data is routinely pipeline-processed, yielding an archive of “science-ready” data products. All HST data becomes public after a nominal twelve-month proprietary period. Finally, HST data taken for one purpose can often be used for studies with a substantially different intent. While this level of reuse may not be achieved for every research experiment, the HST example clearly shows that a substantial improvement in research productivity can be achieved at a very modest incremental cost when proper care is taken in designing the data management system.