Charles H. Ward edited To Archive or Not to Archive .tex  about 10 years ago

Commit id: d5df3a330a8b2c72a26a869f544d3b6ba5bbc8ac


The most critical question to be answered in setting policies for publications is “what data should be archived?” The answer is essential both for giving authors, editors, and reviewers clear expectations and for determining the size of the data repositories needed. Other disciplines have already embarked on this journey and have devised a variety of approaches suited to their communities’ data needs and stage of “digital maturity.”

Two ends of the spectrum in addressing this question are presented here. The first assumes that all data supporting a publication are worthy of archiving. This criterion is found most often in peer-reviewed journals that have a narrow technical scope and generally deal with very limited data types. For example, journals in crystallography and fluid thermodynamics have very stringent data archiving policies that prescribe formats and specific repositories for the data submitted.\cite{actacryst,Koga_2013} At the other end of the spectrum, journals that cover a broader technical scope, and therefore deal with more heterogeneous data, have implemented more subjective criteria for data archiving and a distributed repository philosophy. Earth sciences and evolutionary biology have typically taken this approach. The approaches adopted by MSE publications will likely span a similar spectrum, depending on the scope of each publication.

The MRS-TMS “Big Data” survey provided insight into the community’s perspective on the relative value of access to various types of materials data (Figure~\ref{fig:FIGURE_3}). Notably, as the complexity of the data and metadata increases (generally toward the right-hand side of the chart), the community’s perceived need for access to these data decreases. This could be due to many factors, including the difficulty of assuring the quality of such data and a lack of familiarity with the tools needed to handle the data complexity.
However, with complexity comes a richness of information that, if properly tapped, could be extraordinarily valuable. In astronomy, for example, the Sloan Digital Sky Survey created a very complex database of attributes of stars, galaxies, and quasars. The wealth of information and immense discovery potential led many in the research community to become expert users of SQL, and the survey has yielded nearly 6,000 peer-reviewed publications.\footnote{This is based on a query to the Astrophysics Data System, http://adsabs.harvard.edu/, for peer-reviewed papers mentioning either “SDSS” or “Sloan” in the title or abstract. A query executed on April 9, 2014 returned 5,825 papers.}

For publications with wide technical scope, it will be difficult to provide a universal answer to “what data should be archived?” In these cases, the decision on what data to archive may best be left to the judgment of the authors, peer reviewers, and editors. A particularly useful metric might be the cost or effort required to produce the data. For example, the “exquisite” experimental data from a high energy diffraction microscopy experiment form unique, expensive, and rich datasets with great potential use to other researchers. Based on these factors, such a dataset should clearly be archived. On the other hand, the results of a model run on commercial software that takes five minutes of desktop computation time would likely not be worth archiving, as long as the input data, boundary conditions, and software version are well defined in the manuscript. However, even the data from a simple tensile test may be worth archiving, because publications do not typically provide the entire curve: while the paper may report only yield strength, another researcher may be interested in work hardening behavior. Having the complete dataset in hand allows that researcher to explore other facets of the material’s behavior.
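
As one illustration of what a complete tensile curve makes possible, a reader holding the archived stress-strain data could estimate work hardening behavior, which a reported yield strength alone cannot provide. The sketch below assumes, purely for illustration, the Hollomon power-law description of uniform plastic flow; it is only one of several hardening models, and the symbols $K$ (strength coefficient), $n$ (strain hardening exponent), and $E$ (elastic modulus) are introduced here and do not appear elsewhere in the text. The engineering stress $\sigma_e$ and engineering strain $\varepsilon_e$ from the archived curve are first converted to true values, a conversion that holds only up to the onset of necking:
\[
\sigma_t = \sigma_e \,(1 + \varepsilon_e), \qquad \varepsilon_t = \ln(1 + \varepsilon_e).
\]
With the true plastic strain taken as $\varepsilon_p = \varepsilon_t - \sigma_t / E$, the Hollomon assumption gives
\[
\sigma_t = K\,\varepsilon_p^{\,n} \quad\Longrightarrow\quad \ln \sigma_t = \ln K + n \ln \varepsilon_p ,
\]
so that $n$ is the slope of a straight-line fit of $\ln \sigma_t$ against $\ln \varepsilon_p$ over the uniform plastic range. Such a fit needs many points along the curve, which is why the full dataset, rather than a single reported property, would have to be archived for this kind of reuse.
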
The basic criteria for determining which data must be archived could include:  \begin{itemize}