
Charles H. Ward edited To Archive or Not to Archive .tex about 10 years ago

Commit id: 2cdc417775e4d3ed9075d10a0e664a7b27c42b29

The MRS-TMS “Big Data” survey provided insight into the community’s perspective on the relative value of access to various types of materials data, shown in Figure \ref{fig:COMPLEX}. Notably, as the complexity of the data and metadata increases (generally toward the right-hand side of the chart), the community’s perceived need for access to the data decreases. This could be due to many factors, including the difficulty of assuring the quality of such data and a lack of familiarity with tools that can handle the data complexity. However, with complexity comes a richness of information that, if properly tapped, could be extraordinarily valuable. In astronomy, for example, the Sloan Digital Sky Survey created a very complex database of attributes of stars, galaxies, and quasars. The wealth of information and immense discovery potential led many in the research community to become expert users of SQL, and the survey has yielded nearly 6,000 peer-reviewed publications.\footnote{This is based on a query to the Astrophysics Data System, http://adsabs.harvard.edu/, for peer-reviewed papers mentioning either “SDSS” or “Sloan” in the title or abstract. A query executed on April 9, 2014 returned 5,825 papers.}

For those publications with wide technical scope, it will be difficult to provide a universal answer to “what data should be archived?” In these cases, the decision about what data to archive may best be left to the judgment of the authors, peer reviewers, and editors. A particularly useful metric might be the cost or effort required to produce the data. For example, the “exquisite” experimental data associated with a high energy diffraction microscopy experiment are unique, expensive, and rich datasets with great potential use to other researchers. Clearly, based on these factors, such a dataset should be archived.
On the other hand, the results from a model run on commercial software that takes five minutes of desktop computation time may not be worthy of archiving, as long as the input data, boundary conditions, and software version are well defined in the manuscript. Of course, one must account for the perishable nature of code, particularly old versions of commercial code. However, even the data from a simple tensile test may be worthy of archiving, because publications do not typically provide the entire curve; while the paper may report only yield strength, another researcher may be interested in work hardening behavior. Having the complete dataset in hand allows another researcher to explore alternative facets of the material’s behavior. Basic criteria for determining which data should be archived could include:
\begin{itemize}
\item Are the data central to the main scientific conclusions of the paper?
\item Are the data likely to be usable by other scientists working in the field?