James A Warren edited To Archive or Not to Archive .tex  about 10 years ago

Commit id: c7748ed2f291515cdc983c9cf97e7d4d5e99312c


Data can come at a variety of ‘processed’ levels, including ‘raw’, ‘cleaned’, and ‘analyzed’. Such characterizations are, of course, subjective. Nonetheless, given the diversity of materials data, care will be needed in determining the appropriate amount of processing to be performed on a dataset before it is archived. At this stage of our digital maturity, it is probably much more important that the metadata accompanying the dataset provide sufficient pedigree and provenance to make the data useful to others by defining the post-test processing performed.

Another factor to consider in setting guidelines for which data need to be archived is the expected annual and continuing storage capacity required. A very informal survey of 15 authors of peer-reviewed journal articles at NIST and AFRL found that most articles in the survey had less than 2 GB of supporting data per paper. However, papers reporting on emerging characterization techniques such as 3-D serial sectioning and high energy diffraction microscopy depended on considerably larger datasets, approximately 500 GB per paper. The time and resources required to upload (by authors) and download (by users) data files smaller than 2 GB are reasonable. Other disciplines have established data repositories to support their technical journals. Experience to date indicates that datasets of up to approximately 10 GB can be efficiently and cost-effectively curated.\cite{acharya} Repositories such as www.datadryad.org show that datasets of this magnitude can be stored indefinitely at a cost of \$80 or less.\cite{datadryad} However, datasets approaching 500 GB will very likely require a different strategy for storage and access. Thus a data repository strategy needs to consider this bimodal distribution of dataset sizes. An additional factor when considering storage requirements is the high global rate of growth in materials science and engineering publications.
Figure 3 shows the dramatic growth in the number of MSE journal articles published over the past two decades.
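
As a rough illustration of the storage consideration discussed above, the annual archive volume for a bimodal population of datasets could be estimated as follows. The symbols $N$, $f$, $s_{\mathrm{small}}$, and $s_{\mathrm{large}}$ are introduced here for illustration only and are not quantities reported by the survey:
\begin{equation}
S_{\mathrm{annual}} \approx N \left[ (1-f)\, s_{\mathrm{small}} + f\, s_{\mathrm{large}} \right],
\end{equation}
where $N$ is the number of articles archived per year, $f$ is the fraction of articles relying on large datasets, and $s_{\mathrm{small}}$ and $s_{\mathrm{large}}$ are representative dataset sizes for the two modes. Taking the survey's upper bound of $s_{\mathrm{small}} = 2$ GB and its approximate value of $s_{\mathrm{large}} = 500$ GB, and assuming purely for the sake of example that $f = 0.1$, the average is $0.9 \times 2 + 0.1 \times 500 = 51.8$ GB per article. Under these assumed values, even a small fraction of large datasets would dominate the total storage requirement.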