Charles H. Ward edited Data Repositories.tex  about 10 years ago

Commit id: 88c58b7b23420cfb6d679bc23d1bd163335126aa


As alluded to in the previous section, a fundamental consideration in repository design and/or selection is the degree to which the repository will present structured versus unstructured data. Structured technical databases tend to be more useful to a technical community due to their uniformity, as evidenced by their data reuse rate.\cite{acharya} Ideally, the vast majority of materials data would reside in structured repositories. A disciplined data structure provides enormous advantages to the researcher, both in data discoverability and in confidence in the data's use. However, this structure must be enabled by the application of broader and deeper standards for data and metadata, standards that do not currently exist.

In all likelihood, as in biology, MSE publications will depend on a collection of repositories that are tailored to specific materials data. For example, NIST is building and demonstrating a data file repository for CALPHAD and interatomic potentials.\cite{NISTMDR} Such repositories may be expandable and largely sufficient for thematic publications, such as those devoted to thermodynamics and diffusion. However, repositories such as this will fill only a relatively small niche in MSE.

Finally, a business model for sustainably archiving materials data is required. Other technical fields, such as earth sciences, can at least partially rely on government-provided repositories for large and complex datasets. Without these types of repositories to build on, MSE will need to establish viable repository solutions.
In response to funding agency requirements for data management plans, some universities, Johns Hopkins for example, are beginning to provide centrally hosted data repositories, but these are not yet common.\cite{jhudata} Private fee-for-service repositories, such as \textit{labarchives} and \textit{figshare}, are also evolving to meet growing demand for accessible data storage.\cite{labarchives,figshare} Additionally, ASM International is working to create a prototype materials data repository through its close association with Granta Design. Termed the Computational Materials Data Network (CMDN), this is an interesting option because it will provide a structured database specifically for materials data, but the business model for CMDN has not yet been solidified.\cite{cmdn} A key open question is how funding agencies will respond to the OSTP open research policy memo and how they will fund activities that make data open to the public.