Robert Hanisch edited Data Repositories.tex  about 10 years ago



\subsection{Data Repositories}

Aside from crystallographic data repositories, there are at this time perhaps no dedicated materials data repositories that meet the required characteristics defined above. The materials science and engineering community does have numerous publicly accessible data repositories; however, the majority of these are associated with specific projects or research groups, and their persistence is therefore dependent on individual funding decisions. These repositories are established primarily to house and share the research data generated within a specific project or program. They generally do not follow uniform standards for data and metadata, nor do they provide for data discoverability and citation. Very few have been established with the explicit objective of serving MSE as public repositories for accessible digital data. In short, publicly accessible, built-for-purpose repositories and the associated infrastructure for access, safe storage, and management still need to be developed and sustainably funded; this is the largest impediment to implementing viable data archiving policies. (See, for example, ``Sustaining Domain Repositories for Digital Data: A White Paper''\footnote{\url{http://datacommunity.icpsr.umich.edu/sites/default/files/WhitePaper_ICPSR_SDRDD_121113.pdf}}.)

Evolutionary biology, for example, allows a mix of repositories that meet established criteria. Such criteria may be as simple as requiring that cited data be permanently archived in data repositories that meet the following conditions:

\begin{enumerate}

\item Allow bi-directional linking between paper and dataset
\item Provide a persistent digital identifier
\end{enumerate}

One tempting option might be to take advantage of the online storage capability that several journals already offer for supplementary materials accompanying journal articles. However, as presently constructed, these facilities are not amenable to best practices for dataset storage: the stored files are generally not independently discoverable, searchable, or separately citable, nor are they aggregated in one location. In fact, some publishers are reducing or eliminating supplementary file storage because of the haphazard structure and rules associated with its use. Further, new government policies around the world promoting open access to research works have put the publishing industry in a state of flux with regard to its long-standing, subscription-based business model. Publishers have been extremely reluctant to take on a data archiving responsibility given the economic uncertainties in the publishing marketplace.\cite{discussion} There is also a risk that for-profit publishers might restrict access to digital data assets that are co-located with the journal.

As alluded to in the previous section, a fundamental consideration in repository design and/or selection is the degree to which the repository will present structured versus unstructured data. Structured technical databases tend to be more useful to a technical community due to their uniformity, as evidenced by their data reuse rate.\cite{acharya} Ideally, the vast majority of materials data would reside in structured repositories. A disciplined data structure provides enormous advantages to the researcher in terms of both data discoverability and confidence in the data's use. However, this structure must be enabled by the application of broader and deeper standards for data and metadata, standards that do not currently exist.