Leslie Hsu edited background.md  over 9 years ago

Commit id: 9d3353ef3de955370f7e402b33f85e3251ef3b07


A sample-based data system stores observations that come from discrete samples, such as rocks, sediment, fluid, or other materials. Analytical measurements of the samples, descriptions of sampling location and techniques, analytical procedures of data collection, and pre-analysis sample preparation are stored in an integrated manner. Here, integration means alignment and standardization of vocabularies, sample names, and output. These multi-layered and interrelated pieces of information create additional challenges compared to grid-based sensor data (e.g. satellite, seismic, elevation), which may have better standardization and data formats. Sample-based databases often grow from the interests and efforts of a single investigator, slowly gaining traction, data, and users, until they are developed into an accessible online system. One of the first online sample-based geochemical databases was [PetDB](http://www.earthchem.org/petdb), the Petrological Database, formerly the Petrological Database of the Ocean Floor. The database was built on a sample-based data model \citep{Lehnert_2000}, which served as a foundational structure for several disciplinary databases developed in the following decade, including [SedDB](http://www.earthchem.org/seddb) (Lehnert et al., 2005), [GEOROC](http://georoc.mpch-mainz.gwdg.de/georoc/) (Sarbas and Nohl, 2009), NAVDAT (Walker et al., 2004), and VentDB \citep{34e0d125-4bec-4225-8afa-59a6c7565821, Mottl2012}. These databases combine data from numerous sources into a single relational synthesis database, allowing the rapid production of integrated datasets and significantly reducing the time previously needed to compile the same data manually from the original sources.
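
To make the idea of a sample-based relational model concrete, the following is a minimal sketch in Python using the standard `sqlite3` module. It is not the schema of PetDB or of any other database named above: the table names, column names, sample names, and values are all hypothetical and chosen only to show how sample descriptions, analytical methods, and measurements from different sources might be linked and then retrieved as one integrated dataset.

```python
import sqlite3

# Hypothetical, simplified schema. Table and column names are illustrative
# assumptions and are NOT taken from PetDB or any other real database.
SCHEMA = """
CREATE TABLE sample (
    sample_id     INTEGER PRIMARY KEY,
    name          TEXT NOT NULL UNIQUE,  -- standardized sample name
    material_type TEXT NOT NULL,         -- e.g. rock, sediment, fluid
    latitude      REAL,                  -- sampling location
    longitude     REAL,
    technique     TEXT                   -- sampling technique
);
CREATE TABLE method (
    method_id            INTEGER PRIMARY KEY,
    analytical_procedure TEXT NOT NULL,  -- how the data were collected
    preparation          TEXT            -- pre-analysis sample preparation
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    sample_id      INTEGER NOT NULL REFERENCES sample(sample_id),
    method_id      INTEGER NOT NULL REFERENCES method(method_id),
    variable       TEXT NOT NULL,        -- controlled vocabulary term
    value          REAL NOT NULL,
    unit           TEXT NOT NULL,
    source         TEXT NOT NULL         -- original publication or dataset
);
"""


def build_demo_db():
    """Create an in-memory database filled with invented demo records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "DEMO-001", "rock", 10.0, -100.0, "dredge"),
            (2, "DEMO-002", "rock", 11.0, -101.0, "submersible"),
        ],
    )
    conn.executemany(
        "INSERT INTO method VALUES (?, ?, ?)",
        [
            (1, "electron microprobe", "polished thin section"),
            (2, "X-ray fluorescence", "fused glass bead"),
        ],
    )
    # Values below are made up for illustration only.
    conn.executemany(
        "INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 1, "SiO2", 50.1, "wt%", "hypothetical source A"),
            (2, 1, 1, "MgO", 7.9, "wt%", "hypothetical source A"),
            (3, 2, 2, "SiO2", 49.8, "wt%", "hypothetical source B"),
        ],
    )
    conn.commit()
    return conn


def integrated_dataset(conn, variable):
    """Return one variable across all samples, with method and source attached."""
    query = """
        SELECT s.name, s.material_type, m.value, m.unit,
               d.analytical_procedure, m.source
        FROM measurement AS m
        JOIN sample AS s ON s.sample_id = m.sample_id
        JOIN method AS d ON d.method_id = m.method_id
        WHERE m.variable = ?
        ORDER BY s.name, m.source
    """
    return conn.execute(query, (variable,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for row in integrated_dataset(demo, "SiO2"):
        print(row)
```

Because every measurement row carries its sample, method, and source, a single query returns values compiled from different sources in a consistent form, which is the kind of integrated output described above. A production system would need far richer vocabularies and metadata than this sketch includes.
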
The state of the art of geochemical data publication was laid out a decade ago by \citet{Staudigel_2003} with the goal of initiating discussion of data formats and metadata in geochemistry at the “earliest stages of [geochemistry’s] exploitation of Information Technology”. \citet{Staudigel_2003} highlight complexities within the organizational structure relating to standardization, conventions, lack of tabular data, and incomplete metadata. These issues have not disappeared, but their management and mitigation have improved significantly. In the last decade, improvements such as governmental data policy statements (e.g. the U.S. Office of Management and Budget memorandum [Open Data Policy—Managing Information as an Asset (M-13-13)](http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf)), endorsement of best practices, and stricter rules regarding data reporting were implemented by editors, reviewers, professional societies, and funding agencies (e.g. CODATA Scientific Data Policy Statements). Editors from several peer-reviewed journals that publish manuscripts including geochemical data agreed on minimum standards for documentation of data quality, sample information, and the format and accessibility of the data. These standards were published as the Editors Roundtable document “Requirements for the Publication of Geochemical Data” \citep{c863b8bd-5a6f-4010-8c42-70a9d18cc4c8}. Some journals have implemented the recommendations, but strict enforcement is not yet common.