IEDA EarthChem: Supporting the sample-based geochemistry community with data resources to accelerate scientific discovery
Kerstin A. Lehnert(1), Leslie Hsu(*1), Tiffany A. Rivera(2), J. Douglas Walker(3)
* Corresponding author: firstname.lastname@example.org (1) Lamont-Doherty Earth Observatory, Columbia University, Palisades, NY 10964, USA (2) Westminster College, Salt Lake City, UT 84105, USA (3) University of Kansas, Lawrence, KS 66045, USA
Three main points
Integrated sample-based geochemical measurements enable new scientific discoveries in the Earth sciences. However, integrating geochemical data is difficult because of the variety of sample types and measured properties, idiosyncratic analytical procedures, and the time required for adequate documentation. To support geochemists in integrating and reusing geochemical data, EarthChem, part of IEDA (Integrated Earth Data Applications), develops and maintains a suite of data systems that serve the scientific community. The EarthChem Library focuses on dataset publication, accessibility, and linking with other sources. Topical synthesis databases (e.g., PetDB, SedDB, Geochron) integrate data from several sources and preserve the metadata associated with analyzed samples. The EarthChem Portal optimizes data discovery and provides analysis tools. Contributing authors obtain citable DOIs, usage reports for their data, and increased discoverability. The community benefits from open access to data, which accelerates scientific discovery. The growing number of citations of EarthChem systems demonstrates their success.
Geochemical compilations that bring together very large numbers of measurements, dense enough to support statistically significant conclusions, have driven global-scale scientific discoveries. Examples include studies of the diversity of MORB composition (e.g., Gale et al., 2013), the global distribution of elements in the Earth’s derma layers (Rauch, 2011), and global patterns of intraplate volcanism (Conrad et al., 2011). New analytical methodologies allow data to be collected at increasing rates, which should translate into more ground-breaking scientific discoveries. With this anticipated increase, it is not feasible for individual scientists to compile “all available global data” from the existing literature. This limitation highlights the need for data systems that support data discovery, access, and analysis; without them, investigators are left with a disorganized heap of unusable data.
The IEDA (Integrated Earth Data Applications) EarthChem data facility (http://www.earthchem.org) develops and operates digital data collections focused on the geochemistry of rocks and sediments from a wide range of geographic settings worldwide. EarthChem citations show that its use now extends far beyond its origins in rock and sediment geochemistry (http://www.earthchem.org/citations). For example, EarthChem has been cited in studies as diverse as the prediction of natural base-flow stream water chemistry (Olson et al., 2012), a prototype web-based relational database for archaeological ceramics (Hein et al., 2011), and strontium and oxygen isotope fingerprinting of green coffee beans and its potential to prove the authenticity of coffee (Rodrigues et al., 2010).
The citations, both within geochemistry and petrology and in innovative new applications, demonstrate the utility of the databases to the scientific community. However, this utility comes only after much work to address several challenges: the complexity of standardizing data and information, limited investigator contributions owing to lack of time or willingness, the time needed to organize data extracted from the literature, and the development and maintenance of systems that are useful to, and used by, the community. Geochemistry is an example of a discipline in the "long tail" of data (Heidorn, 2008), where individual investigators and labs hold troves of data collected with one-of-a-kind, newly developed techniques. This type of data poses distinctive problems for data system development. Disciplinary expertise is extremely helpful for properly documenting data and associated metadata for reuse. A recurring theme is how to balance quality control with the amount of documentation provided, while giving proper credit to the investigators who originally obtained the data.
In this contribution, we describe the origin and current capabilities of IEDA EarthChem resources for sample-based geochemical data, list the benefits of those resources for scientists, and highlight some of the derived scientific results. We describe the options available to investigators for submitting their data to the system and opportunities for scientific attribution. We show how EarthChem has addressed the challenges related to long-tail scientific data management and contributed to scientific output.
A sample-based data system stores observations that come from discrete samples, such as rocks, sediments, fluids, or other materials. Analytical measurements of the samples are stored in an integrated manner together with descriptions of sampling location and technique, the analytical procedures used for data collection, and pre-analysis sample preparation. Here, integration means alignment and standardization of vocabularies, sample names, and output. Multi-layered and interrelated pieces of information create additional challenges when compared to grid-based