Leslie Hsu edited benefits.md  over 9 years ago

The EarthChem Library, the topical synthesis databases, and the EarthChem Portal together form a suite of tools that support geochemists throughout their research workflow, from data discovery to analysis to publication. To be successful, data systems must preserve useful metadata, respond to community-driven development of features, and provide a user-friendly interface for accessing the data. An essential part of the success of IEDA EarthChem’s data systems is their response to feedback and guidance from the user community \citep{Lehnert2013}. The standards for data reporting were developed and refined by targeted user groups to ensure that the resulting data are useful to the community. Frequent contact with the user community through workshops, meeting booths, informational emails, social media, webinars, and a User Committee composed of faculty and specialist users ensures that feedback is gathered continually. In a long-tail field such as sample-based analytical geochemistry, review by and iterative discussion with domain scientists are particularly important for maintaining standards for archived data. Users also report inaccuracies in the database, which are corrected and noted for other users. Without this community interaction, the EarthChem data systems would be more susceptible to errors, both in their content and in the development of new functionality.

The EarthChem systems provide increased access to scientific data and tools for the geochemistry community and for the broader global community of scientists, students, educators, and industry users \citep[e.g.][]{Lehnert2007}. Anyone with Internet access can download the data or use the tools. Over the past year, feedback collected by the PetDB system indicates that users download datasets for research (~85%), education (~13%), and other purposes (~2%).
In addition to bringing some data out from behind publication paywalls, EarthChem supports investigators in publishing data that might otherwise not have been made publicly available. This support is essential for scientists working in the long tail, for whom little guidance exists. Data included in EarthChem add value to the original published datasets in several ways, including additional sample metadata, sample name alignment, and sample classification by chemical composition. When chemical values relevant to a specific topical synthesis database are found in the literature without adequate documentation for inclusion in the integrated dataset, EarthChem data managers obtain the missing metadata from the author. Required information includes georeferenced sample locations, sampling technique, and laboratory preparation and equipment. Sample names, which are notoriously ambiguous for samples collected during research cruises (e.g., D1-01 for Dredge 1, Sample 1), are carefully aligned across all publications citing the same sample, and the topical synthesis databases provide a list of the sample aliases. This alignment is particularly useful for dredge and core samples from research cruises, which are sampled and subsampled several times and are often given different names in publications by different authors (e.g., D1-01 and D1-1).
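As a rough illustration of the alias problem described above, the following Python sketch groups published name variants such as D1-01 and D1-1 under one normalized key. It is a hypothetical example, not EarthChem's actual alignment procedure. It assumes that names follow a letter prefix, a station number, a separator, and a sample number (as in D1-01), and that each published name is recorded together with its cruise. The functions `canonical_name` and `build_alias_table`, the regular expression, and the cruise identifiers are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical naming pattern: letter prefix for the sampling method (e.g. "D"
# for dredge), a station number, a separator, and a sample number, e.g. "D1-01".
_NAME_PATTERN = re.compile(r"^([A-Za-z]+)\s*(\d+)\s*[-_/ ]\s*(\d+)$")


def canonical_name(raw_name: str) -> str:
    """Return a normalized sample name.

    Names that do not match the assumed pattern are only stripped and
    upper-cased, so they would still need manual review.
    """
    name = raw_name.strip()
    match = _NAME_PATTERN.match(name)
    if match is None:
        return name.upper()
    prefix, station, sample = match.groups()
    # int() drops leading zeros, so "01" and "1" normalize to the same value.
    return f"{prefix.upper()}{int(station)}-{int(sample)}"


def build_alias_table(records):
    """Group published sample names into alias lists.

    records: iterable of (cruise_id, published_name) pairs. The cruise ID is
    part of the key because short names such as "D1-01" recur across cruises.
    Returns a dict mapping (cruise_id, canonical_name) to a sorted alias list.
    """
    aliases = defaultdict(set)
    for cruise_id, published_name in records:
        key = (cruise_id, canonical_name(published_name))
        aliases[key].add(published_name)
    return {key: sorted(names) for key, names in aliases.items()}


if __name__ == "__main__":
    # Hypothetical cruise IDs and published names.
    table = build_alias_table([
        ("CRUISE-A", "D1-01"),
        ("CRUISE-A", "D1-1"),
        ("CRUISE-A", "d1-1"),
        ("CRUISE-B", "D1-01"),
    ])
    for key, names in table.items():
        print(key, names)
```

Keying on the cruise as well as the normalized name keeps D1-01 from one cruise separate from D1-01 on another, which mirrors the point above that the same short name can refer to different physical samples.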

Proper attribution for scientific results is a common concern of investigators whose data are integrated into larger databases. EarthChem promotes proper attribution to investigators for their scientific data in two ways. First, the ECL provides persistent, citable DOIs for each submitted dataset, so that individual authors can be credited for their contributions to the larger database. Second, a Portal tool allows investigators to track the number of times data from a particular author have been downloaded as part of an integrated dataset. This improves citation and reporting for funding agencies, promotion cases, and other situations that require a quantitative measure of scientific output. EarthChem also encourages citation of the original data sources in addition to PetDB itself.

The challenges for electronic data publication in geochemistry outlined by \citet{Staudigel_2003} have not been completely eliminated, but the number of users and uses of electronic geochemical data has grown tremendously in the past decade, with new databases, web applications, and scientific results. Data systems created in the past must incorporate new technologies and standards to remain relevant. Examples include the unique identifiers developed for publications (DOI), samples (IGSN), and people ([ResearcherID](http://www.researcherid.com/), [ORCiD](http://orcid.org/)), which are being implemented in EarthChem systems. We have shown how EarthChem has addressed the challenges of long-tail scientific data management and has contributed to scientific output both in its home field of geochemistry and in other, seemingly unrelated disciplines. Disciplinary sciences are interlinked when solving grand challenges in science, especially in Earth science, and so are the challenges of efficiently producing and reusing scientific data.
IEDA EarthChem and other disciplinary data systems will continue to grow and integrate to provide their user community with more powerful tools for scientific analysis and discovery.