Julien Emile-Geay edited Introduction.tex  over 9 years ago

Commit id: dd782ac62c582e3cc7e1a13c3723bdf336e5eaa9

\section{Introduction}
Science is entering a data-intensive era, in which insight is increasingly gained by extracting information from large volumes of data \citep{Hey_2012}. This is particularly critical in paleoclimatology, because understanding past changes in the climate system requires observations across large spatial and temporal scales. Individual paleoclimatic observations are typically limited to small geographic domains, so investigating large scales requires integrating many disparate studies and datasets. Observational work in paleoclimatology exemplifies the ``long-tail'' approach to data collection \citep{Heidorn_2009,P_Bryan_Heidorn_2008}: the majority of observations are gathered by independent scientists who lack a formal, standardized language for describing their data and metadata to each other, or to machines. The result is a ``Digital Tower of Babel'' that makes the curation, access, re-use and valorization of paleoclimate data far more difficult than it should be, hindering scientific progress. Recognizing the need for data sharing, paleoclimate investigators have made a major effort over the past decade to make their data available to the broader community, largely through online archiving systems such as the \href{http://www.ncdc.noaa.gov/paleo/wdc-paleo.html}{World Data Center for Paleoclimatology} and \href{http://www.pangaea.de/}{Pangaea}. Nonetheless, the lack of consistent formatting and metadata standards has made the re-use of such data needlessly labor-intensive, because it prevents computers from participating in the task of making connections across datasets. As the number of records in these archives has grown, making these connections manually has become increasingly challenging, hampering integrative efforts at the very time they should be flourishing. A clear solution to this problem is to \textbf{establish data and metadata standards for paleoclimatology}. 
Standardization would pave the way for many radical improvements in paleoclimatology, as it has in many other fields of science and industry. First, it would permit crowdsourced data curation, which would relieve data curators of a significant burden and bring more dark data to light. Second, it would enable universal, open-source software libraries to be built, ensuring that the whole community has access to sound, state-of-the-art tools to process, analyze, compare and model their data. Third, it would allow semantic technologies to enter the realm of paleoclimatology, so that the powerful tools developed by the machine-learning and artificial-intelligence communities could be used to discover new patterns in the data. Finally, it would help uncover relationships with other Linked Open Data \citep{BHBL09}, both within and outside the geosciences.