Alberto Pepe added Intro.tex  almost 11 years ago

Commit id: 37287393e11785f6b033f294d70f68114cd635eb

\section{Introduction}

In the early 1600s, Galileo Galilei turned a telescope toward Jupiter. Each night, in his log book, he drew to-scale schematic diagrams of Jupiter and some oddly-moving points of light near it, and he labeled each drawing with the date. Eventually he used his observations to conclude that the Earth orbits the Sun, just as Jupiter's moons orbit Jupiter. History shows Galileo to be much more than an astronomical hero, though. His clear and careful record-keeping and publication style not only let Galileo understand the Solar System, it has also let others understand {\it how} he did it, and his work is credited as the start of the Renaissance's ``Scientific Revolution.'' Galileo's notes, and the key publication based on them (xxref Sidereus Nuncius, show page as figurexx), directly integrated his {\bf data} (drawings of Jupiter and its moons), key {\bf metadata} (timing of each observation, weather, telescope properties), and {\bf text} (descriptions of methods, analysis, and conclusions).

Today, many research projects are considered complete once the analysis is finished and a short journal article has been written and published. But modern scientific publications almost never offer enough access to data and metadata to serve as the sole guide to repeating, or statistically verifying, a scientific study. Worse, researchers seeking to extend work based on others' (or even their own!) data frequently have trouble finding those data.

{\bf This is a short guide to the steps scientists can take to ensure that their data and their analysis of those data continue to be of value and to be recognized.}

Sources of ``data'' and ``analysis'' that need care and feeding today extend far beyond classical experimental or observational studies like Galileo's. Theoretical investigations can create large data sets through simulations (e.g. xxthe Millennium Simulation, ref.xx), and large-scale data collection often takes place as a community-wide effort (e.g. the Human Genome Project, xxrefxx).

There is no denying that nurturing your data takes extra work, even if the expectation is that care up front will save time and increase insight later. For most researchers today, conducting research with sharing and reuse in mind is essential, especially in large collaborations, but it still requires a paradigm shift. Most people are still motivated by piling up publications and by getting to the next one as soon as possible. Yet the more often each of us wishes we had access to already-acquired but now unfindable data, the more we realize that bad data management is bad for science. How can we improve?