Alberto Pepe edited Introduction.md  over 10 years ago

Commit id: 2a03899e17d919c2654792f2e745ef53e51cee53


of the Renaissance "Scientific Revolution." Galileo’s notes, and the key publication based on them \cite{galilei}, directly integrated his **data** (drawings of Jupiter and its moons), key **metadata** (timing of each observation, weather, telescope properties), and **text** (descriptions of methods, analysis, and conclusions), as shown in Figure \ref{fig:1}. Today, many research projects are considered complete when a journal article based on the analysis has been written and published. However, very few modern scientific publications offer sufficient access to the data and metadata needed to repeat or statistically verify the studies they report. Worse, researchers wishing to extend work based on others' (or even their own!) data frequently have trouble finding those data. **This article is a short guide to the steps scientists can take to ensure that their data and associated analyses continue to be of value and to be recognized.** Sources of "data" and "analysis" that need care and feeding today extend far beyond classical experimental or observational studies like those of Galileo. Theoretical investigations can create large data sets through simulations (e.g. [The Millennium Simulation Project](http://www.mpa-garching.mpg.de/galform/virgo/millennium/)), and large-scale data collection often takes place as a community-wide effort (e.g. [The Human Genome Project](http://www.genome.gov/10001772)). Nurturing your data undoubtedly takes extra work, but care up-front will save time and increase insight later. Conducting research with sharing and reuse in mind is essential for most researchers today, especially in large collaborations, but it still requires a paradigm shift: most people are more motivated by piling up publications and by getting to the next one as soon as possible.
The more often each of us wishes we had access to extant but now unfindable data, the more we realize that bad data management is bad for science. How can we improve?