Authorea

Alberto Pepe edited Introduction.md over 10 years ago

Commit id: af42c4bcda985db2c71c672c84ea7ac21cc39adf


**This article is a short guide to the steps scientists can take to ensure that their data and associated analyses continue to be of value and to be recognized.** Sources of "data" and "analysis" that need care and feeding today extend far beyond classical experimental or observational studies like those of Galileo. Theoretical investigations can create large data sets through simulations (e.g. [The Millennium Simulation Project](http://www.mpa-garching.mpg.de/galform/virgo/millennium/)), and large-scale data collection often takes place as a community-wide effort (e.g. [The Human Genome Project](http://www.genome.gov/10001772)). Despite their size and heterogeneity, these data have tremendous value, so much so that new government policies declare data "an asset for progress" \cite{holdren}. So how do we go about caring for and feeding these data? Nurturing your data undoubtedly takes extra work, but care up front will save time and increase insight later. For most researchers today, conducting research with sharing and reuse in mind is essential, especially in large collaborations, yet it still requires a paradigm shift: most of us are more motivated by piling up publications and by getting to the next one as soon as possible. The more often each of us wishes for access to data that exist but can no longer be found, the more clearly we realize that bad data management is bad for science. How can we improve?