Yolanda Gil edited Introduction.md  over 10 years ago

Commit id: 4eb4eb1febfb927dd471199e41f19618cbb6ff06


       

Sources of "data" and "analysis" that need care and feeding today extend far beyond classical experimental or observational studies like those of Galileo. Theoretical investigations can create large data sets through simulations (e.g. [The Millennium Simulation Project](http://www.mpa-garching.mpg.de/galform/virgo/millennium/)), and large-scale data collection often takes place as a community-wide effort (e.g. [The Human Genome project](http://www.genome.gov/10001772)). Nurturing your data no doubt takes extra work, but care up front will save time and increase insight later. Conducting research with sharing and reuse in mind is now essential for most researchers, especially in large collaborations, yet it still requires a paradigm shift: most people are more motivated by piling up publications and by getting to the next one as soon as possible.

Although scientists cite many reasons why sharing data is difficult [Tenopir et al 2011], there are also many good reasons to share it. Sharing data keeps us honest and improves peer review and mutual validation [Corbyn 2012; Krugman 2013; Baggerly and Coombes 2009]. Scientific data also has tremendous value, so much so that new policies declare data "an asset for progress" [Holdren 2013]. Keeping data to yourself is a questionable practice when the research is funded by taxpayers, who would much rather see those assets fuel innovation than rot (quite literally) in a lab. Making datasets available also increases the reputation and citation rate of the research that produced them [Piwowar et al 2007]. Despite the recent trend among publishers, funding agencies, and other institutions to establish data sharing policies, the practice of data sharing is far from ideal [Savage and Vickers 2009]. The more often each of us wishes for access to data that exist but can no longer be found, the more we realize that bad data management is bad for science. How can we improve?