Alberto Pepe edited Rule 3. Data reuse in mind.md  over 10 years ago

Commit id: c5f0c19b8d7e9fe01b2ca29a21d5e80ed5840adb


# Rule 3. Conduct science with data reuse in mind.

Data from others is hard to reuse without context describing what the data is and how it was obtained. If you wait to document this context until the time of data publication, you may have forgotten some of it or find that it takes too much effort. Crucial to this context is the **provenance** of your data. "Provenance" refers to a record that describes the processes, people (institutions or agents), and documents (data included!) that were involved in generating or otherwise influencing or delivering a piece of information ([W3C Provenance Group](http://www.w3.org/TR/2013/REC-prov-dm-20130430/#dfn-provenance)). Perfect documentation of provenance is rarely, if ever, attained in scientific work today. The higher the quality of provenance information, the higher the chance of enabling data reuse. In general, data reuse is most feasible when: 1) data; 2) metadata (information describing the data); and 3) information about the process of generating those data, such as code, are all provided. In trying to follow the Rules listed in this article, you will do best if you plan in advance for ways to provide all three kinds of information.

In carrying out your work, consider what level of reuse you realistically expect and plan accordingly. Do you want your work to be fully **reproducible**? If so, then provenance information is a must (e.g., working pipeline analysis code, a platform to run it on, and verifiable versions of the data). Or do you just want your work to be **inspectable**? If so, then intermediate data products and pseudo-code may be sufficient. Or maybe your goal is that your data is **usable** in a wide range of applications? If so, consider adopting standard formats and metadata standards early on (e.g., [Moreau and Missier 2013]).
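
As one illustration of providing all three kinds of information, the sketch below writes a small JSON "sidecar" file next to a data file. It is only a minimal example under stated assumptions: the field names, the `.prov.json` suffix, and the functions `provenance_record` and `write_sidecar` are hypothetical choices made for this illustration, and the record is a simplified stand-in, not the W3C PROV data model.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_path: Path, code_version: str, description: str) -> dict:
    """Collect data, metadata, and process information for one data file.

    The layout of this dictionary is an assumption made for illustration.
    """
    return {
        # 1) the data: which file, plus a checksum that pins its exact version
        "data": {"file": data_path.name, "sha256": sha256_of(data_path)},
        # 2) metadata: a human-readable description of what the data is
        "metadata": {"description": description},
        # 3) process: a label for the code that produced the data,
        #    for example a version-control commit id supplied by the caller
        "process": {"code_version": code_version},
        # date of the record, in UTC, so versions can be ordered later
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(data_path: Path, record: dict) -> Path:
    """Write the record as JSON next to the data file and return its path."""
    sidecar = data_path.with_name(data_path.name + ".prov.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

Calling `write_sidecar(path, provenance_record(path, code_version, description))` each time a data product is generated leaves a dated, checksummed note beside it. The checksum lets a later reader verify that the file in hand is the version the record describes.
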
At the very least, keeping careful track of versions of data and code, with associated dates, will be appreciated by those looking back from the future.

[Moreau and Missier 2013] "PROV-DM: The PROV Data Model." Luc Moreau and Paolo Missier (Eds). World Wide Web Consortium (W3C) Recommendation, 30 April 2013. Available from http://www.w3.org/TR/prov-dm/.