Yolanda Gil edited Rule 3. Data reuse in mind.md  over 10 years ago

Commit id: bfa4a9297b26b537adc5b733437297f3ae9ec722

# Rule 3. Conduct science with data reuse in mind.

Data from others are hard to reuse without context describing what the data are and how they were obtained. If you wait until the time of data publication to document this context, you may have forgotten some of it, or you may find that it takes too much effort. Crucial to this context is the **provenance** of your data. Information "provenance" is a record that describes all of the processes, people (institutions or agents) and documents (data included!) that were involved in generating or otherwise influencing or delivering a piece of information ([W3C Provenance Group](http://www.w3.org/TR/2013/REC-prov-dm-20130430/#dfn-provenance)). Perfect documentation of provenance is rarely, if ever, attained in scientific work today. Still, the higher the quality of the provenance information, the higher the chance of enabling data reuse.

In general, data reuse is most feasible when three things are all provided: 1) data; 2) metadata (information describing the data); and 3) information about the process of generating those data, such as code. In trying to follow the Rules listed in this article, you will do best if you plan in advance for ways to provide all three kinds of information.

In carrying out your work, consider what level of reuse you realistically expect and plan accordingly. Do you want your work to be fully **reproducible**? If so, then provenance information is a must (e.g., working pipeline analysis code, a machine to run it on, and verifiable versions of the data). Do you just want your work to be **inspectable**? If so, then intermediate data products and pseudo-code may be sufficient. Do you want your data to be **usable** in a wide range of applications? If so, consider adopting standard formats and metadata standards early on (e.g., [Moreau and Missier 2013]). At the very least, keeping careful track of versions of data and code, with associated dates, will be appreciated by those looking back from the future.
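As one illustration of tracking versions of data and code with associated dates, the following is a minimal Python sketch that writes a small provenance "sidecar" file next to a data file. It is not the PROV data model and not a method prescribed by this article; the function names (`sha256_of`, `provenance_record`, `write_sidecar`), the field names, and the `.prov.json` suffix are all hypothetical choices made for the example. It uses only the Python standard library.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_path, code_version, agent, description):
    """Build a minimal provenance record as a plain dictionary.

    The field names are illustrative assumptions, not a standard schema.
    """
    data_path = Path(data_path)
    return {
        "data_file": data_path.name,
        "sha256": sha256_of(data_path),           # verifiable version of the data
        "size_bytes": data_path.stat().st_size,
        "code_version": code_version,             # e.g. a tag or commit id you supply
        "agent": agent,                           # person or institution responsible
        "description": description,               # how the data were obtained
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(record, data_path):
    """Write the record as JSON next to the data file and return its path."""
    sidecar = Path(str(data_path) + ".prov.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

A call such as `write_sidecar(provenance_record("survey.csv", "v0.1.0", "Example Lab", "Raw export from instrument"), "survey.csv")` (with made-up argument values) would leave a `survey.csv.prov.json` file beside the data. Recording a checksum rather than only a file name lets a later reader confirm that the copy in hand is the version the record describes.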
[Moreau and Missier 2013] "PROV-DM: The PROV Data Model." Luc Moreau and Paolo Missier (Eds). World Wide Web Consortium (W3C) Recommendation 30 April 2013. Available from http://www.w3.org/TR/prov-dm/.