Authorea

Christine L. Borgman edited Rule 3. Data reuse in mind.md almost 11 years ago

Commit id: a2557bdf4c28549cae7a05fbf79867192fd4bc25

# Rule 3. Conduct science with data reuse in mind.

Information "provenance" is the sum of all of the processes, people (or institutions or other agents), and documents (data included!) that were involved in generating or otherwise influencing or delivering a piece of information ([W3C Provenance Group](http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance#A_Working_Definition_of_Provenance)). Perfect documentation of provenance is rarely, if ever, attained in scientific work today, but the higher the quality of provenance information, the greater the chance that data can be reused. In general, data reuse is most feasible when all three of the following are provided: 1) the data; 2) metadata (information describing the data); and 3) information about the process that generated those data, such as code. In trying to follow the Rules listed in this article, you will do best if you plan in advance how to provide all three kinds of information.

As you carry out your work, consider what level of reuse you realistically expect and plan accordingly. Do you want your work to be fully **reproducible**? If so, then full provenance information is a must (e.g., working analysis pipeline code, a machine to run it on, and verifiable versions of the data). Do you just want your work to be **inspectable**? If so, then intermediate data products and pseudo-code may be sufficient. Do you want your data to be **usable** in a wide range of applications? If so, consider adopting standard formats and metadata standards early on. At the very least, keeping careful track of versions of data and code, with associated dates, will be appreciated by those looking back from the future.
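One lightweight way to keep track of versions of data and code with dates is to write a small machine-readable record alongside each analysis run. The Python sketch below is illustrative only. The helper names `sha256_of` and `provenance_record`, the JSON field names, and the file `observations.csv` are hypothetical and do not follow any particular standard (for richer interchange, the W3C PROV family of specifications defines a formal provenance model). The sketch records a UTC timestamp, a caller-supplied code version string (for example, a version-control commit id), the Python and platform versions, and a SHA-256 checksum and size for each data file. A later reader can then check whether they hold the same versions of the data.

```python
import hashlib
import json
import platform
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def provenance_record(data_files, code_version, note=""):
    """Build a minimal provenance record.

    NOTE: the field names below are a hypothetical schema for illustration,
    not W3C PROV or any other standard.
    data_files: paths to the input or output data files to fingerprint.
    code_version: any string identifying the code, e.g. a commit id.
    """
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),  # when the record was made
        "code_version": code_version,
        "python": sys.version.split()[0],  # interpreter version, e.g. "3.11.4"
        "platform": platform.platform(),  # operating system description
        "data": [
            {
                "path": str(p),
                "sha256": sha256_of(Path(p)),  # changes if the file's bytes change
                "bytes": Path(p).stat().st_size,
            }
            for p in data_files
        ],
        "note": note,
    }


if __name__ == "__main__":
    # Demo with a throwaway file; real use would point at actual data files.
    with tempfile.TemporaryDirectory() as d:
        sample = Path(d) / "observations.csv"  # hypothetical example file
        sample.write_text("site,value\nA,1.5\nB,2.0\n")
        record = provenance_record([sample], code_version="v0.1-example", note="demo run")
        (Path(d) / "provenance.json").write_text(json.dumps(record, indent=2))
        print(json.dumps(record, indent=2))
```

Storing such a record next to the outputs, and keeping it under version control with the code, is one simple design choice. Checksums reveal that a data file has changed but not why, so they complement rather than replace descriptive metadata.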