Authorea

Christine L. Borgman grammatical corrections. removed "raw" data and mentioned versions instead. almost 11 years ago

Commit id: 7f4152443aae5ca449f9058ef4b08a7a785a0901

# Rule 3. Conduct science with data reuse in mind.

Information "provenance" is the sum of all of the processes, people (or institutions or other agents), and documents (data included!) that were involved in generating or otherwise influencing or delivering a piece of information ([W3C Provenance Group](http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance#A_Working_Definition_of_Provenance)). Perfect documentation of provenance is rarely, if ever, attained in scientific work today. The higher the quality of provenance information, the higher the chance of enabling data re-use. In general, data re-use is most feasible when 1) data; 2) metadata (information describing the data); and 3) information about the process of generating those data, such as code, are all provided. In trying to follow the Rules listed in this article, you will do best if you plan in advance for ways to provide all three kinds of information. In carrying out your work, consider what level of re-use you realistically expect and plan accordingly. Do you want your work to be fully **reproducible**? If so, then full provenance information is a must (e.g., working pipeline analysis code, a machine to run it on, and verifiable versions of the data). Do you just want your work to be **inspectable**? If so, then intermediate data products and pseudo-code may be sufficient. Do you want your data to be **usable** in a wide range of applications? If so, consider adopting standard formats and metadata standards early on. At the very least, keeping careful track of versions of data and code, with associated dates, will be appreciated by those looking back from the future.
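
As one illustration of keeping track of versions of data and code with associated dates, the following is a minimal Python sketch, not a prescribed method. The function names `provenance_record` and `write_sidecar`, the record fields, and the `.provenance.json` sidecar naming are all hypothetical choices made for this example. A SHA-256 checksum stands in for a "verifiable version" of the data, and `code_version` is assumed to be whatever identifier you already use for your analysis code (for instance, a commit id).

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def provenance_record(name, data, code_version, created=None):
    """Return a small provenance record for one version of a data product.

    All field names here are illustrative assumptions, not a standard.
    """
    created = created or datetime.now(timezone.utc)
    return {
        "name": name,
        # The checksum changes whenever the bytes change, so it
        # identifies this exact version of the data.
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        # Identifier of the code that produced the data, supplied by the caller.
        "code_version": code_version,
        # ISO 8601 timestamp of when the record was made.
        "recorded_at": created.isoformat(),
    }


def write_sidecar(data_path, code_version):
    """Write '<file>.provenance.json' next to a data file and return its path."""
    data_path = Path(data_path)
    record = provenance_record(
        data_path.name, data_path.read_bytes(), code_version
    )
    sidecar = data_path.with_name(data_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

Calling `write_sidecar("results.csv", "abc1234")` each time a data product is regenerated would leave a dated, checksummed record beside the file (the file name and version string are placeholders). This covers only the minimal "at the very least" level described above; full reproducibility would also need the pipeline code and environment themselves.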