Rule 3. Conduct science with a particular level of reuse in mind.

Data from others are hard to use without context describing what the data are and how they were obtained. The W3C Provenance Group defines information provenance as the sum of all of the processes, people (institutions or agents), and documents (data included!) that were involved in generating, influencing, or delivering a piece of information. Perfect documentation of provenance is rarely, if ever, attained in scientific work today, but the more complete the provenance information, the better the chance that the data can be reused. In general, data reuse is most feasible when three things are all provided: 1) the data; 2) metadata (information describing the data); and 3) information about the process of generating those data, such as code. To follow the Rules listed in this article, plan in advance how you will provide all three kinds of information.

In carrying out your work, consider what level of reuse you realistically expect and plan accordingly. Do you want your work to be fully reproducible? If so, then provenance information is a must (e.g., working analysis-pipeline code, a platform to run it on, and verifiable versions of the data). Or do you just want your work to be inspectable? If so, then intermediate data products and pseudo-code may be sufficient. Or is your goal for your data to be usable in a wide range of applications? If so, consider adopting standard formats and metadata standards early on. At the very least, keep careful track of versions of data and code, with associated dates. Taking these steps as you plan and carry out projects will earn you the thanks of future researchers, including your future self. (Consult Appendix E for a list of tools to package all your research materials with reuse in mind.)
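As a minimal sketch of the advice to track versions of data and code with dates, the Python snippet below writes a small JSON "sidecar" file next to a data file. The sidecar records the file's SHA-256 checksum, its size, the version of the code that produced it, and a UTC timestamp. The field names, the `.provenance.json` suffix, and the helper functions are hypothetical illustrations, not part of any standard such as W3C PROV. In practice `code_version` would likely be a version-control tag or commit hash that you supply. The checksum is what makes a data version "verifiable": a later user can recompute it to confirm they hold exactly the file you described.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_path: Path, code_version: str, description: str) -> dict:
    """Build a minimal provenance record for one data file.

    The schema here is a hypothetical example, not a formal standard.
    """
    return {
        "file": data_path.name,
        "sha256": sha256_of(data_path),
        "size_bytes": data_path.stat().st_size,
        "code_version": code_version,  # e.g. a git tag or commit hash (assumed)
        "description": description,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(record: dict, data_path: Path) -> Path:
    """Write the record next to the data file as <name>.provenance.json."""
    sidecar = data_path.with_name(data_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


if __name__ == "__main__":
    # Toy usage: create a throwaway data file and record its provenance.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "measurements.csv"
        data.write_text("site,value\nA,1.5\n")
        record = provenance_record(data, code_version="v0.1.0", description="toy example")
        print(write_sidecar(record, data).read_text())
```

Keeping the record as a plain-text file beside the data, rather than only in a lab notebook, means it travels with the data when they are shared or archived.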