Alberto Pepe added Rule 4. Publish workflow as context.md  almost 11 years ago

Commit id: d487cb99cb6c0c8d01b9bc4f3b97914cae3c789b


# Rule 4. Publish workflow as context.

Workflows give crucial context for interpreting, reusing, and verifying the data products that you publish.

Make your science sufficiently inspectable that others can judge the value of your contributions. In some fields, inspectable means reproducible; in others, it means replicable or verifiable. Anticipate not only your own reuse of the data and associated software, but also that others may wish to re-run your analysis or simply understand what you did. Consider publishing the source code and intermediate data that someone else would need to see to understand what you did.

[Mathematica notebook link example, same with python notebook, github etc.]

The logical extension of conducting science with provenance in mind is to share that workflow. At a minimum, provide a simple sketch of the dataflow across all the software, indicating how intermediate data and final results were generated and which parameter values were used in the analysis. This includes intermediate data; it is especially important to document the results of queries to third-party services and data sources. A more detailed and formal way to describe the workflow is to use the W3C PROV standard. Providing web services that encapsulate the workflow is a good way to reduce the burden of software overhead and dependencies.

Organize your data analysis work in a way that can be shared with your collaborators. This will make the final sharing of the workflow easier. Use tools (electronic notebooks, for example IPython Notebook) to document your analysis process, code, and results.
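
As an illustration of the "simple sketch of the dataflow" mentioned above, the following is a minimal Python sketch that records each analysis step with its software, parameter values, and checksums of its inputs and outputs, and then writes the whole record as JSON. Everything here is an assumption made for illustration: the step names, the file names, the `clean.py` and `summarize.py` scripts, and the record layout are hypothetical, and the format is an ad hoc one, not W3C PROV.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def sha256_of(data: bytes) -> str:
    """Return a checksum so readers can verify they hold the same data."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Step:
    """One step of the dataflow (hypothetical record layout)."""
    name: str
    software: str
    parameters: dict
    inputs: dict = field(default_factory=dict)   # file name -> checksum
    outputs: dict = field(default_factory=dict)  # file name -> checksum


def record_step(name, software, parameters, inputs, outputs):
    """Build a Step, storing checksums rather than the data itself."""
    return Step(
        name=name,
        software=software,
        parameters=parameters,
        inputs={fname: sha256_of(data) for fname, data in inputs.items()},
        outputs={fname: sha256_of(data) for fname, data in outputs.items()},
    )


def workflow_to_json(steps):
    """Serialize the ordered list of steps for publication with the data."""
    return json.dumps([asdict(step) for step in steps], indent=2, sort_keys=True)


# Toy data standing in for real raw, intermediate, and final files.
raw = b"1.0,2.0,3.0,40.0\n"
cleaned = b"1.0,2.0,3.0\n"
result = b"mean=2.0\n"

steps = [
    record_step(
        "clean", "clean.py (hypothetical)", {"max_value": 10.0},
        inputs={"raw.csv": raw}, outputs={"cleaned.csv": cleaned},
    ),
    record_step(
        "summarize", "summarize.py (hypothetical)", {"statistic": "mean"},
        inputs={"cleaned.csv": cleaned}, outputs={"result.txt": result},
    ),
]

workflow_json = workflow_to_json(steps)
print(workflow_json)
```

Because the output checksum of one step matches the input checksum of the next, a reader can trace how the final result was derived from the intermediate data. The same idea extends to saved responses from third-party services: store the response alongside the analysis and record its checksum as an input.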