Alberto Pepe edited Rule 4. Publish workflow as context.md  almost 11 years ago

Commit id: b18453e163aa9bcb5ea48de25548f88dc403911e

Workflows give crucial context for interpreting and reusing your data products and for verifying the data that you publish.

Make your science sufficiently inspectable that others can judge the value of your contributions. In some fields inspectable means making your work reproducible; in others it means replicable or verifiable. Anticipate not only your own reuse of the data and associated software, but also that others may wish to re-run your analysis or simply understand what you did. Consider publishing the source code and intermediate data that someone else would need to see to understand what you did.

[Mathematica notebook link example, same with Python notebook, GitHub etc.]

The logical extension of conducting science with provenance in mind is to share that workflow. At a minimum, provide a simple sketch of the dataflow across all the software, indicating how intermediate data and final results were generated and which parameter values were used in the analysis. [Aneta says: organization of workflows, examples]

This includes intermediate data; it is especially important to document results from queries to third-party services and data sources. A more detailed and formal way to describe the workflow is to use the W3C PROV standard. Providing web services that encapsulate the workflow is a good way to reduce the burden of software overhead and dependencies.

Organize your data analysis work in a way that can be shared with your collaborators. This will make the final sharing of the workflow easier. Use tools (electronic notebooks, for example IPython Notebook) to document your analysis process, code, and results.
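
As one illustration of the "simple sketch of the dataflow" mentioned above, the following is a minimal Python sketch, not a prescribed method. It records each analysis step together with its parameter values and a checksum of its input and output, so that intermediate data can later be matched to the step that produced it. The names `WorkflowLog`, `fingerprint`, `clip`, and `mean` are hypothetical and chosen only for this example, the sample values are made up, and the JSON layout is an ad hoc format rather than W3C PROV.

```python
import hashlib
import json


def fingerprint(data):
    """Return a SHA-256 digest of a JSON-serializable value."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class WorkflowLog:
    """Hypothetical helper: records each step's parameters and data checksums."""

    def __init__(self):
        self.steps = []

    def run(self, name, func, data, **params):
        """Apply func to data and record what was done."""
        result = func(data, **params)
        self.steps.append({
            "step": name,
            "parameters": params,
            "input_sha256": fingerprint(data),
            "output_sha256": fingerprint(result),
        })
        return result

    def to_json(self):
        """Serialize the recorded steps so they can be published with the data."""
        return json.dumps(self.steps, indent=2, sort_keys=True)


def clip(values, lower, upper):
    """Example step: keep only values inside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]


def mean(values):
    """Example step: arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


log = WorkflowLog()
raw = [1.0, 2.0, 3.0, 250.0]  # made-up stand-in for a third-party query result
cleaned = log.run("clip_outliers", clip, raw, lower=0.0, upper=100.0)
average = log.run("compute_mean", mean, cleaned)
print(log.to_json())
```

Because the output checksum of one step equals the input checksum of the next, a reader can follow the chain from raw data to final result. Publishing the resulting JSON next to the data and code gives others the parameter values without having to read the source.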