# Rule 4. Publish workflow as context.

Traditionally, what computer and information scientists call "workflow" has been captured in the "methods" and/or "analysis" section(s) of a scholarly article, where data collection, manipulation, and analysis processes are described. Today, nearly every study uses computer software to carry out the bulk of its workflow, but rarely is the end-to-end process captured in just one software package. Thus, while directly publishing code is critical (see Rule 6), publishing a description of your processing steps offers essential context for interpreting and re-using data.

In the future, the most useful workflow documentation will be part of a provenance record that links together all the pieces that led to a result: the data citation (Rule 2), the pointer to the code source (Rule 6), the workflow (this Rule), and the intermediate data that someone else would need to see to understand what you did. Systems that document workflow in a way that can plug into provenance visions like this one are best, so keep an eye out for such systems in your field.

Web services that encapsulate workflow are a good way to reduce the burden of software overhead and dependencies. In the life sciences, systems like Taverna and Kepler are good examples (xxrefsxx). Other standardized workflow documentation systems are offered by "notebooks" within some software packages, such as the Mathematica and IPython notebooks (xxrefsxx). Systems that offer hdl and doi identifiers for data (see Rule 2) can, and do, offer those identifiers for workflow files as well.

At a minimum, offer a simple sketch of the dataflow across the software, indicating how intermediate data and final results are generated, together with the parameter values used in the analysis. Keep in mind that even if the data used are not "new," in that they come from a well-documented archive, it is still important to document the archive query that produced the data you used, along with all the operations you performed on the data after they were retrieved.
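As one lightweight way to capture this minimum, a few lines of Python, written as they might appear in an IPython notebook cell, can serialize the query parameters and subsequent processing steps alongside the data products. The archive URL, parameter names, and step names below are hypothetical placeholders for illustration, not a prescribed format:

```python
import datetime
import json

# Record the archive query that produced the data (hypothetical service
# and parameters), not just the files it returned.
query = {
    "service": "https://archive.example.org/search",  # hypothetical archive
    "target": "M31",
    "radius_deg": 0.5,
    "retrieved_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
}

# Record every operation applied after retrieval, in order, with the
# parameter values actually used (illustrative step names).
steps = [
    {"step": "sigma_clip", "sigma": 3.0},
    {"step": "rebin", "bin_width": 0.1},
    {"step": "fit_model", "model": "powerlaw", "initial_index": -2.0},
]

# Publish this sketch of the dataflow alongside the data themselves.
with open("workflow_context.json", "w") as handle:
    json.dump({"query": query, "processing_steps": steps}, handle, indent=2)
```

Even a plain-text record like this one gives a re-user the query and the parameter values needed to retrace the analysis.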
Just as in Rules 1 through 3, keeping better track of the workflow, as context, will likely benefit you and your collaborators enough to justify the loftier, more altruistic goals espoused here.
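For readers who want a more formal starting point for the linked provenance record described above, here is a minimal sketch using the W3C PROV data model via the third-party Python `prov` package (installable with `pip install prov`). All identifiers are hypothetical placeholders; the point is only to show one document tying data, code, and result together:

```python
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "https://example.org/provenance/")

# The pieces that led to a result: data (Rule 2), code (Rule 6),
# and the analysis run that connects them (this Rule).
doc.entity("ex:archival-dataset")   # the cited data product
doc.entity("ex:analysis-code")      # the published code
doc.entity("ex:final-result")       # e.g., a figure or table
doc.activity("ex:analysis-run")     # the workflow execution

# Link them: the run used the data and the code, and generated the result.
doc.used("ex:analysis-run", "ex:archival-dataset")
doc.used("ex:analysis-run", "ex:analysis-code")
doc.wasGeneratedBy("ex:final-result", "ex:analysis-run")

print(doc.serialize(indent=2))
```

Because the record is machine-readable, identifier systems (Rule 2) can point at a provenance file like this as readily as at the data it describes.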