# Rule 4. Publish workflow as context.
Traditionally, what computer and information scientists call "workflow" has been captured in what scientists call the "methods" and/or "analysis" section(s) of a scholarly article, where data collection, manipulation, and analysis processes are described. Today, nearly every study uses computer software to carry out the bulk of its workflow, but rarely is the end-to-end process captured in just one software package. Thus, while directly publishing code is critical (see Rule 6), publishing a description of your processing steps offers essential context for interpreting and re-using data.
In the future, the most useful workflow documentation will be part of a provenance record that links together all the pieces that led to a result: the data citation (Rule 2), the pointer to the code source (Rule 6), the workflow (this Rule), and a scholarly paper. Systems that document workflow in a way that they can plug into provenance visions like this one are best, so keep an eye out for such systems in your field.
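As a minimal sketch of what such a record could contain (the W3C PROV family of standards offers a formal vocabulary for this), the snippet below links the four pieces for one result; every identifier and filename in it is a hypothetical placeholder:

```python
# A minimal provenance record linking the pieces behind one result.
# All identifiers and filenames below are hypothetical placeholders.
import json

provenance = {
    "result": "figure_3.pdf",
    "data_citation": "doi:10.5555/example-dataset",             # Rule 2
    "code_source": "https://github.com/example/analysis-code",  # Rule 6
    "workflow": "workflow.ipynb",                               # this Rule
    "paper": "doi:10.5555/example-paper",                       # the scholarly paper
}

# Store the record next to the result so the links travel with it.
with open("figure_3.provenance.json", "w") as f:
    json.dump(provenance, f, indent=2)
```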
Web services that encapsulate workflow are a good way to reduce the burden of software overhead and dependencies. In the life sciences, systems like Taverna and Kepler are good examples (xxrefsxx). Other standardized workflow documentation systems are offered by "notebooks" within some software packages, such as the Mathematica and IPython notebooks (xxrefsxx). Systems (see Rule 2) that offer hdl and doi identifiers for data can, and do, offer those identifiers for workflow files as well.
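For illustration, a single notebook cell can document a workflow step inline, with the parameter values stated up front; the data and values here are invented:

```python
# A notebook-style cell documenting one analysis step inline
# (hypothetical data and parameter values).
SMOOTHING_WINDOW = 5        # parameter values are declared up front
DETECTION_THRESHOLD = 3.0   # so readers can see exactly what was used

def moving_average(values, window):
    """Smooth a series with a simple moving average."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

raw = [1.0, 2.0, 8.0, 2.0, 1.0, 9.0, 1.0, 2.0]  # stand-in for loaded data
smoothed = moving_average(raw, SMOOTHING_WINDOW)
peaks = [x for x in smoothed if x > DETECTION_THRESHOLD]
print(f"{len(peaks)} smoothed points above threshold {DETECTION_THRESHOLD}")
```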
At a minimum, a simple sketch of the dataflow across all the software, indicating how intermediate data and final results are generated, and the parameter values used in the analysis, should be offered.
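One lightweight way to offer such a sketch is a short, machine-readable step list naming each piece of software, its inputs and outputs, and the parameters used; every name and value below is hypothetical:

```python
# A minimal dataflow sketch: which software produced which intermediate and
# final files, and with what parameters (all names and values hypothetical).
dataflow = [
    {"step": 1, "software": "clean.py",
     "inputs": ["raw_survey.csv"], "outputs": ["cleaned.csv"],
     "parameters": {"drop_missing": True}},
    {"step": 2, "software": "fit_model.py",
     "inputs": ["cleaned.csv"], "outputs": ["model_fit.json"],   # intermediate data
     "parameters": {"degree": 2, "tolerance": 1e-6}},
    {"step": 3, "software": "plot_results.py",
     "inputs": ["model_fit.json"], "outputs": ["figure_3.pdf"],  # final result
     "parameters": {"dpi": 300}},
]

for step in dataflow:
    print(f"step {step['step']}: {step['software']} "
          f"{step['inputs']} -> {step['outputs']} with {step['parameters']}")
```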
Keep in mind that even if the data used are not "new," in that they come from a well-documented archive, it is still important to document the archive query that produced the data you used, along with all the operations you performed on the data after they were retrieved.
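A sketch of what such a query record might look like, with an invented archive URL, query, and operation log:

```python
# Record the archive query that produced the input data, plus every operation
# applied afterwards (archive URL, query, and steps are all hypothetical).
import json

query_record = {
    "archive": "https://archive.example.org/search",
    "query": {"object": "M31", "band": "K", "radius_arcmin": 30},
    "retrieved": "2013-11-05",
    "post_retrieval_operations": [
        "dropped rows with missing magnitudes",
        "converted fluxes to magnitudes (zero point 25.0)",
        "cross-matched against catalog doi:10.5555/example-catalog",
    ],
}

print(json.dumps(query_record, indent=2))
```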
Just as in Rules 1 through 3, keeping better track of workflow, as context, will likely benefit you and your collaborators enough to justify the loftier, more altruistic goals espoused here.