The hash function \(h(j)\) thus captures everything that affects the content of the output files of job \(j\): code, parameters, raw input files, the software environment, and the input generated by the jobs it depends on. For the latter, we apply \(h\) recursively: for each dependency \(j' \in D_j\), we include its hash value \(h(j')\) in the hash of job \(j\). This is the same hashing principle that underlies blockchains \cite{narayanan_bitcoin_2016}.
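
The recursive structure of \(h\) can be illustrated with a minimal Python sketch. It is not the actual implementation: the `Job` class, its field names, the choice of SHA-256, and the sorting of dependency hashes (so that the order of dependencies does not matter) are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; all field names are illustrative."""
    code: str                 # source code of the job
    params: tuple = ()        # parameter values
    raw_inputs: tuple = ()    # checksums (or contents) of raw input files
    environment: str = ""     # description of the software environment
    deps: tuple = ()          # jobs this job depends on, i.e. D_j


def job_hash(job: Job) -> str:
    """Return a hex digest covering the job and, recursively, its dependencies."""
    h = hashlib.sha256()
    for part in (job.code, repr(job.params), repr(job.raw_inputs), job.environment):
        data = part.encode("utf-8")
        # Length-prefix each component so adjacent fields cannot blur together.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    # Recursive step: include h(j') for every dependency j' in D_j.
    # Sorting is an assumption that makes the result order-independent.
    for dep_hash in sorted(job_hash(dep) for dep in job.deps):
        h.update(dep_hash.encode("ascii"))
    return h.hexdigest()
```

Under these assumptions, changing the code of an upstream job changes the hash of every job downstream of it, while two jobs with identical code, parameters, inputs, environment, and dependencies receive the same hash. A practical version would likely memoize `job_hash` so that dependencies shared by several jobs are hashed only once.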

Design patterns

Transformation

Aggregation

Generic conversions

Handling reference data

Resource usage annotation

Streaming data

Grouping jobs

Split-apply-combine

Conditional execution

Discussion