The hash function \(h(j)\) thus captures everything that affects the content of the output files of job \(j\): code, parameters, raw input files, the software environment, and the input generated by the jobs it depends on. For the latter, we apply \(h\) recursively: for each dependency \(j' \in D_j\), we include its hash value \(h(j')\) in the hash of job \(j\). This is the same hashing principle that underlies blockchains \cite{narayanan_bitcoin_2016}.
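The recursion can be illustrated with a minimal Python sketch. It is not the actual implementation: the `Job` record, its field names, the choice of SHA-256, the length-prefixed encoding, and the sorting of dependency hashes (to make the result independent of dependency order) are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job record; all field names are illustrative."""
    code: str
    params: dict
    raw_inputs: dict  # file name -> file content as bytes
    environment: str
    dependencies: list = field(default_factory=list)  # the jobs in D_j


def job_hash(job, _cache=None):
    """Return a hex digest standing in for h(j), computed recursively."""
    if _cache is None:
        _cache = {}
    key = id(job)  # memoize so shared dependencies are hashed only once
    if key in _cache:
        return _cache[key]

    h = hashlib.sha256()

    def feed(label, value):
        # Length-prefix each item so adjacent fields cannot be confused.
        data = value if isinstance(value, bytes) else str(value).encode()
        h.update(label.encode() + b"\0" + str(len(data)).encode() + b"\0" + data)

    feed("code", job.code)
    for name in sorted(job.params):
        feed("param:" + name, repr(job.params[name]))
    for name in sorted(job.raw_inputs):
        feed("input:" + name, job.raw_inputs[name])
    feed("env", job.environment)

    # Recursive step: include h(j') for every dependency j' in D_j.
    # Sorting is an assumption that makes the result order-independent.
    for dep_hash in sorted(job_hash(dep, _cache) for dep in job.dependencies):
        feed("dep", dep_hash)

    digest = h.hexdigest()
    _cache[key] = digest
    return digest


# Usage: a change in an upstream job propagates to the downstream hash.
upstream = Job("cat {input}", {"k": 1}, {"raw.txt": b"x"}, "env-v1")
downstream = Job("sort {input}", {}, {}, "env-v1", [upstream])
print(job_hash(downstream))
```

Because each hash embeds the hashes of its dependencies, changing any parameter, code fragment, input file, or environment of an upstream job changes the hash of every job downstream of it.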
Design patterns
Transformation
Aggregation
Generic conversions
Handling reference data
Resource usage annotation
Streaming data
Grouping jobs
Split-apply-combine
Conditional execution
Discussion