Chris Henze - Authorea

We describe Ziggy, an infrastructure for pipelines that process large volumes of science data. Ziggy is based on the pipeline infrastructure software that was developed to process flight data for the Kepler and TESS exoplanet missions. In that role, multiple terabytes of data are processed every month. Ziggy provides execution control, logging, exception management, marshaling and persistence, and data accountability record management for user-defined sequences of processing steps. Users define a pipeline via a set of XML files that specify the order in which processing algorithms are applied (including optional branching, in which one step is followed by multiple algorithms that run simultaneously), the inputs and outputs of each step, and any instrument models or control parameters that each step requires. Ziggy supports heterogeneous pipelines: each processing algorithm can be written in any supported language, and each step can run locally on a server or remotely on a supercomputer or cloud computing facility. Ziggy is sufficiently lightweight to run on a laptop and sufficiently robust to run on a supercomputer; builds on Mac OS X and Linux are supported. Ziggy is currently in use as the pipeline infrastructure tool for reprocessing the full data volume of the EO-1/Hyperion mission and is a candidate for use in the upcoming Surface Biology and Geology (SBG) mission of the Earth System Observatory (ESO). Ziggy contains no proprietary or sensitive/controlled software or algorithms, and approval for its release as a NASA Open Source Software Project is underway.
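To make the idea of an XML-defined pipeline with branching concrete, the following Python sketch parses a small pipeline definition and groups its steps into stages, where steps in the same stage could run simultaneously. This is an illustration only: the element name `node`, the attribute `childNodes`, the step names, and the helper functions are all hypothetical and are not taken from Ziggy's actual XML schema or implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical pipeline definition. The element and attribute names are
# assumptions made for illustration, not Ziggy's real schema.
PIPELINE_XML = """
<pipeline name="demo">
  <node name="calibrate" childNodes="photometry, background"/>
  <node name="photometry" childNodes="detrend"/>
  <node name="background"/>
  <node name="detrend"/>
</pipeline>
"""


def parse_pipeline(xml_text):
    """Return a dict mapping each step name to the list of steps that follow it."""
    root = ET.fromstring(xml_text)
    children = {}
    for node in root.findall("node"):
        raw = node.get("childNodes", "")
        children[node.get("name")] = [c.strip() for c in raw.split(",") if c.strip()]
    # Reject references to steps that were never defined.
    for name, kids in children.items():
        for kid in kids:
            if kid not in children:
                raise ValueError(f"step {name!r} refers to undefined step {kid!r}")
    return children


def execution_stages(children):
    """Group steps into stages; a step runs once all of its predecessors are done."""
    # Count how many predecessors each step is still waiting on.
    pending = {name: 0 for name in children}
    for kids in children.values():
        for kid in kids:
            pending[kid] += 1

    stage = sorted(name for name, count in pending.items() if count == 0)
    stages = []
    while stage:
        stages.append(stage)
        next_stage = []
        for name in stage:
            for kid in children[name]:
                pending[kid] -= 1
                if pending[kid] == 0:
                    next_stage.append(kid)
        stage = sorted(next_stage)
    return stages


if __name__ == "__main__":
    for number, names in enumerate(execution_stages(parse_pipeline(PIPELINE_XML)), 1):
        print(number, names)
```

In this sketch the branch after `calibrate` yields a stage containing both `photometry` and `background`, which mirrors the "one step followed by multiple algorithms" case described above. A real infrastructure would additionally track inputs, outputs, parameters, and where each step executes.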