Jon Jenkins and 11 more

The Surface Biology and Geology (SBG) mission is one of the core missions of NASA’s Earth System Observatory (ESO). SBG will acquire high-resolution solar-reflected spectroscopy and thermal infrared observations at a data rate of ~10 TB/day and generate products at ~75 TB/day. Because the daily volume exceeds NASA’s entire existing airborne hyperspectral data collection, collecting, processing/reprocessing, disseminating, and exploiting the SBG data presents new challenges. To address these challenges, we are developing a prototype science pipeline and a full-volume global hyperspectral synthetic data set to help prepare for SBG’s flight. Our science pipeline is based on the science processing operations technology developed for the Kepler and TESS planet-hunting missions. The pipeline infrastructure, Ziggy, provides a scalable architecture for producing robust, repeatable, and replicable science and application products, and it can run on a range of systems, from a laptop to the cloud or an on-site supercomputer. Our effort began by ingesting data and applying workflows from the 17-year EO-1/Hyperion mission archive, which provides globally sampled visible through shortwave infrared spectra representative of SBG data types and volumes. We have fully implemented the first stage of processing, from raw data (Level 0) to top-of-the-atmosphere radiance (Level 1R). We plan to begin reprocessing the entire 55 TB Hyperion data set by the end of 2021. Work to implement an atmospheric correction module that converts the L1R data to surface reflectance (Level 2) is also underway. We have also begun developing a hybrid High Performance Computing (HPC)/cloud processing framework to help optimize the cost, processing throughput, and overall resiliency of SBG’s science data system (SDS).
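The scale of the quoted data rates can be made concrete with simple arithmetic. This is a minimal sketch, assuming a 365-day year (leap days ignored) and decimal units (1 PB = 1000 TB); the constant and function names are hypothetical and are not part of any SBG or Ziggy software.

```python
# Approximate volumes quoted in the abstract, in terabytes (TB).
RAW_TB_PER_DAY = 10        # acquired observations per day
PRODUCT_TB_PER_DAY = 75    # generated products per day
HYPERION_ARCHIVE_TB = 55   # entire EO-1/Hyperion data set
DAYS_PER_YEAR = 365        # assumption: leap days ignored


def annual_volume_pb(tb_per_day: float, days: int = DAYS_PER_YEAR) -> float:
    """Return the yearly volume in petabytes (decimal units: 1 PB = 1000 TB)."""
    return tb_per_day * days / 1000


def days_to_match(archive_tb: float, tb_per_day: float) -> float:
    """Return how many days of output equal an archive of the given size."""
    return archive_tb / tb_per_day


if __name__ == "__main__":
    print(f"Raw data per year: {annual_volume_pb(RAW_TB_PER_DAY):.2f} PB")
    print(f"Products per year: {annual_volume_pb(PRODUCT_TB_PER_DAY):.2f} PB")
    days = days_to_match(HYPERION_ARCHIVE_TB, PRODUCT_TB_PER_DAY)
    print(f"Days of products equal to the Hyperion archive: {days:.2f}")
```

Under these assumptions SBG would generate about 3.65 PB of raw data and about 27.4 PB of products per year, and less than one day of product generation (about 0.73 days) would match the full 55 TB Hyperion archive.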
Separately, we have developed a method for generating full-volume synthetic data sets for SBG based on MODIS data, and we have made the first version of this data set available to the community on the data portal of NASA’s Advanced Supercomputing Division at NASA Ames Research Center. The synthetic data will make it possible to test parts of the pipeline infrastructure and other software that will be used for product generation.

Peter Tenenbaum and 8 more

We describe Ziggy, an infrastructure for pipelines that process large volumes of science data. Ziggy is based on the pipeline infrastructure software that was developed to process flight data for the Kepler and TESS exoplanet missions. In that role, multiple terabytes of data are processed every month. Ziggy provides execution control, logging, exception management, marshaling, persistence, and data accountability record management for user-defined sequences of processing steps. Users define a pipeline via a set of XML files that specify the order in which processing algorithms are applied (including optional branching, in which one step is followed by multiple algorithms that run simultaneously), along with the inputs, outputs, and any instrument models or control parameters required for each step. Ziggy supports heterogeneous pipelines: each processing algorithm can be written in any supported language, and each step can run locally on a server or remotely on a supercomputer or cloud computing facility. Ziggy is lightweight enough to run on a laptop and robust enough to run on a supercomputer; builds are supported on Mac OS X and Linux. Ziggy is currently in use as the pipeline infrastructure tool for reprocessing the full data volume of the EO-1/Hyperion mission and is a candidate for use in the upcoming Surface Biology and Geology (SBG) mission of the Earth System Observatory (ESO). Ziggy contains no proprietary or sensitive/controlled software or algorithms, and approval for its release as a NASA Open Source Software Project is underway.
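Branching can be understood by modeling the pipeline as a dependency graph. Ziggy itself reads its pipeline definition from XML files, and this sketch does not reproduce that format or any Ziggy API. It is a minimal Python model, assuming a hypothetical four-step pipeline whose step names (`calibrate`, `atmos_correct`, `cloud_mask`, `merge_products`) are invented for illustration. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group steps into waves, where every step in a wave has all of its inputs available and could therefore run concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a step, each value is the set of steps
# it depends on. "calibrate" branches into two steps that may run at the
# same time, mirroring the optional branching described above.
PIPELINE = {
    "calibrate": set(),
    "atmos_correct": {"calibrate"},
    "cloud_mask": {"calibrate"},
    "merge_products": {"atmos_correct", "cloud_mask"},
}


def execution_waves(pipeline: dict) -> list:
    """Group steps into waves; all steps in one wave can run concurrently."""
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()  # also raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose inputs are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave complete, unlocking its successors
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(execution_waves(PIPELINE)):
        print(index, wave)
```

Running the script prints three waves: `calibrate` alone, then `atmos_correct` and `cloud_mask` together, then `merge_products`. Representing the pipeline as a graph rather than a flat list is what allows a scheduler to dispatch both branches at once and to join them before the final step.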