Jon Jenkins - Authorea

The Surface Biology and Geology (SBG) mission is one of the core missions of NASA’s Earth System Observatory (ESO). SBG will acquire high-resolution solar-reflected spectroscopy and thermal infrared observations at a data rate of ~10 TB/day and generate products at ~75 TB/day. As the per-day volume is greater than NASA’s total extant airborne hyperspectral data collection, collecting, processing/reprocessing, disseminating, and exploiting the SBG data present new challenges. To address these challenges, we are developing a prototype science pipeline and a full-volume global hyperspectral synthetic data set to help prepare for SBG’s flight. Our science pipeline is based on the science processing operations technology developed for the Kepler and TESS planet-hunting missions. The pipeline infrastructure, Ziggy, provides a scalable architecture for robust, repeatable, and replicable science and application products that can be run on a range of systems from a laptop to the cloud or an on-site supercomputer. Our effort began by ingesting data and applying workflows from the EO-1/Hyperion 17-year mission archive, which provides globally sampled visible through shortwave infrared spectra that are representative of SBG data types and volumes. We have fully implemented the first stage of processing, from the raw data (Level 0) to top-of-the-atmosphere radiance (Level 1R). We plan to begin reprocessing the entire 55 TB Hyperion data set by the end of 2021. Work to implement an atmospheric correction module to convert the L1R data to surface reflectance (Level 2) is also underway. Additionally, we have started an effort to develop a hybrid High Performance Computing (HPC)/cloud processing framework to help optimize the cost, processing throughput, and overall system resiliency for SBG’s science data system (SDS).
Separately, we have developed a method for generating full-volume synthetic data sets for SBG based on MODIS data and have made the first version of this data set available to the community on the data portal of NASA’s Advanced Supercomputing Division at NASA Ames Research Center. The synthetic data will make it possible to test parts of the pipeline infrastructure and other software to be applied for product generation.
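The data volumes quoted above can be put in perspective with a little arithmetic. The following is a minimal sketch in Python, not part of the pipeline itself: the rates and archive size are the approximate figures stated in the text, while the year length, the decimal TB-to-PB conversion, and all function and variable names are assumptions introduced here for illustration.

```python
# Back-of-the-envelope volume estimates from the figures quoted in the text.
# All names below are illustrative; none come from the SBG pipeline or Ziggy.

RAW_TB_PER_DAY = 10          # approximate SBG acquisition rate (from the text)
PRODUCT_TB_PER_DAY = 75      # approximate SBG product generation rate (from the text)
HYPERION_ARCHIVE_TB = 55     # full EO-1/Hyperion data set (from the text)
HYPERION_MISSION_YEARS = 17  # length of the Hyperion mission archive (from the text)

DAYS_PER_YEAR = 365.25       # assumption: mean calendar year
TB_PER_PB = 1000             # assumption: decimal units


def yearly_volume_pb(tb_per_day):
    """Convert a daily rate in TB/day to a yearly volume in PB."""
    return tb_per_day * DAYS_PER_YEAR / TB_PER_PB


def days_to_match_archive(archive_tb, tb_per_day):
    """Days of acquisition at tb_per_day needed to equal archive_tb."""
    return archive_tb / tb_per_day


raw_pb_per_year = yearly_volume_pb(RAW_TB_PER_DAY)
product_pb_per_year = yearly_volume_pb(PRODUCT_TB_PER_DAY)
hyperion_equiv_days = days_to_match_archive(HYPERION_ARCHIVE_TB, RAW_TB_PER_DAY)
hyperion_tb_per_day = HYPERION_ARCHIVE_TB / (HYPERION_MISSION_YEARS * DAYS_PER_YEAR)

if __name__ == "__main__":
    print(f"SBG raw data per year:      {raw_pb_per_year:.2f} PB")
    print(f"SBG products per year:      {product_pb_per_year:.2f} PB")
    print(f"Hyperion archive equals:    {hyperion_equiv_days:.1f} days of SBG raw data")
    print(f"Hyperion mean daily volume: {hyperion_tb_per_day * 1000:.1f} GB/day")
```

Under these assumptions, the entire 17-year Hyperion archive corresponds to under a week of SBG raw acquisition, which is why it serves as a prototype-scale test case rather than a full-volume one.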