Oliver Dunbar and 3 more authors

Targeted high-resolution simulations driven by a general circulation model (GCM) can be used to calibrate GCM parameterizations of processes that are globally unresolvable but can be resolved in limited-area simulations. This raises the question of where to place high-resolution simulations to be maximally informative about the uncertain parameterizations in the global model. Here we construct an ensemble-based parallel algorithm to locate regions that maximize the uncertainty reduction, or information gain, in the quantification of GCM parameter uncertainties with regional data. The algorithm is based on a Bayesian framework that exploits a quantified posterior distribution on GCM parameters as a measure of uncertainty. It is embedded in the recently developed calibrate-emulate-sample (CES) framework, which performs efficient model calibration and uncertainty quantification with only $O(10^2)$ forward model evaluations, compared with the $O(10^5)$ forward model evaluations typically needed for traditional approaches to Bayesian calibration. We demonstrate the algorithm with an idealized GCM, with which we generate surrogates of high-resolution data. In this setting, we calibrate parameters and quantify uncertainties in a quasi-equilibrium convection scheme. We consider (i) localization in space for a statistically stationary problem, and (ii) localization in space and time for a seasonally varying problem. In these proof-of-concept applications, the calculated information gain reflects the reduction in parametric uncertainty obtained from Bayesian inference when harnessing a targeted sample of data. The largest information gain results from regions near the intertropical convergence zone (ITCZ), and the algorithm indeed automatically targets these regions for data collection.
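To make the selection criterion concrete, the Python sketch below scores candidate regions by the entropy reduction from the prior to a region-conditioned posterior under a Gaussian approximation. It is an illustration of the idea rather than the authors' implementation; the names `select_region` and `posterior_samples_by_region` are hypothetical.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * (1.0 + np.log(2.0 * np.pi)) + logdet)

def information_gain(prior_cov, posterior_cov):
    """Entropy reduction from prior to posterior: a proxy for the
    uncertainty reduction obtained by assimilating a region's data."""
    return gaussian_entropy(prior_cov) - gaussian_entropy(posterior_cov)

def select_region(prior_cov, posterior_samples_by_region):
    """Score each candidate region by its information gain and return the best.
    `posterior_samples_by_region` maps a region label to an array of
    posterior parameter samples with shape (n_samples, n_parameters)."""
    gains = {region: information_gain(prior_cov, np.cov(samples, rowvar=False))
             for region, samples in posterior_samples_by_region.items()}
    best = max(gains, key=gains.get)
    return best, gains
```

In the CES setting described in the abstract, the posterior samples for each candidate region would presumably come from MCMC on the emulator conditioned on that region's data.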

Kyongsik Yun and 9 more authors

California’s Central Valley is responsible for $17 billion of annual agricultural output, producing 1/4 of the nation’s food. However, land in the Central Valley is sinking at a rapid rate (as much as 20 cm per year) due to continued groundwater pumping. Land subsidence has a significant impact on infrastructure resilience and groundwater sustainability. It is important to understand subsidence and groundwater depletion in a consistent framework, using improved models capable of simulating in-situ well observations and observed subsidence. Currently, groundwater well data are sparse and irregularly sampled, compromising our understanding of groundwater changes. Moreover, groundwater pumping data are a major missing piece of the puzzle. Limited data availability and spatial/temporal uncertainty in the available data have hampered understanding of the complex dynamics of groundwater and subsidence. To address this limitation, we first integrated multimodal data, including InSAR, groundwater, precipitation, and soil composition, by interpolating the data to a common spatial and temporal resolution. We then identified regions with different temporal dynamics of land displacement, groundwater depth, and precipitation. Some areas (e.g., Helm) with coarser-grained soil compositions exhibited potentially reversible land transformations (elastic land compaction). Finally, we fed the integrated data into a deep neural network: a gated recurrent unit-based sequence-to-sequence generation model. We found that, using deep neural networks, the combination of InSAR, groundwater depth, and precipitation data had predictive power for soil composition (correlation coefficient R=0.83, normalized Nash-Sutcliffe model efficiency NNSE=0.84). A random forest model tested as a baseline was less accurate (R=0.65, NNSE=0.69). We also achieved significant accuracy with only 40% of the training data (NNSE=0.8), suggesting that the model can be generalized to other regions for indirect estimation of soil composition. Our results indicate that soil composition can be estimated using InSAR, groundwater depth, and precipitation data. In-situ measurements of soil composition can be expensive and time-consuming and may be impractical in some areas. The generalizability of the model sheds light on high-spatial-resolution soil composition estimation using existing measurements.
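For readers unfamiliar with gated recurrent unit (GRU) sequence-to-sequence models, the PyTorch sketch below shows the general architecture. The layer sizes, the twelve-step output horizon, and the treatment of the target as a decoded sequence per location are illustrative assumptions, not the configuration used in the study.

```python
import torch
import torch.nn as nn

class GRUSeq2Seq(nn.Module):
    """Minimal GRU encoder-decoder. Hyperparameters are illustrative
    placeholders, not those reported in the study."""
    def __init__(self, n_inputs=3, hidden=64, horizon=12, n_outputs=1):
        super().__init__()
        self.encoder = nn.GRU(n_inputs, hidden, batch_first=True)
        self.decoder = nn.GRU(n_outputs, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)
        self.horizon = horizon
        self.n_outputs = n_outputs

    def forward(self, x):
        # x: (batch, time, 3) sequences of InSAR displacement,
        # groundwater depth, and precipitation at each location
        _, h = self.encoder(x)
        step = torch.zeros(x.size(0), 1, self.n_outputs, device=x.device)
        preds = []
        for _ in range(self.horizon):
            out, h = self.decoder(step, h)
            step = self.head(out)          # feed the prediction back in
            preds.append(step)
        return torch.cat(preds, dim=1)     # (batch, horizon, n_outputs)

# Toy usage: 8 locations, 36 time steps of the 3 input variables
model = GRUSeq2Seq()
y_hat = model(torch.randn(8, 36, 3))       # -> shape (8, 12, 1)
```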

Oliver Dunbar and 3 more authors

Parameters in climate models are usually calibrated manually, exploiting only small subsets of the available data. This precludes an optimal calibration and quantification of uncertainties. Traditional Bayesian calibration methods that allow uncertainty quantification are too expensive for climate models; they are also not robust in the presence of internal climate variability. For example, Markov chain Monte Carlo (MCMC) methods typically require $O(10^5)$ model runs, rendering them infeasible for climate models. Here we demonstrate an approach to model calibration and uncertainty quantification that requires only $O(10^2)$ model runs and can accommodate internal climate variability. The approach consists of three stages: (i) a calibration stage uses variants of ensemble Kalman inversion to calibrate a model by minimizing mismatches between model and data statistics; (ii) an emulation stage emulates the parameter-to-data map with Gaussian processes (GPs), using the model runs from the calibration stage for training; (iii) a sampling stage approximates the Bayesian posterior distributions with MCMC, using the GP emulator in place of the full model. We demonstrate the feasibility and computational efficiency of this calibrate-emulate-sample (CES) approach in a perfect-model setting. Using an idealized general circulation model, we estimate parameters in a simple convection scheme from data surrogates generated with the model. The CES approach generates probability distributions of the parameters that are good approximations of the Bayesian posteriors, at a fraction of the computational cost usually required to obtain them. Sampling from this approximate posterior allows the generation of climate predictions with quantified parametric uncertainties.
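A minimal Python sketch of the three CES stages is given below, under simplifying assumptions: a basic ensemble Kalman inversion update with perturbed observations, an off-the-shelf scikit-learn GP emulator, a flat prior, and random-walk Metropolis sampling. It is meant to convey the structure of the approach, not to reproduce the authors' implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def eki_update(theta, g, y, gamma, rng=None):
    """One ensemble Kalman inversion step (stage i).
    theta: (J, p) parameter ensemble; g: (J, d) model output statistics;
    y: (d,) observed statistics; gamma: (d, d) noise covariance."""
    rng = rng if rng is not None else np.random.default_rng()
    dt, dg = theta - theta.mean(0), g - g.mean(0)
    C_tg = dt.T @ dg / theta.shape[0]              # (p, d) cross-covariance
    C_gg = dg.T @ dg / theta.shape[0]              # (d, d) output covariance
    K = np.linalg.solve(C_gg + gamma, C_tg.T).T    # (p, d) Kalman-type gain
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), gamma, theta.shape[0])
    return theta + (y_pert - g) @ K.T              # updated (J, p) ensemble

def emulate(thetas, gs):
    """Stage ii: fit a GP emulator of the parameter-to-data map on the
    parameter-output pairs collected during calibration."""
    return GaussianProcessRegressor(normalize_y=True).fit(thetas, gs)

def sample(emulator, y, gamma, theta0, n_steps=50_000, step=0.05, seed=1):
    """Stage iii: random-walk Metropolis on the emulated misfit
    (flat prior, for brevity)."""
    rng = np.random.default_rng(seed)
    inv_gamma = np.linalg.inv(gamma)
    def log_post(t):
        r = y - emulator.predict(t[None, :])[0]
        return -0.5 * r @ inv_gamma @ r
    chain, t, lp = [], theta0.copy(), log_post(theta0)
    for _ in range(n_steps):
        prop = t + step * rng.standard_normal(t.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            t, lp = prop, lp_prop
        chain.append(t.copy())
    return np.array(chain)
```

The stages chain together: a few `eki_update` iterations supply the parameter-output pairs used to train the emulator, and `sample` then draws the approximate posterior without any further forward-model runs.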