Maria J. Molina

and 2 more

This test-case study assesses whether deep learning methods trained to classify thunderstorms in model output representative of the present-day climate can generalize to a future climate (end of the 21st century). A convolutional neural network (CNN) was trained to classify strongly rotating thunderstorms from a current climate simulated at high resolution with the Weather Research and Forecasting (WRF) model, then evaluated against thunderstorms from a future climate. The CNN performed with skill, and comparably, in both climates. Although the training labels were derived from a threshold value of a severe thunderstorm diagnostic (updraft helicity), which was not used as an input attribute, the CNN learned physical characteristics of organized convection and its environments that are not captured by the diagnostic heuristic. Physical features were not prescribed but rather learned from the data, such as the importance of dry air at mid-levels for intense thunderstorm development when low-level moisture is present (i.e., convective available potential energy). Explanation techniques also revealed that thunderstorms classified as strongly rotating are associated with learned rotation signatures. Results show that creating synthetic data with ground truth is a viable alternative to human-labeled data and that a CNN can generalize a target using learned features that would be difficult to encode because of their spatial complexity. Most importantly, results from this study show that deep learning is capable of generalizing to future climate extremes and, with hyperparameter tuning, can exhibit out-of-sample robustness in certain applications.
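The labeling step described in this abstract (a threshold on updraft helicity that is used only for labels, never as a CNN input) can be illustrated with a minimal sketch. The function name, the sample values, and the threshold of 75 m^2 s^-2 are all assumptions made for illustration; the abstract does not state the threshold the authors used.

```python
def label_storms(peak_updraft_helicity, threshold=75.0):
    """Label each storm 1 (strongly rotating) if its peak updraft
    helicity meets the threshold, else 0.

    The default threshold is a placeholder, not the value from the study.
    """
    return [1 if uh >= threshold else 0 for uh in peak_updraft_helicity]


# Hypothetical peak updraft helicity values (m^2 s^-2) for five storm patches.
peak_uh = [12.0, 80.5, 75.0, 40.2, 130.0]
labels = label_storms(peak_uh)
```

Because the labels come from a diagnostic computed on model output, no human annotation is needed, which is the sense in which the abstract calls the data synthetic with ground truth.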

John Schreck

and 7 more

Secondary organic aerosols (SOA) are formed from the oxidation of hundreds of volatile organic compounds (VOCs) emitted from anthropogenic and natural sources. Accurate predictions of this chemistry are key for air quality and climate studies because organic aerosols contribute a large share of submicron aerosol mass. Currently, only explicit models, such as the Generator for Explicit Chemistry and Kinetics of Organics in the Atmosphere (GECKO-A), can fully represent the chemical processing of thousands of organic species. However, their extreme computational cost prohibits their use in current chemistry-climate models, which instead rely on simplified empirical parameterizations to predict SOA concentrations. Recent applications of machine learning (ML) emulation to the simpler chemical mechanisms of tropospheric ozone have shown that it can produce realistic predictions while significantly reducing computational cost. This study demonstrates that ML can accurately emulate SOA formation from an explicit chemistry model for several precursors with a speedup of 100 to 100,000 times over GECKO-A, making it computationally usable in a chemistry-climate model. To train the ML emulator, we generated thousands of GECKO-A box simulations sampled from a broad range of initial environmental conditions, and focused on the chemistry of three representative SOA precursors: the oxidation by OH of two anthropogenic VOCs (toluene and dodecane) and one biogenic VOC (alpha-pinene). We compare fully connected and recurrent neural network methods and use an ensemble approach to quantify their underlying uncertainty and robustness. The SOA predictions generally remain stable over a simulation period of 5 days, with an approximate error of 2-8%.
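One common way to realize the ensemble approach mentioned in this abstract is to run several independently trained emulators and summarize their predictions at each time step by a mean and a spread. The sketch below assumes that form; the function name and the sample trajectories are hypothetical, and the abstract does not specify how the authors computed their uncertainty estimates.

```python
from statistics import mean, pstdev


def ensemble_summary(member_predictions):
    """Return (mean, spread) per time step across ensemble members.

    member_predictions is a list of equal-length trajectories, one per
    ensemble member. Spread is the population standard deviation.
    """
    per_step = list(zip(*member_predictions))  # regroup values by time step
    return [mean(s) for s in per_step], [pstdev(s) for s in per_step]


# Hypothetical SOA mass trajectories from three ensemble members.
members = [
    [1.0, 2.0, 3.0],
    [1.2, 2.2, 2.8],
    [0.8, 1.8, 3.2],
]
ens_mean, ens_spread = ensemble_summary(members)
```

A spread that grows along the trajectory would indicate that the members diverge as the emulator is stepped forward, which is one way to check stability over a multi-day simulation.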

Dallas Foster

and 2 more

The ocean mixed layer plays an important role in subseasonal climate dynamics because it can exchange large amounts of heat with the atmosphere and evolves significantly on subseasonal timescales. Estimation of the subseasonal variability of the ocean mixed layer is therefore important for subseasonal to seasonal prediction and analysis. The increasing coverage of in-situ Argo ocean profile data allows for greater analysis of aseasonal ocean mixed layer depth (MLD) variability on subseasonal and interannual timescales; however, current sampling rates are not yet sufficient to fully resolve subseasonal MLD variability. Other products, including gridded MLD estimates, require optimal interpolation, a process that often ignores information from other oceanic variables. We demonstrate how satellite observations of sea surface temperature, salinity, and height facilitate MLD estimation in a pilot study of two regions: the mid-latitude southern Indian Ocean and the eastern equatorial Pacific Ocean. We construct multiple machine learning architectures to produce weekly 1/2-degree gridded MLD anomaly fields (relative to a monthly climatology) with uncertainty estimates. We test multiple traditional and probabilistic machine learning techniques to compare both accuracy and probabilistic calibration. We find that incorporating sea surface data through a machine learning model improves the performance of MLD estimation over traditional optimal interpolation in terms of both mean prediction error and uncertainty calibration. These preliminary results provide a promising first step toward a greater understanding of aseasonal MLD phenomena and the relationship between the MLD and sea surface variables. Extensions to this work include global and temporal analyses of MLD.
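The phrase "anomaly fields (relative to a monthly climatology)" refers to subtracting the long-term mean for the matching calendar month from each value. A minimal sketch for a single grid cell follows; the function names and the sample record are hypothetical, and the study's actual gridding and weekly averaging are not reproduced here.

```python
from collections import defaultdict


def monthly_climatology(records):
    """Return {month: mean MLD} from (month, mld) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, mld in records:
        totals[month] += mld
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in totals}


def mld_anomalies(records, climatology):
    """Subtract the matching month's climatological mean from each MLD."""
    return [mld - climatology[month] for month, mld in records]


# Hypothetical (month, MLD in metres) record for one grid cell.
records = [(1, 50.0), (1, 70.0), (2, 30.0), (2, 40.0)]
clim = monthly_climatology(records)
anoms = mld_anomalies(records, clim)
```

Removing the monthly mean strips out the seasonal cycle, so what remains is the aseasonal variability that the abstract targets.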

Andrew Gettelman

and 6 more

Clouds are one of the most critical yet uncertain aspects of weather and climate prediction. The complex nature of sub-grid scale cloud processes makes traceable simulation of clouds across scales difficult (or impossible). Models and measurements are often used to develop empirical relationships that keep large-scale models computationally efficient. Machine learning provides another potential tool to improve our empirical parameterizations of clouds. To explore these opportunities, we replace the warm rain formation process in a General Circulation Model (GCM) with a detailed treatment from a bin microphysical model that causes a 400% slowdown in the GCM. We analyze the changes in climate that result from the use of the bin microphysical calculation and find improvements in the rain onset and frequency of light rain compared to detailed models and observations. We also find a resulting change in the cloud feedback response of the model to warming, which will significantly affect the climate sensitivity. We then emulate this process with multiple neural networks that predict whether specific tendencies will be nonzero and, if so, their magnitude. Using perfect model experiments with and without the emulator, we describe the risks of over-fitting, extrapolation, and linearization of a non-linear problem. We show that the emulators recover the solutions in almost all respects and nearly all of the speed, yielding simulations that perform like the detailed model at the computational cost of the control simulation.
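The emulator structure described in this abstract (one network decides whether a tendency is nonzero, another predicts its magnitude) amounts to a gate followed by a regressor. The sketch below shows that control flow only, with simple stand-in functions in place of trained neural networks; the input name `qc`, the cutoff, and the linear magnitude rule are all hypothetical.

```python
def emulate_tendency(inputs, is_nonzero, magnitude):
    """Two-stage prediction: return 0.0 unless the gate fires,
    otherwise return the regressor's magnitude estimate."""
    return magnitude(inputs) if is_nonzero(inputs) else 0.0


# Stand-ins for trained networks (assumptions, not the study's models).
def gate(inputs):
    # Hypothetical rule: a tendency exists only above a cloud water cutoff.
    return inputs["qc"] > 1e-5


def regressor(inputs):
    # Hypothetical linear loss of cloud water to rain.
    return -0.1 * inputs["qc"]


clear_sky = emulate_tendency({"qc": 0.0}, gate, regressor)
cloudy = emulate_tendency({"qc": 1e-3}, gate, regressor)
```

Splitting the problem this way lets the regressor train only on cases where the process is active, rather than having a single model fit a target that is exactly zero for much of the domain.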