Qiyu Xiao and 5 more

The Surface Water and Ocean Topography (SWOT) satellite is expected to observe the sea surface height (SSH) down to scales of ∼10-15 kilometers. While SWOT will reveal submesoscale SSH patterns that have never before been observed on global scales, how to extract the corresponding velocity fields and underlying dynamics from this data presents a new challenge. At these soon-to-be-observed scales, geostrophic balance is not sufficiently accurate, and the SSH will contain strong signals from inertial gravity waves: two problems that make estimating surface velocities non-trivial. Here we show that a data-driven approach can be used to estimate the surface flow, particularly the kinematic signatures of smaller-scale flows, from SSH observations, and that it performs significantly better than directly using the geostrophic relationship. We use a Convolutional Neural Network (CNN) trained on submesoscale-permitting high-resolution simulations to test the possibility of reconstructing surface vorticity, strain, and divergence from snapshots of SSH. By evaluating success using pointwise accuracy and vorticity-strain joint distributions, we show that the CNN works well when inertial gravity wave amplitudes are weak. When the wave amplitudes are strong, the model may produce distorted results; however, an appropriate choice of loss function can help filter waves from the divergence field, making divergence a surprisingly reliable field to reconstruct in this case. We also show that when applying the CNN model to realistic simulations, pretraining a CNN model with simpler simulation data improves the performance and convergence, indicating a possible path forward for estimating real flow statistics with limited observations.
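
The abstract does not specify the network architecture or the loss function; the sketch below is only a rough illustration of the kind of model involved, assuming a small fully convolutional PyTorch network that maps a single-channel SSH snapshot onto vorticity, strain, and divergence channels. The layer widths, kernel sizes, mean-squared-error loss, and synthetic tensors are placeholders, not the authors' configuration.

```python
# Minimal sketch (not the authors' architecture): a small fully convolutional
# network that maps an SSH snapshot to three output channels
# (vorticity, strain, divergence). All hyperparameters are illustrative.
import torch
import torch.nn as nn

class SSH2Kinematics(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, 3, kernel_size=3, padding=1),  # vorticity, strain, divergence
        )

    def forward(self, ssh):
        # ssh: (batch, 1, ny, nx) normalized SSH snapshot
        return self.net(ssh)

model = SSH2Kinematics()
loss_fn = nn.MSELoss()  # the abstract notes the loss choice affects how wave signals are handled
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on synthetic tensors standing in for
# simulation SSH snapshots and their diagnosed surface kinematics.
ssh = torch.randn(8, 1, 128, 128)
target = torch.randn(8, 3, 128, 128)
optimizer.zero_grad()
loss = loss_fn(model(ssh), target)
loss.backward()
optimizer.step()
```
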
The core tools of science (data, software, and computers) are undergoing a rapid and historic evolution, changing what questions scientists ask and how they find answers. Earth science data are being transformed into new formats optimized for cloud storage that enable rapid analysis of multi-petabyte datasets. Datasets are moving from archive centers to vast cloud data storage, adjacent to massive server farms. Open-source, cloud-based data science platforms, accessed through a web-browser window, are enabling advanced, collaborative, interdisciplinary science to be performed wherever scientists can connect to the internet. Specialized software and hardware for machine learning and artificial intelligence (AI/ML) are being integrated into data science platforms, making them more accessible to the average scientist. Increasing amounts of data and computational power in the cloud are unlocking new approaches for data-driven discovery. For the first time, it is truly feasible for scientists to bring their analysis to data in the cloud without specialized cloud computing knowledge. This paradigm shift has the potential to lower the barrier to entry, expand the science community, and increase opportunities for collaboration while promoting scientific innovation, transparency, and reproducibility. Yet we have all witnessed promising new tools that seemed harmless and beneficial at the outset later become damaging or limiting. What do we need to consider as this new way of doing science evolves?

Jacob Steinberg and 3 more

Oceanic mesoscale motions including eddies, meanders, fronts, and filaments comprise a dominant fraction of oceanic kinetic energy and contribute to the redistribution of tracers in the ocean such as heat, salt, and nutrients. This reservoir of mesoscale energy is regulated by the conversion of potential energy and transfers of kinetic energy across spatial scales. Whether and under what circumstances mesoscale turbulence precipitates forward or inverse cascades, and the rates of these cascades, remain difficult to directly observe and quantify despite their impacts on physical and biological processes. Here we use global observations to investigate the seasonality of surface kinetic energy and upper ocean potential energy. We apply spatial filters to along-track satellite measurements of sea surface height to diagnose surface eddy kinetic energy across 60-300 km scales. A geographic and scale dependent seasonal cycle appears throughout much of the mid-latitudes, with eddy kinetic energy at scales less than 60 km peaking 1-4 months before that at 60-300 km scales. Spatial patterns in this lag align with geographic regions where the conversion of potential to kinetic energy is seasonally varying. In mid-latitudes, the conversion rate peaks 0-2 months prior to kinetic energy at scales less than 60 km. The consistent geographic patterns between the seasonality of potential energy conversion and kinetic energy across spatial scales provide observational evidence for the inverse cascade, and demonstrate that some component of it is seasonally modulated. Implications for mesoscale parameterizations and numerical modeling are discussed.
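
The scale-dependent filtering is described only qualitatively above; the sketch below is a minimal illustration, assuming a synthetic along-track SSH record, a Gaussian low-pass as the spatial filter, and a cross-track geostrophic velocity as the eddy kinetic energy proxy. The sample spacing, Coriolis parameter, and filter-width conversion are placeholder choices, not the paper's actual method.

```python
# Illustrative sketch only: decompose a synthetic along-track SSH record into
# scales below 60 km and the 60-300 km band, then form an eddy kinetic energy
# proxy from the cross-track geostrophic velocity of each band.
import numpy as np
from scipy.ndimage import gaussian_filter1d

dx = 7.0e3            # along-track sample spacing (m), placeholder
g, f = 9.81, 1.0e-4   # gravity and a mid-latitude Coriolis parameter

# Synthetic along-track SSH (m) standing in for altimeter data.
x = np.arange(4096) * dx
ssh = 0.1 * np.sin(2 * np.pi * x / 200e3) + 0.02 * np.sin(2 * np.pi * x / 40e3)

def lowpass(field, cutoff_m):
    """Gaussian low-pass with a width tied to the cutoff scale (crude conversion)."""
    sigma = cutoff_m / dx / (2 * np.pi)
    return gaussian_filter1d(field, sigma)

ssh_small = ssh - lowpass(ssh, 60e3)                 # scales < 60 km
ssh_meso = lowpass(ssh, 60e3) - lowpass(ssh, 300e3)  # 60-300 km band

def cross_track_eke(ssh_band):
    """0.5 * v'^2 from the cross-track geostrophic velocity of one SSH band."""
    v = (g / f) * np.gradient(ssh_band, dx)
    return 0.5 * v**2

print(f"EKE proxy: <60 km = {cross_track_eke(ssh_small).mean():.2e}, "
      f"60-300 km = {cross_track_eke(ssh_meso).mean():.2e} m^2/s^2")
```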

Tom Augspurger and 3 more

As more analysis-ready datasets are provided on the cloud, we need to consider how researchers access data. To maximize performance and minimize costs, we move the analysis to the data. This notebook demonstrates a Pangeo deployment connected to multiple Dask Gateways to enable analysis, regardless of where the data is stored. Public clouds are partitioned into regions, each a geographic location with a cluster of data centers. A dataset like the National Water Model Short-Range Forecast is provided in a single region of some cloud provider (e.g. AWS’s us-east-1). To analyze that dataset efficiently, we do the analysis in the same region as the dataset. That’s especially true for very large datasets: making local “dark replicas” of the datasets is slow and expensive. In this notebook we demonstrate a few open source tools to compute “close” to cloud data. We use Intake as a data catalog to discover the datasets available to us and load each as an xarray Dataset. With xarray, we’re able to write the necessary transformations, filtering, and reductions that compose our analysis. To process the large amounts of data in parallel, we use Dask. Behind the scenes, we’ve configured this Pangeo deployment with multiple Dask Gateways, which provide secure, multi-tenant servers for managing Dask clusters. Each Gateway is provisioned with the necessary permissions to access the data. By placing compute (the Dask workers) in the same region as the dataset, we achieve the highest performance: these worker machines are physically close to the machines storing the data and have the highest bandwidth. We minimize cost by avoiding egress costs: fees charged to the data provider when data leaves a cloud region.
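
A minimal sketch of this workflow follows, assuming a hypothetical Intake catalog URL, catalog entry, variable name, and Dask Gateway address; none of these are the deployment's real endpoints, only stand-ins for the pattern described above.

```python
# Sketch of the "move compute to the data" workflow; all URLs and names are
# placeholders, not the deployment's actual endpoints.
import intake
from dask_gateway import Gateway

# Discover datasets through an Intake catalog and open one lazily with xarray.
cat = intake.open_catalog("https://example.com/catalog.yml")    # placeholder URL
ds = cat["national_water_model_short_range"].to_dask()          # hypothetical entry name

# Connect to the Dask Gateway provisioned in the same cloud region as the data,
# so the workers sit next to the storage and no egress fees are incurred.
gateway = Gateway("https://example.com/services/dask-gateway")  # placeholder address
cluster = gateway.new_cluster()
cluster.scale(20)                        # worker count is arbitrary here
client = cluster.get_client()

# A typical reduction: subset, then average over time; compute() runs on the
# remote workers, close to the data.
result = ds["streamflow"].sel(time="2022-06").mean("time").compute()  # hypothetical variable

cluster.shutdown()
```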