Benchmarking of machine learning ocean subgrid parameterizations in an
idealized model
Abstract
Recently, a growing number of studies have used machine learning (ML)
models to parameterize computationally intensive subgrid-scale processes
in ocean models. Such studies typically train ML models with filtered
and coarse-grained high-resolution data and evaluate their predictive
performance offline, before implementing them in a coarse-resolution
model and assessing their online performance. In this work, we provide a
framework for systematically benchmarking the online performance of such
models, their generalization to domains not encountered during training,
and their sensitivity to dataset design choices. We apply this proposed
framework to compare a large number of physical and neural network
(NN)-based parameterizations. We find that the choice of filtering and
coarse-graining operator is particularly important and should be guided
by the intended application. We also show that all of our
physics-constrained NNs are stable when implemented online, but that
their performance across metrics can vary drastically.
generalization and help with interpretability of data-driven
parameterizations, we propose a novel equation-discovery approach
combining linear regression and genetic programming with spatial
derivatives. We find that this approach performs on par with neural networks
on the training domain but generalizes better beyond it. We release code
and data to reproduce our results and provide the research community
with easy-to-use resources to develop and evaluate additional
parameterizations.
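
To make the filtering and coarse-graining step mentioned above concrete, the sketch below shows one common way to turn a high-resolution snapshot into a low-resolution training field: apply a low-pass filter, then block-average onto the coarse grid. This is an illustrative assumption, not code from the paper; the function name `filter_and_coarsen`, the Gaussian filter, and the default filter width are all hypothetical choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def filter_and_coarsen(hi_res_field, factor=4, filter_width=None):
    """Filter a 2D high-resolution field, then coarse-grain by block averaging.

    hi_res_field : 2D array on the high-resolution grid
    factor       : coarse-graining ratio (e.g. 4 -> 4x4 fine cells per coarse cell)
    filter_width : standard deviation of the Gaussian filter, in fine-grid points
                   (hypothetical default: equal to the coarse-graining factor)
    """
    if filter_width is None:
        filter_width = factor
    # Low-pass filter to remove scales unresolved on the coarse grid
    # ("wrap" assumes a doubly periodic domain, as in many idealized models)
    filtered = gaussian_filter(hi_res_field, sigma=filter_width, mode="wrap")
    # Coarse-grain by averaging non-overlapping factor x factor blocks
    ny, nx = filtered.shape
    trimmed = filtered[: ny - ny % factor, : nx - nx % factor]
    coarse = trimmed.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))
    return coarse


# Example: coarse-grain a 256x256 snapshot to 64x64
hi = np.random.randn(256, 256)
lo = filter_and_coarsen(hi, factor=4)
print(lo.shape)  # (64, 64)
```

Different filter shapes and widths (or spectral truncation instead of a Gaussian) define different operators; the abstract's point is that this choice materially affects the learned parameterization and should be matched to the application.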