Data Choices - Considerations for Uncertainty

Abstract

There is an increasing awareness of the importance of considering multiple sources of forcing data for downscaling and impact modelling applications, given the inherent uncertainties in climate projections and local sensitivities to large-scale drivers. Because of the limited number and accessibility of large-scale climate simulations, selecting such forcing data is typically an exercise in the random sampling of possible outcomes. This selection, however, has real implications for how the range of potential events and their associated uncertainties is perceived in both scientific and decision-making contexts. We present a simple illustration of this situation, highlighting the place a given singular study inherently occupies within the context of our larger, potentially unarticulated, understanding.

Setting the stage

\label{opening} In the following we hope to motivate a discussion concerning how patterns of practice contribute to perceived uncertainty for decision makers. While any decision-making process involves uncertainty, in the context of planning with respect to climate impacts a chief concern is: “how well do we describe the range of risks we must consider responding to?” Identifying, quantifying, and reducing these uncertainties is a topic of much ongoing work and discussion; e.g., (Kalognomou 2013), (Goldstein 2013), (Gaganis 2008), (Stainforth 2007). Much of this discussion, however, relates to the domain of experimental design and model development; i.e., the creation of the simulation ensembles, and so lies outside the control of the data user who draws on the results to study potential impacts. Typically an impact modeller is able neither to access the entirety of simulations that have been produced, nor to marshal the computational resources to run their impact model for all possible forcings in the first place. As such, the selection of forcing data is the main area where the impact modeller has direct input into the representation of forcing uncertainties.

At this scale of implementation, i.e., selecting a few simulations to represent possible inputs for an impact model, there is a tension between a more formal understanding of uncertainty and the desire to impart a sense of what is likely and to communicate where there is and isn’t confidence. Large multi-model and/or calibrated ensembles attempt an [arguably quite limited (Knutti 2010) (Doherty 2010)] expression of such uncertainties. The sub-selection of potential forcings serves rather to acknowledge the indeterminate nature of these external conditions without quantifying their full potential scope. It is widely understood that different simulations represent a continuum of ‘variations on a theme’ (Masson 2011), rather than distinct options where either A, B, or C is the ‘correct’ choice. As such, an intuitive response is to look for a representation of the general expectations expressed by these simulations, such as an ensemble mean. For many applications, however, especially those determined by chronologies of events, this is not a viable approach. In hydrology, the duration and intensity of rainfall are key variables, and these sequences and extremes are lost in the process of model averaging. This leaves little option except choosing and applying what is hopefully a representative sample of simulation realisations. The concern is that this selection does make implicit assertions of confidence, even if it is by necessity undertaken in a haphazard way; e.g., if the selected subset of simulations happens to produce similar output for a given region, this appears at face value to imply that this is a likely outcome.
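The loss of sequences and extremes under averaging can be seen in a toy example (synthetic numbers, not drawn from any simulation): two rainfall series that each contain one intense storm, but on different days, average to a series with two moderate wet days and no intense storm at all.

```python
# Toy illustration of why an ensemble mean destroys event chronologies.
# Two synthetic daily rainfall series (mm/day), each with one intense
# storm on a different day; values are illustrative only.
series_a = [0.0, 0.0, 30.0, 0.0, 0.0, 0.0, 0.0]
series_b = [0.0, 0.0, 0.0, 0.0, 30.0, 0.0, 0.0]

# Pointwise (day-by-day) mean across the two "models"
ensemble_mean = [(a + b) / 2 for a, b in zip(series_a, series_b)]

peak_members = max(max(series_a), max(series_b))        # intensity in the members
peak_mean = max(ensemble_mean)                          # intensity after averaging
wet_days_member = sum(1 for v in series_a if v > 0)     # storm days per member
wet_days_mean = sum(1 for v in ensemble_mean if v > 0)  # storm days in the mean

# The 30 mm/day storm becomes two 15 mm/day events: intensity is halved
# and the wet-day count doubles, i.e., the chronology is not preserved.
print(peak_members, peak_mean, wet_days_member, wet_days_mean)
```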

These dilemmas are unavoidable under current practice. The resolution needs of hydrological modellers are often addressed using simple bias correction of GCM data, possibly spatially disaggregated, yet all still predicated on the original GCM grid-cell data. There is much discussion of model selection in the literature … [insert discussion here, if it actually exists] … but given the inherent computational and scientific limitations in our ability to map the space of all possible climates and event chronologies, even the most critical evaluations are still performed on a sub-sample of a sub-sample. This implies that the considerations addressed here will remain pertinent even as ensemble design and data access continue to improve.
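For concreteness, one simple form such bias correction can take is a multiplicative scaling factor derived from the historical period; the numbers and the choice of method here are illustrative assumptions, not the procedure of any particular study.

```python
# Minimal sketch of multiplicative ("scaling-factor") bias correction
# for precipitation; all values are hypothetical, for illustration only.
obs_clim = 62.0   # observed climatological monthly precipitation (mm)
gcm_hist = 48.0   # GCM historical climatology for the same grid cell (mm)
gcm_future = [51.0, 40.0, 66.0]  # raw GCM monthly values to be corrected

# Correction factor from the historical mismatch, applied to future values
factor = obs_clim / gcm_hist
corrected = [v * factor for v in gcm_future]

print(factor, corrected)
```

Note that this correction rescales magnitudes but inherits the sequencing of events from the original GCM grid-cell data, which is the dependence the passage above points to.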

Outlining the (simplistic) approach of enumerating sub-ensembles

\label{methods} What sort of landscape is created by varied groups and agencies creating sub-ensembles of different sizes, as determined by their resources and needs? How much variation is there in the perceived messages of the incorporated forcing data? Here we create a simple illustration1. We take estimates of historical (1986-2005) climatological precipitation from \(\mathrm{n} = 7\) CMIP5 (Taylor 2012) General Circulation Models (GCMs)2, see Table 1, for grid cells containing Johannesburg, South Africa. We then consider every possible simulation combination for ensembles of size \(\mathrm{k} = 1\) to \(7\). This gives \(\binom{\mathrm{n}}{\mathrm{k}}\) sub-ensembles for each group size. We then consider the median, as well as the average absolute deviation about the median3, for each sub-ensemble. This allows us to visualise how different the ensembles various groups are working with can be, and ho
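The enumeration described above can be sketched in a few lines. The GCM names and precipitation values below are placeholders, not the CMIP5 values or models of Table 1; only the combinatorial procedure is the point.

```python
from itertools import combinations
from statistics import median

# Hypothetical climatological precipitation (mm/yr) for n = 7 GCMs;
# illustrative numbers only, not the study's CMIP5 data.
gcm_precip = {
    "GCM-A": 620.0, "GCM-B": 705.0, "GCM-C": 668.0, "GCM-D": 590.0,
    "GCM-E": 730.0, "GCM-F": 645.0, "GCM-G": 688.0,
}

def sub_ensemble_stats(values):
    """Median and average absolute deviation about the median."""
    m = median(values)
    aad = sum(abs(v - m) for v in values) / len(values)
    return m, aad

# Every possible sub-ensemble of size k = 1 .. 7: C(n, k) per group size
names = sorted(gcm_precip)
results = {}  # k -> list of (members, median, aad)
for k in range(1, len(names) + 1):
    results[k] = []
    for members in combinations(names, k):
        m, aad = sub_ensemble_stats([gcm_precip[g] for g in members])
        results[k].append((members, m, aad))

for k, rows in results.items():
    print(f"k={k}: {len(rows)} sub-ensembles")
```

For n = 7 this yields 7, 21, 35, 35, 21, 7, and 1 sub-ensembles for k = 1 through 7 (127 in total), each with its own median and spread to compare across "groups".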