\section{Discussion}

\label{sec-3} The analyses presented above raise a few points. On the positive side, several models evaluate reasonably well, i.e. perform notably better than climatology, over the greater Volta basin. This suggests these RCMs could potentially be used to generate estimates of interannual variability over the entire basin under different boundary conditions. Over the full evaluation period (1989--2008), the years with greater-than-typical ensemble agreement are evenly divided between wet and dry. While this observation has limited meaning, prima facie it suggests that simulation skill is not biased towards either extreme. This offers some indication that our initial goal of using RCM simulations to estimate the risk of multi-year dry events is not unattainable, provided the ‘break points’ noted in our analysis can be addressed.
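As a sketch of the ‘better than climatology’ comparison invoked above, the snippet below computes a mean-squared-error skill score against a climatological (long-term mean) baseline. The function name and data are hypothetical illustrations, not the evaluation actually performed in this study.

```python
import numpy as np

def mse_skill_score(obs, sim):
    """MSE-based skill score relative to climatology.

    1.0 = perfect; 0.0 = no better than the climatological mean;
    negative = worse than climatology.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    mse_sim = np.mean((sim - obs) ** 2)
    mse_clim = np.mean((obs - obs.mean()) ** 2)  # baseline: forecast the mean
    return 1.0 - mse_sim / mse_clim

# Illustrative annual rainfall anomalies (not real Volta data)
obs = np.array([1.0, -0.5, 0.3, -1.2, 0.8, 0.1])
sim = np.array([0.8, -0.3, 0.5, -1.0, 0.6, 0.0])
print(mse_skill_score(obs, sim))  # positive: the model beats climatology
```

A score near zero or below would indicate the simulation adds nothing over simply forecasting the climatological mean.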

  1. Disagreement between observational products: Inconsistency between simulations can be a positive attribute. This is particularly so in the absence of strong boundary forcing, i.e. when there are no known larger-scale forcings such as El Niño at play. When there is nothing to constrain the range of possible outcomes, a well-dispersed ensemble is an accurate representation of the situation. This is not the case for observational products. Where observations disagree and/or, as in this case, are known to be unreliable, there is often no clear way to express what the range of ‘real’ states might be. If this range were already known, that would most likely mean an adequate ‘ground truthing’ was already available for a discerning analysis of the observational products. The limited availability of data records for much of the continent has been discussed many times \citep{nyenzi_assessment_2001}. There are many proposed shortcuts, but in most situations, expanding observational networks is the only dependable solution. In populated areas, this can usually be accomplished simply by improving reporting systems and data access. Otherwise, it becomes necessary to use a priori information and proxies, as we have above. Our use of dam water level as ‘ground truthing’ raises a question inherent in any use of proxy data for evaluation or inverse modelling: is a signal that seemingly provides information better correlated to the task at hand necessarily the more accurate? To make this assumption, as we have here, requires confidence in our understanding of the underlying mechanisms and metadata. The obvious question is: why are the reported Akosombo levels being considered the benchmark values? The honest answer is that Ghanaians who depend on these water levels for consistent access to electricity regularly monitor, share, and contest this information in real time, making it the most scrutinised of our data sources (even if this scrutiny happens mostly on Facebook). Here again, improved reporting systems and open data access can provide large benefits, as can small-scale process modelling at the local level.

  2. Inadequate length of records: Unfortunately, the surest solution to short observational records is time. As above, dam level records (used as a proxy for moisture availability and rainfall) and process-modelling/data-assimilation can be used to backfill records, but it is essential to understand the scales at which these estimates are no longer representative. Given that all observational systems have life-spans (planned or unplanned), a key issue is data inhomogeneity \citep{peterson_homogeneity_1998}. Creating overlapping records, so that new networks can be used to extend existing data series, should be a priority \citep{aguilar_guidelines_2003}. For the present, implementation and development of methods that express the sampling uncertainty induced by limited records is essential for honest communication of results. Bootstrapping \citep{wilks_statistical_2011} and/or Bayesian \citep{jaynes_probability_2003} approaches can be used towards this end.
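As a minimal sketch of the bootstrapping approach mentioned above, the following computes a percentile-bootstrap confidence interval for a statistic of a short record. The twenty-year record here is synthetic and all names are illustrative, not our actual data.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(record, stat=np.mean, n_boot=10000, alpha=0.05):
    """Percentile-bootstrap confidence interval for a statistic of a record."""
    record = np.asarray(record, dtype=float)
    n = record.size
    # Resample the record with replacement and recompute the statistic.
    reps = np.array([stat(rng.choice(record, size=n, replace=True))
                     for _ in range(n_boot)])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Illustrative 20-year record of annual rainfall anomalies (synthetic)
record = rng.normal(0.0, 1.0, size=20)
lo, hi = bootstrap_ci(record)
print(f"95% CI for the mean: [{lo:.2f}, {hi:.2f}]")
```

The width of the resulting interval is itself a direct, communicable expression of how little a short record constrains the statistic of interest.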

  3. Relevant phenomena below the credible resolution of simulation: The Oti basin appears to be too small to be consistently simulated, given the spatial unpredictability of rainfall. Since this is a major input to Lake Volta, the RCMs may be of limited use for studying this impact. Even if long-term forecasts for the greater basin (as illustrated in Figure \ref{multidryyearprobs}) could be considered reliable, their relation to the Akosombo would potentially still be indirect. Figure \ref{damsatellitescorr}, however, suggests that aggregated full-basin rainfall describes much of the variability in water levels for Lake Volta. Further process (i.e. hydrological) modelling is needed to clarify the critical resolution needed to inform Akosombo dam management. Moreover, when the simulations are forced with the same contemporary reanalysis, the fact that the RCMs may still provide different realisations of what happens inside the grid cells does not necessarily represent a failure to translate boundary conditions. Possibly the ensemble expresses the limited extent to which boundary conditions determine the distribution of precipitation within the RCM domain. Variations in RCM setup and parametrisations may represent different mechanisms, which may or may not dominate year to year and event to event. Furthermore, the boundary conditions, being taken from reanalysis, are themselves inexact, and the permutations of these that different groups use as model forcing imply that the RCMs are technically being run under different sets of initial conditions. It would be convenient to believe that over the spatial and temporal scales being considered these factors would have a limited effect, but this is not guaranteed. Recall that even the observational record for the area is far from consistent, although perhaps those differences are easier to interpret directly.
We require more rigorous estimates of the spatial and temporal scales of potential predictability when describing the Earth system. This will provide more realistic benchmarks for model development and evaluation, as well as help quantify the degree of general resilience needed in adaptation strategies.
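The kind of full-basin aggregation underlying the Figure \ref{damsatellitescorr} comparison can be sketched as follows. The gridded field, basin mask, and level series are synthetic stand-ins, not the satellite or dam data used in this study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical gridded monthly rainfall (time, lat, lon); synthetic stand-in
# for a satellite rainfall product over the region.
rain = rng.gamma(shape=2.0, scale=50.0, size=(120, 10, 12))
mask = np.zeros((10, 12), dtype=bool)
mask[2:8, 3:9] = True  # hypothetical basin footprint

# Aggregate the gridded field to a single full-basin rainfall series.
basin_rain = rain[:, mask].mean(axis=1)

# Hypothetical dam-level series: a noisy function of basin rainfall.
z = (basin_rain - basin_rain.mean()) / basin_rain.std()
level = 0.8 * z + rng.normal(0.0, 0.6, size=120)

# Correlation of aggregated rainfall with the level series.
r = np.corrcoef(basin_rain, level)[0, 1]
print(f"basin rainfall vs. level correlation: r = {r:.2f}")
```

Repeating such a correlation at successively finer sub-basin aggregations is one way to probe the critical resolution question raised above.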

  4. Unclear response to large-scale forcing: As noted, the performance of most simulations is overall better than climatology. However, the lack of a consistent response to events such as the widespread 2008 flooding causes scepticism about their ability to respond to key drivers. Mostly, this concern highlights limitations in observational record length and in our understanding of the mechanisms governing monsoon magnitude over western Africa. Were the 2008 floods a response to a clear meteorological signal, or fluke events? Before evaluating simulations, it is necessary to understand what we expect them to be doing; alternatively, we must understand the workings of the ‘model worlds’ we create. The ensemble agreement for 1989--2008 appears to be decoupled from the large-scale drivers that we would expect to increase predictability. There are exceptions, and again, this observation is heavily undermined by the short evaluation period. However, it does raise the possibility that the models are responding to different processes than we might expect. This could be the result of an incomplete understanding of the regional system, or of dominant processes within the model space that differ from those of the ‘true’ system, in which case analogies must be found between the two. This could be clarified by applying an analysis similar to that referred to under point 3 above, comparing how the responses of the observational products and the simulation data to large-scale drivers differ.

  5. Difficulty separating trends from extremes: It is not clear whether the prolonged dry spell that produces the increased frequency of recurring dry years in the 2080 to 2100 period of the RCP8.5 simulation, shown in Figure \ref{multidryyearprobs}, is the result of a shifting behavioural regime or a hazardous, low-probability event. In part this ambiguity is a result of the sample-size uncertainties articulated above. It also points to a common dilemma when designing climate model experiments: multi-century simulations extend past any period of ‘applied relevance’; however, they (as well as initial-condition sensitivity experiments) are required in order to interpret the ‘near-future’ results. It is also important to note that we have not directly evaluated here how well the RCMs simulate transition frequencies. The selected comparison time period (eight years) is so brief that sampling uncertainty would mask any signal. Instead, the direct comparison of observed to simulated sequences gives more confidence that the chronologies, and therefore transitions, produced by the dynamical models\footnote{A probabilistic weather generator or similar model could be evaluated by different (more statistically direct) criteria.} are a result of relevant mechanisms.
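For concreteness, empirical wet/dry transition frequencies of the kind discussed here could be estimated as below. The sequence is illustrative only; with the eight-year comparison period noted above, such estimates would carry very large sampling uncertainty.

```python
from collections import Counter

def transition_frequencies(sequence):
    """Empirical frequencies of year-to-year transitions in a wet/dry sequence."""
    pairs = list(zip(sequence, sequence[1:]))  # consecutive-year pairs
    counts = Counter(pairs)
    total_from = Counter(a for a, _ in pairs)  # transitions out of each state
    return {pair: n / total_from[pair[0]] for pair, n in counts.items()}

# Illustrative classification of consecutive years (D = dry, W = wet)
seq = ["W", "W", "D", "D", "D", "W", "D", "W", "W", "D"]
freqs = transition_frequencies(seq)
print(freqs[("D", "D")])  # estimated chance that a dry year follows a dry year
```

With only a handful of years, each estimated frequency rests on a few transitions, which is precisely why the chronology-based comparison above is preferred to direct frequency evaluation.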

  6. Perfect prognosis assumption: Currently we are trying to determine whether the inherent predictability of the system and the RCM response to boundary conditions are adequate to make inferences over the Volta basin and its sub-regions. We have not addressed how to understand the behaviour of the RCMs when responding to simulated global circulation, but rather how the RCMs respond to boundary conditions taken from reanalysis data. This is an appropriate first step, as we want to establish that the RCMs will give a sharper, rather than more distorted, view of the region. Even should this be established, i.e. if the RCMs did a good job of simulating the region, Figure \ref{multidryyearprobs} is incomplete without considering GCM-induced boundary-condition misrepresentations and how they affect confidence in the resulting regional statistics. This consideration can only be properly addressed once we have improved our understanding of what in the external forcing the RCMs are responding to.

  7. Lack of RCM output: At a few points in the analysis we note that further investigation could be done with wider access to simulation outputs. Highlighted in this study was the desire for 700 mb fields (multiple variables, particularly wind) and indicators of moisture levels throughout the vertical atmospheric column. This could be a consideration for modelling groups when they are making decisions on what data to save and distribute.
