The Effective Catchment Index

To obtain the effective catchment area, we used the effective catchment index (ECI) proposed by Liu et al. (2020). The ECI indicates the occurrence and strength of inter-catchment groundwater flow (IGF), improving on previous studies (e.g. Schaller and Fan, 2009) of water gain or loss by catchments. The ECI describes the deviation of the effective catchment area from the topographic area by accounting for IGF in the water balance. Assuming a closed water balance without IGF in topographic catchments may lead to misinterpretation of hydrological data (Liu et al., 2020). The index is calculated as follows:
\(\text{ECI}=\log_{10}\left[\frac{Q}{\left(P-ET\right)}\right]\) (1)
where P, ET, and Q (all in mm d⁻¹) are the long-term estimates of precipitation, actual evapotranspiration, and streamflow at the catchment outlet, respectively. When ECI > 0, the effective area of a catchment is larger than its topographic area (gaining water condition). Conversely, when ECI < 0, the effective area is smaller than the topographic area (losing water condition).
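As a minimal sketch of Equation 1, assuming a base-10 logarithm (consistent with Equation 3), the ECI could be computed as shown below. The function name and the input values are hypothetical and are not taken from the CABra dataset.

```python
import math


def effective_catchment_index(q, p, et):
    """Return ECI = log10(Q / (P - ET)) for long-term means in mm/d.

    Assumes a base-10 logarithm. Raises ValueError when Q or (P - ET)
    is not positive, because the logarithm is undefined there.
    """
    available = p - et
    if q <= 0 or available <= 0:
        raise ValueError("Q and P - ET must both be positive")
    return math.log10(q / available)


# Hypothetical values (not from the study): P = 4.0, ET = 2.5, Q = 1.2 mm/d
eci = effective_catchment_index(q=1.2, p=4.0, et=2.5)
condition = "gaining" if eci > 0 else "losing" if eci < 0 else "balanced"
print(f"ECI = {eci:.3f} ({condition})")  # ECI = -0.097 (losing)
```

In this illustrative case the outlet streamflow is 80% of P - ET, so the ECI is slightly negative and the catchment would be read as losing water.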
If IGF affects the observed streamflow (Q) at a catchment outlet, Q responds directly to the effective catchment area (Aeff), whereas the response expected from the topographic catchment area (Atopo) is derived from the difference between catchment precipitation (P) and actual evapotranspiration (ET) (Liu et al., 2020). Therefore, the ratio of the effective to the topographic catchment area is defined as:
\(\frac{A_{\text{eff}}}{A_{\text{topo}}}=\frac{Q}{\left(P-ET\right)}\) (2)
The effective and topographic areas are related to the ECI by combining the previous equations, as shown in Equation 3 below. We adopted the same definition of substantial deviation between the topographic and effective areas as (1) effective area larger than double or (2) smaller or losing water (ECI > -0.15) conditions. ECI values that fall between these extreme ranges indicate either a small gain or a small loss condition.
\(\frac{A_{\text{eff}}}{A_{\text{topo}}}=10^{\text{ECI}}\) (3)
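To make Equation 3 concrete, the short sketch below converts an ECI value into the area ratio and into an effective area. The helper names and the 500 km2 catchment are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def area_ratio_from_eci(eci):
    """Return A_eff / A_topo = 10 ** ECI (Equation 3)."""
    return 10.0 ** eci


def effective_area(a_topo_km2, eci):
    """Scale a topographic area (km2) by the ratio implied by the ECI."""
    return a_topo_km2 * area_ratio_from_eci(eci)


# A doubled effective area corresponds to ECI = log10(2), about 0.301
eci_double = math.log10(2.0)
print(round(eci_double, 3))                         # 0.301
# Hypothetical 500 km2 topographic catchment with a doubled effective area
print(round(effective_area(500.0, eci_double), 1))  # 1000.0
```

An ECI of zero returns a ratio of one, meaning that the effective and topographic areas coincide.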
We evaluated the ECI estimates by contrasting our results against the expected range of the Budyko curve (Budyko, 1974) considering both topographic and effective catchment areas. We adopted the Fu curve (Fu, 1981), which is widely used and represents the Budyko curve with w = 2.6 (Beck et al., 2020). For evaluation and discussion, we define the Budyko framework that considers the topographic area as the classic framework and the one that considers the effective area as the adjusted framework. For both, we plotted the relationship between the long-term aridity index (PET/P) and the long-term evaporative index ((P-Q)/P). We used P and PET from the climatology of the CABra dataset (calculated over the 1980-2010 period). As described in Almagro et al. (2020), P is derived from a reference dataset obtained from ~4,000 rain gauges that cover the Brazilian area. However, because some catchments extend beyond the borders of the Brazilian territory, Almagro et al. (2020) developed an ensemble dataset (reference + ERA5), which we used in the present study. The PET, in turn, is also a climatology, derived from daily estimates by the Priestley and Taylor method (see Almagro et al., 2020). Finally, Q is based only on streamflow gauge observations over the Brazilian catchments. By not adopting an additional dataset for the AET calculation, we also expected to reduce uncertainties. Considering the water and energy limits in the Budyko framework (Bouaziz et al., 2018; Liu et al., 2020), a catchment with Q > P gains water (ECI > 0), whereas one with P-Q > PET loses water (ECI < 0).
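The Budyko comparison described above can be sketched as follows, assuming the standard form of the Fu curve, ET/P = 1 + PET/P - [1 + (PET/P)^w]^(1/w), with w = 2.6. The function names and the input values are hypothetical, and the sketch covers only the classic (topographic) framework because the adjustment for effective area is not detailed here.

```python
def fu_evaporative_index(aridity_index, w=2.6):
    """Fu curve: ET/P = 1 + phi - (1 + phi**w) ** (1/w), with phi = PET/P."""
    phi = aridity_index
    return 1.0 + phi - (1.0 + phi ** w) ** (1.0 / w)


def budyko_point(p, pet, q):
    """Return (aridity index PET/P, evaporative index (P - Q)/P)."""
    return pet / p, (p - q) / p


def water_energy_limits(p, pet, q):
    """Flag catchments outside the water and energy limits."""
    if q > p:
        return "gaining (Q > P)"
    if p - q > pet:
        return "losing (P - Q > PET)"
    return "within limits"


# Hypothetical long-term means in mm/d (not from the CABra dataset)
p, pet, q = 4.0, 3.0, 1.2
aridity, evaporative = budyko_point(p, pet, q)
print(aridity, evaporative, fu_evaporative_index(aridity))
print(water_energy_limits(p, pet, q))
```

A point that plots above the Fu curve but below the energy limit is still classified as within limits; only violations of Q <= P or P - Q <= PET are flagged.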

Influence of Catchment Attributes on ECI

To identify relevant catchment attributes and hydrological signatures explaining the variability of ECI, we used a combination of Principal Component Analysis (PCA) and Random Forest Analysis (RFA) (Figure 2). PCA is a dimension reduction technique and was used to evaluate which of the 15 attributes from the CABra dataset (available in S1) are responsible for the most variation in the ECI results. This first step allowed us to remove features that hold no predictive value, thereby addressing the overfitting problem of decision-tree classification. A random forest is an ensemble of decision trees (Denisko & Hoffman, 2018): a single decision tree can exhibit high variance and overfit, whereas a random forest reduces the variance by combining several trees.
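One possible reading of this two-step workflow is sketched below with scikit-learn on synthetic data. Several choices are assumptions made for illustration and are not stated in the text: the attribute scoring rule (variance-weighted squared loadings), the number of retained components and attributes, and the use of a regressor on a continuous ECI rather than a classifier.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the attribute table: 200 catchments x 15 attributes
n_catchments, n_attributes = 200, 15
X = rng.normal(size=(n_catchments, n_attributes))
# Synthetic ECI driven by two attributes plus noise (assumption)
eci = 0.3 * X[:, 0] - 0.2 * X[:, 1] + 0.05 * rng.normal(size=n_catchments)

# Step 1: PCA on standardized attributes
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=5).fit(X_std)
explained = pca.explained_variance_ratio_  # share of variance per component
loadings = pca.components_                 # shape (5, 15)

# Hypothetical screening rule: score each attribute by its squared
# loadings weighted by explained variance, then keep the 10 highest
score = (loadings ** 2 * explained[:, None]).sum(axis=0)
keep = np.sort(np.argsort(score)[::-1][:10])

# Step 2: random forest fitted on the retained attributes only
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_std[:, keep], eci)
ranking = keep[np.argsort(forest.feature_importances_)[::-1]]
print("Attributes ranked by importance:", ranking)
```

Because PCA does not use the target, the screening step can discard an attribute that is informative for ECI, which is one reason the retained set should be checked against the random forest results.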