After performing the PCA, we selected the attributes that showed the highest variance among all principal components as input for RFA, totaling 12 attributes: aridity index, precipitation seasonality, water table depth (WTD), height above the nearest drainage (HAND), reservoir area, hydrological disturbance index, streamflow elasticity, porosity, permeability, hydraulic conductivity, mean elevation, and mean slope. Precipitation seasonality indicates the relative timing of the seasonal precipitation cycle and the seasonal temperature cycle. Values of this attribute close to +1 indicate summer precipitation, while values close to -1 indicate winter precipitation (Almagro et al., 2020). Additionally, we added the Brazilian biomes (Amazon, Cerrado, Caatinga, Atlantic Forest, Pantanal, and Pampa) and soil texture classes (clay, clay loam, loam, sandy clay, sandy loam, and sandy clay-loam) as categorical variables to the analysis by using the One-Hot encoding method (Pedregosa et al., 2011). This method converted each category into a separate binary variable, so that no order was imposed among the categories.
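The one-hot encoding step can be illustrated with a minimal sketch based on scikit-learn's `OneHotEncoder`. The three sample catchments and the variable names are hypothetical and serve only to show how each biome and soil texture class becomes its own binary column; this is not the script used in the study.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Category lists taken from the text; passing them explicitly fixes the column order.
biomes = ["Amazon", "Cerrado", "Caatinga", "Atlantic Forest", "Pantanal", "Pampa"]
textures = ["clay", "clay loam", "loam", "sandy clay", "sandy loam", "sandy clay-loam"]

# Hypothetical catchments (illustration only): one biome and one texture label each.
sample = np.array(
    [
        ["Cerrado", "clay"],
        ["Caatinga", "sandy loam"],
        ["Amazon", "loam"],
    ],
    dtype=object,
)

encoder = OneHotEncoder(categories=[biomes, textures])
# The encoder returns a sparse matrix by default; convert it to a dense array.
encoded = encoder.fit_transform(sample).toarray()

# 6 biome columns + 6 texture columns = 12 binary columns per catchment.
print(encoded.shape)
print(encoded[0])
```

Each row contains exactly two ones, one for the biome and one for the soil texture, so no category is ranked above another.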
We applied the classifier and regressor classes of the Random Forest algorithm (Pedregosa et al., 2011) to a total of 24 attributes (the 12 numerical attributes plus the 12 one-hot encoded biome and soil texture categories). The classifier related these attributes to ECI values by majority vote across the decision trees, while the regressor averaged the predictions of the decision trees in the ensemble. We also applied 10-fold cross-validation and tested different hyperparameters, such as the number of trees in the ensemble and the maximum depth of the trees, to control the quality of the forest. All analyses were carried out by using a Python script available at http://doi.org/10.5281/zenodo.4247710.
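As an illustration of this workflow, the sketch below fits scikit-learn's `RandomForestRegressor` and `RandomForestClassifier` with 10-fold cross-validation over a small hyperparameter grid. Everything specific in it is an assumption made for the example: the data are synthetic, the grid values are arbitrary, and the class labels are derived from the sign of a synthetic ECI-like target. It does not reproduce the published script.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 24 attributes (assumption: not the study data).
rng = np.random.default_rng(0)
n_catchments, n_features = 200, 24
X = rng.normal(size=(n_catchments, n_features))

# Synthetic ECI-like target driven by the first two attributes plus noise.
eci = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n_catchments)
# Assumed labelling for the classifier: sign of the synthetic ECI.
eci_class = (eci > 0).astype(int)

# Hypothetical grid: number of trees and maximum tree depth.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# Regressor: averages the tree predictions; scored with R^2 over 10 folds.
reg_search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=10)
reg_search.fit(X, eci)

# Classifier: aggregates the tree votes; scored with accuracy over 10 folds.
clf_search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=10)
clf_search.fit(X, eci_class)

# Impurity-based importance of each attribute in the best regressor.
importances = reg_search.best_estimator_.feature_importances_

print("regressor:", reg_search.best_params_, reg_search.best_score_)
print("classifier:", clf_search.best_params_, clf_search.best_score_)
```

`GridSearchCV` refits the best configuration on the full data set, so `best_estimator_` can be inspected directly, for example to rank attributes by `feature_importances_`.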

RESULTS

The effective area of about 16% of the studied catchments was more than double their corresponding topographic area (dark blue circles on the coast). Conversely, 13% of the effective catchment areas were smaller than half of their topographic areas (dark red circles in the northeast) (Figure 3; the histogram is available in S2). A clear pattern emerged in the Caatinga, Cerrado, and Atlantic Forest biomes, whereas we did not observe a clear tendency in the sign of the ECI in the Amazon, Pampa, and Pantanal biomes. In the Caatinga (a predominantly semiarid region) and Cerrado biomes, our analysis showed that catchments had effective areas smaller than their topographic areas, whereas most catchments in the Atlantic Forest biome had effective areas larger than their topographic areas.