3. Results
The final dataset consisted of 1,801 individual sequences from 526 localities (Figure 1A), from which a total of 6,350 pairwise DXY values were derived. DXY values ranged from 0 to 56.33 (mean = 9.685; median = 4.5; see Figure S1 in Supporting File 1 for the range of DXY values per locus for each species). When the observed DXY values are plotted in space (Figure 1B), genetic breaks (represented by midpoints between localities with relatively high DXY values) clearly accumulate in three regions of the Atlantic Forest: 1) lowland valleys within the Serra do Mar mountain range and the Paraíba do Sul river, in the southern part of the forest; 2) the Doce river and nearby regions; and 3) northern regions near the São Francisco river.
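DXY is conventionally the average number of nucleotide differences between pairs of sequences drawn from two populations. The sketch below assumes that definition: it counts raw differences over aligned sites (skipping gaps and ambiguity codes), averages them over all between-locality sequence pairs, and places each putative genetic break at the arithmetic midpoint of the two localities' (longitude, latitude) coordinates. The function names, sequences, and coordinates are hypothetical; how missing data were handled, and whether the values reported above are raw counts or scaled, is not stated here (dividing by the number of compared sites would give a per-site value).

```python
from itertools import product

VALID_BASES = set("ACGT")


def pairwise_differences(seq_a: str, seq_b: str) -> int:
    """Count differing sites between two aligned sequences, skipping gaps and ambiguity codes."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def dxy(pop_x: list[str], pop_y: list[str]) -> float:
    """Mean number of pairwise differences over all between-locality sequence pairs."""
    diffs = [pairwise_differences(a, b) for a, b in product(pop_x, pop_y)]
    return sum(diffs) / len(diffs)


def midpoint(coord_a: tuple[float, float], coord_b: tuple[float, float]) -> tuple[float, float]:
    """Arithmetic midpoint of two (longitude, latitude) pairs; adequate for nearby localities."""
    return ((coord_a[0] + coord_b[0]) / 2, (coord_a[1] + coord_b[1]) / 2)


# Toy example with made-up sequences and coordinates.
locality_1 = ["ACGTAC", "ACGTAA"]
locality_2 = ["ACGAAC", "TCGAAC"]
print(dxy(locality_1, locality_2))  # 2.0
print(midpoint((-43.0, -22.0), (-41.0, -20.0)))  # (-42.0, -21.0)
```

A simple coordinate midpoint is used for brevity; over long distances a great-circle midpoint would be more appropriate.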
Global models including only environmental predictors performed worse on average than models that included both environmental predictors and ecological traits (Figure 2A). Models based solely on environmental predictors had mean R² = 0.14 (range 0.0007 to 0.45), whereas those that included environmental and dispersal data had mean R² = 0.53 (range 0.04 to 0.81) and those that included environmental and demographic data had mean R² = 0.43 (range 0.003 to 0.77). Finally, models including environmental data and both types of ecological traits (i.e., dispersal and demographic traits) had mean R² = 0.54 (range 0.06 to 0.81). A Kruskal-Wallis test suggests that the distribution of R² differs among the four sets of predictors (χ² = 170.81, p < 0.01), and Wilcoxon tests suggest that all models including traits have consistently higher predictive accuracy than models based solely on environmental data (p < 0.001 for each set of predictors including ecological traits). In addition, the inclusion of dispersal traits led to a larger increase in R² (relative to models based solely on environmental data) than the inclusion of demographic traits (Figure 2B).
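The Kruskal-Wallis statistic compares rank sums of R² across predictor sets and, with four sets, is referred to a χ² distribution with 3 degrees of freedom. The sketch below is a from-scratch illustration under that standard definition (with the usual tie correction); the function names are hypothetical, and in practice a library routine such as SciPy's `scipy.stats.kruskal` or R's `kruskal.test` would be used. The closed-form tail probability is valid only for exactly 3 degrees of freedom.

```python
import math
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups: list[float]) -> float:
    """Kruskal-Wallis H statistic, corrected for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Toy R² values for three made-up predictor sets; H is about 7.2.
print(kruskal_wallis_h([0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]))
```

As a consistency check, `chi2_sf_df3(170.81)` is far below 0.01, and `chi2_sf_df3(9.53)` (the species-specific statistic) is about 0.023, in line with the reported p = 0.02. The Wilcoxon comparisons are not sketched because the text does not state whether they were paired (signed-rank) or unpaired (rank-sum) tests.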
Correlation coefficients across predictor variables revealed that geographic, topographic and bioclimatic resistance distances were highly correlated (Table S2). Additionally, body size was highly correlated with wing length (ρ = 0.873) and adult survival (ρ = 0.872). Environmental distances, represented mainly by temperature seasonality and precipitation of the coldest quarter, consistently had the highest impact on model accuracy (Figure 3). Morphological traits, represented mainly by wing length, were equally important whenever they were included. Adult survival and longevity were important ecological traits in models combining environmental data and demographic traits, but were surpassed by environmental data and dispersal traits whenever dispersal traits were also included. Finally, the mtDNA locus used to calculate DXY values was always among the five most important variables across all models.
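The ρ symbol suggests Spearman's rank correlation, although the coefficient is not named here; that is an assumption. The sketch below computes Spearman's ρ as the Pearson correlation of average ranks and flags every predictor pair whose absolute correlation reaches a cut-off. The 0.7 threshold, the function names, and the trait values are hypothetical illustrations, not the threshold or data used in this study.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


def collinear_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, rho) for every predictor pair with |rho| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        rho = spearman_rho(xa, xb)
        if abs(rho) >= threshold:
            flagged.append((name_a, name_b, round(rho, 3)))
    return flagged


# Made-up trait values for five species, for illustration only.
traits = {
    "body_size": [10, 12, 15, 20, 22],
    "wing_length": [50, 55, 60, 70, 72],
    "clutch_size": [3, 1, 4, 2, 5],
}
print(collinear_pairs(traits))  # [('body_size', 'wing_length', 1.0)]
```

Flagging pairs this way only identifies redundancy among predictors; it does not by itself decide which variable of a correlated pair to keep.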
Species-specific predictions show larger variation in R² within each set of predictors than global models (values ranging from 0.0001 to 0.9; Figure 4). However, models including ecological traits tend to have higher mean R² (Table 2; Figure 5A). A Kruskal-Wallis test moderately supports that the distribution of R² differs among the four sets of predictors (χ² = 9.53, p = 0.02). As for global models, Wilcoxon tests of R² values for species-specific models suggest that all models including traits have consistently higher predictive accuracy than models based solely on environmental data (p < 0.001 for each set of predictors including ecological traits). When considering only the model with the highest predictive power for each combination of species and locus, models including only environmental data tend to have low predictive power (R² < 0.17) even when they are the best model among the four sets of predictors (Figure 5B). An exception to this pattern is the Cytb dataset for Sclerurus scansor, for which the model based solely on environmental data was both the best model and highly accurate (R² = 0.71; Figure 4). Finally, as for global models, the inclusion of dispersal traits led to a larger increase in R² (relative to models based solely on environmental data) than the inclusion of demographic traits (Figure S2).
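Selecting the best model for each species and locus, and comparing each trait-based model with its environment-only counterpart, are simple grouping operations. The sketch below assumes results are stored as (species, locus, predictor set, R²) tuples; the predictor-set labels, species names, and R² values are made up, and pairing gains by species and locus is one plausible way to summarize the increase in R², not necessarily how Figure S2 was produced.

```python
from collections import defaultdict

ENV_ONLY = "env"  # hypothetical label for the environment-only predictor set


def best_models(records):
    """Keep the highest-R² predictor set for each (species, locus) combination."""
    best = {}
    for species, locus, predictor_set, r2 in records:
        key = (species, locus)
        if key not in best or r2 > best[key][1]:
            best[key] = (predictor_set, r2)
    return best


def mean_gain_over_env(records):
    """Mean R² increase of each predictor set over the environment-only model of the same species and locus."""
    by_key = defaultdict(dict)
    for species, locus, predictor_set, r2 in records:
        by_key[(species, locus)][predictor_set] = r2
    gains = defaultdict(list)
    for sets in by_key.values():
        if ENV_ONLY not in sets:
            continue  # no baseline to compare against
        for predictor_set, r2 in sets.items():
            if predictor_set != ENV_ONLY:
                gains[predictor_set].append(r2 - sets[ENV_ONLY])
    return {name: sum(v) / len(v) for name, v in gains.items()}


# Made-up results for two hypothetical species-locus combinations.
records = [
    ("species_A", "cytb", "env", 0.10),
    ("species_A", "cytb", "env+dispersal", 0.55),
    ("species_A", "cytb", "env+demography", 0.30),
    ("species_B", "nd2", "env", 0.15),
    ("species_B", "nd2", "env+dispersal", 0.14),
    ("species_B", "nd2", "env+demography", 0.12),
]
print(best_models(records)[("species_B", "nd2")])  # ('env', 0.15)
```

In this toy input the environment-only model wins for species_B but with low R², mirroring the pattern described above, while the mean gain is larger for the dispersal set than for the demographic set.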
Maps of interpolated predicted DXY values reveal that, although models generally agree with maps of observed values (Figure 6A), model uncertainty is higher in the northern Atlantic Forest (hereinafter, northern AF), especially for models based solely on environmental data (Figure 6B). Additionally, models tend to overpredict genetic differentiation in the northern AF (i.e., north of the Doce River) and underpredict differentiation in the southern Atlantic Forest (hereinafter, southern AF; Figure 6C). Both over- and underprediction decrease when ecological traits are added.
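The over- and underprediction pattern in Figure 6C corresponds to the sign of the residual (predicted minus observed DXY) summarized on either side of a north-south split. A minimal sketch, assuming point-wise predictions at break midpoints rather than the interpolated surfaces, and using a single latitude as a hypothetical stand-in for the Doce River (a real analysis would use the river's mapped course); the function name and all values are illustrative.

```python
import math


def mean_residual_by_region(points, split_lat):
    """Mean of (predicted - observed) DXY north and south of a latitude cut-off.

    Positive values indicate overprediction and negative values underprediction.
    Latitudes are negative in the southern hemisphere, so 'north' means
    latitude > split_lat. A region with no points returns NaN.
    """
    totals = {"north": [0.0, 0], "south": [0.0, 0]}
    for lat, observed, predicted in points:
        region = "north" if lat > split_lat else "south"
        totals[region][0] += predicted - observed
        totals[region][1] += 1
    return {r: (s / n if n else math.nan) for r, (s, n) in totals.items()}


# (latitude, observed DXY, predicted DXY) at made-up break midpoints.
points = [(-15.0, 4.0, 6.0), (-17.0, 3.0, 4.0), (-23.0, 10.0, 7.0), (-25.0, 8.0, 7.0)]
print(mean_residual_by_region(points, split_lat=-19.6))  # {'north': 1.5, 'south': -2.0}
```

Comparing these regional means between models with and without ecological traits would show whether both residuals shrink toward zero, as described above.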