1. Introduction
Documenting and predicting spatial patterns of biodiversity at different scales remains a major goal of biogeography and a crucial step in proposing appropriate conservation policies (Cadotte and Tucker 2018, Burbano-Girón et al. 2022). Efforts to map and conserve intraspecific genetic diversity, in particular, are relevant whenever the goal is to preserve evolutionary history (Tucker et al. 2019), maintain population connectivity (Schoville et al. 2018, Bracco et al. 2019) and ensure adaptive potential in the face of future environmental change (Hoelzel et al. 2019). Identifying regions of lineage turnover within the spatial range of species is therefore important when delimiting conservation areas that account for cryptic genetic diversity (Crandall et al. 2000, D’Amen et al. 2013). However, performing such a task for multiple species at a time requires extensive fieldwork, time and resources, especially in megadiverse communities such as those in highly threatened tropical systems. Approaches that predict the distribution of genetic diversity without the need for additional intensive fieldwork are therefore desirable and may contribute novel insights to inform conservation (Manel and Holderegger 2013, Pollock et al. 2020, Green et al. 2022). For instance, predictive models of intraspecific genetic differentiation can be useful in conservation biology by summarizing genetic patterns within multiple co-distributed species in a region and by providing a map of genetic barriers within a community. Moreover, if these models are both reliable and transferable across a group of species, one may be able to use them to predict the location of genetic breaks in species for which environmental and ecological information is available but genetic data are scarce or nonexistent.
This would be the case for models built from community-level environmental, ecological, and genetic data, yet devoted to species-specific predictions for target, endangered, or data-deficient taxa for which molecular data are not widely available. With phylogeographic surveys increasing in number across the world (Hickerson et al. 2010) and community-level datasets becoming more common, such exercises are now possible.
Contributing toward this goal, evolutionary biologists have developed models that predict levels of genetic diversity and connectivity between populations from environmental information, especially topographic and climatic gradients across space and time (van Strien et al. 2014, Brown et al. 2016, Espíndola et al. 2016). These models build on widespread observations that both climate and geography are highly correlated with the spatial distribution of genetic diversity (Carstens and Richards 2007, Carnaval et al. 2014, Cabanne et al. 2016). Not surprisingly, levels of genetic differentiation within a species have also been shown to be influenced by ecological traits, especially characteristics thought to correlate with dispersal capacity, such as morphological attributes (e.g., body size; Pabijan et al. 2012), reproductive strategies (Paz et al. 2015), habitat occupancy (Burney and Brumfield 2009) and foraging ecology (Miller et al. 2021). These aspects of species ecology may be especially important for conservation genetics, as they help us understand whether, how and why patterns of genetic differentiation within a specific taxon may differ from the more common (or general) pattern detected across the overall community (Fortuna et al. 2009, Porto et al. 2013).
Less explored, however, are the roles of demographic traits such as fecundity, mortality and generation length, especially at large spatial scales. Such traits have been hypothesized to affect dispersal capacity indirectly through their effects on population size, the number of offspring per generation, local extinction rates and the frequency of reproductive cycles, all of which can influence the probability of dispersal events (Perry et al. 2005, Stevens et al. 2013, Castorani et al. 2017, Bonte and Dahirel 2017, Weil et al. 2022). For instance, dispersal distances were shown to be correlated with fast life-history strategies (i.e., high fecundity and low survival rates) in plants (Beckman et al. 2018). Likewise, species with shorter generation lengths have been hypothesized to have more dispersal opportunities per unit of time, a prediction supported in butterflies (Stevens et al. 2012). However, tests of these relationships using genetic data remain sparse, mostly because demographic traits are costly to estimate for many taxa. Moreover, it remains unclear to what extent these demographic traits are correlated with morphological and foraging ecology traits (e.g., body size), and therefore whether they would add information to predictive models of genetic differentiation.
Here, we develop a machine learning model that draws from landscape genetics and predictive phylogeography (Espíndola et al. 2016, Pelletier and Carstens 2018, Sullivan et al. 2019) to predict the magnitude of genetic differentiation among populations of multiple co-distributed bird taxa in the Atlantic forest of Brazil, using environmental descriptors together with both dispersal-related and demographic traits. Machine learning techniques have shown great promise in population genetics as tools to leverage available data to understand and predict geographic patterns (Schrider and Kern 2018). We use this approach to combine published mtDNA data with available ecological datasets and to evaluate model accuracy in predicting regions that concentrate high levels of genetic differentiation, which represent phylogeographic breaks in this ecosystem. We specifically ask: 1) can this machine learning approach accurately predict both global (assembly-wide) and species-specific genetic differentiation? 2) does the inclusion of species-specific ecological traits improve model accuracy? 3) what is the relative importance of dispersal traits and demographic traits in model prediction? The first question evaluates whether a machine learning approach can summarize existing information and learn enough about the spatial correlates of genetic breaks to predict which areas may act as barriers to gene flow in a focal region or for a focal species. The second question addresses whether features of the abiotic environment alone are enough to explain these correlations and to what extent data on ecological traits can improve predictive approaches. Finally, the third question addresses the aforementioned knowledge gap about the relationship between demographic traits and dispersal.