1. Introduction
Documenting and predicting spatial patterns of biodiversity at different
scales remains the major goal of biogeography and a crucial step in
proposing appropriate conservation policies (Cadotte and Tucker 2018,
Burbano-Girón et al. 2022). Efforts to map and conserve
intraspecific genetic diversity, in particular, are relevant whenever
one’s goal is to preserve evolutionary history (Tucker et al. 2019),
maintain population connectivity (Schoville et al. 2018, Bracco et al.
2019) and ensure adaptation potential in the face of future
environmental changes (Hoelzel et al. 2019).
Identifying regions of lineage turnover within the spatial range of
species is therefore important when delimiting conservation areas that
account for cryptic genetic diversity (Crandall et al. 2000, D’Amen et
al. 2013). However, performing such a task for multiple species at a
time requires extensive fieldwork, time and resources, especially in
megadiverse communities such as those in highly threatened tropical
systems. Approaches that predict the distribution of genetic diversity
without the need for additional intensive fieldwork are therefore
desirable and may contribute novel insights to inform conservation
(Manel and Holderegger 2013, Pollock et al. 2020, Green et al. 2022). For
instance, predictive models of intraspecific genetic differentiation can
be useful in conservation biology by summarizing genetic patterns within
multiple co-distributed species in a region and by providing a map of
genetic barriers within a community. Moreover, if these models are both
reliable and transferable across a group of species, one may be able to
use them to predict the location of genetic breaks in species for which
environmental and ecological information is available, but genetic data
are scarce or nonexistent. This would be the case for models built from
community-level environmental, ecological, and genetic data, yet devoted
to species-specific predictions for target, endangered, or
data-deficient taxa for which molecular data are not widely available.
With phylogeographic surveys increasing in number across the world
(Hickerson et al. 2010) and community-level datasets becoming more
common, these exercises are now possible.
Contributing toward this goal, evolutionary biologists have been
generating models that predict levels of genetic diversity and
connectivity between populations from environmental information,
especially topographic and climatic gradients across space and time
(van Strien et al. 2014, Brown et al. 2016, Espíndola et al. 2016). This
is based on widespread observations that both climate and geography are
highly correlated with the spatial distribution of genetic diversity
(Carstens and Richards 2007, Carnaval et al. 2014, Cabanne et al. 2016).
Not surprisingly, it has been shown that levels of genetic differentiation
within a species are also impacted by ecological traits, especially
those characteristics thought to correlate with dispersal capacity, such
as morphological attributes (e.g., body size; Pabijan et al. 2012),
reproductive strategies (Paz et al. 2015), habitat occupancy (Burney and
Brumfield 2009) and foraging ecology (Miller et al. 2021). These
elements of the ecology of species may be
especially important for conservation genetics by helping us understand
whether, how and why the patterns of genetic differentiation within one
specific taxon may differ from the more common (or general) pattern
detected in the overall community (Fortuna et al. 2009, Porto et al.
2013).
Less explored, however, are the roles of demographic traits such as
fecundity, mortality and generation length, especially at large spatial
scales. Such traits have been hypothesized to indirectly affect
dispersal capacity through their effect on the number of individuals in
a population, the number of offspring per generation, the local
extinction rate and the frequency of reproductive cycles, all of which
can influence the probability of dispersal events (Perry et al. 2005,
Stevens et al. 2013, Castorani et al. 2017, Bonte and Dahirel 2017, Weil
et al. 2022). For instance, dispersal distances were shown to be
correlated with fast life-history strategies (i.e., high fecundity and
low survival rates) in plants (Beckman et al. 2018).
Additionally, it has been hypothesized that species with shorter
generation length have more dispersal opportunities per time unit, which
has been supported in butterflies (Stevens et al. 2012).
However, tests of these relationships using genetic data are still
scarce, mostly because demographic traits are costly to estimate for
many taxa. Additionally, it remains unclear to what extent these
demographic traits are correlated with morphological and foraging
ecology traits (e.g., body size), and therefore whether they would be
informative in predictive models of genetic differentiation.
Here, we create a machine learning model that draws from landscape
genetics and predictive phylogeography (Espíndola et al. 2016, Pelletier
and Carstens 2018, Sullivan et al. 2019) to predict the magnitude of
genetic differentiation among populations, using environmental
descriptors and both dispersal-related and demographic traits within
multiple co-distributed bird taxa in the Atlantic Forest of Brazil.
Machine learning techniques have shown great promise in population
genetics as a tool to leverage available data to understand and predict
geographic patterns (Schrider and Kern 2018). We used such an approach
to combine published mtDNA data with available
ecological datasets and evaluate model accuracy in predicting regions
that concentrate high levels of genetic differentiation, representing
phylogeographic breaks in this ecosystem. We specifically ask: 1) can
this machine learning approach be used to accurately predict both global
(assembly-wide) and species-specific genetic differentiation? 2) does
the inclusion of species-specific ecological traits improve model
accuracy? 3) what is the relative importance of dispersal traits and
demographic traits in aiding model prediction? The first question aims
to evaluate whether a machine learning approach can summarize existing
information and learn enough about the spatial correlates of genetic
breaks to predict which areas may function as barriers to gene flow in a
focal region or for a focal species. The second question addresses
whether features of the abiotic environment alone are enough to explain
these correlations and to what extent data on ecological traits can help
predictive approaches. Finally, the third question helps address the
aforementioned knowledge gap about the correlation between demographic
traits and dispersal.