Phylogeny, occurrence records, and niche models.
We obtained a dated phylogeny for all seed plants from Smith and Brown
(2018; ALLMB phylogeny) and left polytomies unresolved. This phylogeny
generated a species list with which to query American specimen records
from the Global Biodiversity Information Facility (GBIF) and Integrated
Digitized Biocollections (iDigBio). Records were then cleaned and
filtered using the BiotaPhy Platform interface
(https://biotaphy.github.io), following their accepted best practices.
The full GBIF dataset (N = 36,335,199 records) is described
and accessible at https://doi.org/10.15468/dl.gtgtt5. Briefly, GBIF
records with any of the following flags were removed: TAXON_MATCH_FUZZY,
TAXON_MATCH_HIGHER_RANK, TAXON_MATCH_NONE. Further processing was
performed after aggregating GBIF and iDigBio records. For iDigBio, data
cleaning and filtering produced a dataset of 13,667,523 records
(N = 58,384,427 initial records; 23.4% retained). Briefly, initial
records were filtered by removing those with any of the following flags:
GEOPOINT_DATUM_MISSING, GEOPOINT_BOUNDS, GEOPOINT_DATUM_ERROR,
GEOPOINT_SIMILAR_COORD, REV_GEOCODE_MISMATCH, REV_GEOCODE_FAILURE,
GEOPOINT_0_COORD, TAXON_MATCH_FAILED, DWC_KINGDOM_SUSPECT,
DWC_TAXONRANK_INVALID, DWC_TAXONRANK_REMOVED. Full details are
provided in the Dryad deposit associated with this study
(https://doi.org/10.5061/dryad.9cnp5hqgx).
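The flag-based filtering described above can be sketched as follows. This is a minimal illustration only, not the BiotaPhy implementation: the flag names are those listed in the text, but the record layout (a dict with a "flags" entry) and the function names drop_flagged and retained_percent are assumptions made for the example.

```python
# Flags named in the text. The record layout used below is hypothetical.
GBIF_REMOVE = {
    "TAXON_MATCH_FUZZY", "TAXON_MATCH_HIGHER_RANK", "TAXON_MATCH_NONE",
}
IDIGBIO_REMOVE = {
    "GEOPOINT_DATUM_MISSING", "GEOPOINT_BOUNDS", "GEOPOINT_DATUM_ERROR",
    "GEOPOINT_SIMILAR_COORD", "REV_GEOCODE_MISMATCH", "REV_GEOCODE_FAILURE",
    "GEOPOINT_0_COORD", "TAXON_MATCH_FAILED", "DWC_KINGDOM_SUSPECT",
    "DWC_TAXONRANK_INVALID", "DWC_TAXONRANK_REMOVED",
}


def drop_flagged(records, bad_flags):
    """Keep only records that carry none of the flags in bad_flags."""
    return [r for r in records if not (set(r.get("flags", ())) & bad_flags)]


def retained_percent(n_kept, n_initial):
    """Percentage of the initial records that survive filtering."""
    return 100.0 * n_kept / n_initial


# Hypothetical records, for illustration only.
example = [
    {"id": 1, "flags": []},
    {"id": 2, "flags": ["GEOPOINT_0_COORD"]},
    {"id": 3, "flags": ["SOME_OTHER_FLAG"]},
]
kept = drop_flagged(example, IDIGBIO_REMOVE)
```

Applied to the iDigBio totals reported above, retained_percent(13_667_523, 58_384_427) reproduces the stated retention rate to one decimal place.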
Aggregated GBIF and iDigBio records were then further processed by
excluding points that (1) fell outside the study area (the Americas);
(2) had coordinates reported to fewer than four decimal places (a
precision of ~11 m near the equator); (3) duplicated the locality of
another record (rarefaction); (4) fell outside polygons describing
accepted species’ distributions (defined by Plants of the World Online,
POWO; Brummitt 2001; www.github.com/tdwg/wgsrpd); or (5) belonged to
species with fewer than twelve records (excluded so that niche models
could be built reliably).
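Three of these filters (coordinate precision, duplicate localities, and the minimum record count) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: coordinates are assumed to arrive as text so that the reported decimal places can be counted, the record count is assumed to be taken after duplicates are removed, and the function names are hypothetical.

```python
from collections import defaultdict

METRES_PER_DEGREE = 111_320  # approximate length of one degree at the equator


def decimal_places(coord_text):
    """Count the digits after the decimal point in a coordinate string."""
    _, _, frac = coord_text.strip().partition(".")
    return len(frac)


def precision_metres(decimals):
    """Approximate ground distance of one unit in the last decimal place."""
    return METRES_PER_DEGREE * 10 ** -decimals


def filter_records(records, min_decimals=4, min_records=12):
    """records: iterable of (species, lat_text, lon_text) tuples.

    Drops low-precision points, collapses duplicate localities within a
    species, and then drops species left with fewer than min_records points.
    """
    by_species = defaultdict(set)
    for species, lat, lon in records:
        if min(decimal_places(lat), decimal_places(lon)) < min_decimals:
            continue  # fewer than four decimal places
        by_species[species].add((float(lat), float(lon)))  # set removes duplicates
    return {
        sp: sorted(pts)
        for sp, pts in by_species.items()
        if len(pts) >= min_records
    }
```

Here precision_metres(4) gives roughly 11 m, which matches the figure quoted for four decimal places near the equator.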
Cleaned records were then passed to MaxEnt (version 3.1.4; Phillips,
Anderson, Schapire, 2006) along with 2.5 arc-minute resolution climate
data from WorldClim (Fick and Hijmans, 2017) to build species
distribution models (SDMs). We chose to perform our analyses using SDMs
rather than point occurrence records for two reasons. First, SDMs offer
a probabilistic way of
describing expected species’ ranges based on the climate from sites
where the species has been observed. In this way, SDMs convert presence/
absence data into a continuously valued function, allowing us to ask how
distributions are impacted by abiotic factors without having to
arbitrarily bin species as, for example, alpine or montane. Second,
SDMs help overcome some sampling limitations because the climatic
tolerances they capture indicate where species might occur, even where
they have not been sampled (Barthlott et al. 2007;
Meyer, Kreft, Guralnik, Jetz, 2015; Brummitt, Araújo, Harris, 2021).
Although this extrapolation could lead to erroneous predictions, for
example that a northern boreal species should occur at extreme southern
latitudes, we overcame this obstacle by masking the SDMs with polygons
provided by
POWO that define geographically broad areas where each species occurs
based on expert assessments. This approach thus constrained SDMs by both
known areas of occurrence and climatic tolerances.
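The masking step can be illustrated with a small, self-contained sketch. It assumes the SDM is a grid of suitability values with known cell-centre coordinates and that a species' range is a single simple polygon. Both are simplifications, and the function names and the fill value of 0 are hypothetical choices for the example rather than details of the study's workflow.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_sdm(suitability, cell_centres, polygon, fill=0.0):
    """Replace suitability with `fill` where the cell centre lies outside polygon."""
    return [
        [val if point_in_polygon(x, y, polygon) else fill
         for val, (x, y) in zip(row_vals, row_xy)]
        for row_vals, row_xy in zip(suitability, cell_centres)
    ]


# Hypothetical 2 x 2 grid and a square range polygon.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
sdm = [[0.9, 0.8], [0.7, 0.6]]
centres = [[(1, 1), (3, 1)], [(1, 3), (1.5, 0.5)]]
masked = mask_sdm(sdm, centres, square)
```

Cells that are climatically suitable but lie outside the polygon are set to the fill value, so the masked surface reflects both climatic tolerance and the known area of occurrence.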