Genetic structure and species delineation analyses
We employed the program STRUCTURE version 2.3.4 (Pritchard et al. 2000)
implemented in the ipyrad.analysis toolkit
(https://ipyrad.readthedocs.io/analysis.html),
to assess patterns of genetic structure and admixture among the examined
populations using the 3RAD dataset. We used the 98_33 matrix
(clustering threshold=98 and min_sample_locus = 33, outgroup excluded)
to perform an individual-based Bayesian clustering analysis. We used an
admixture model with correlated frequencies and assessed values of
population differentiation (K) in 15 independent runs for each K from 2
to 8. All runs were conducted with the first 250,000 being discarded as
burn-in. Figures were generated based on the iterations with the highest
posterior probability. The optimal K value was determined based on the
highest average likelihood value [LnP(D)] obtained (Evanno et al.
2005).
We carried out species delimitation analyses with the 3RAD data using
the program Bayesian Phylogenetics & Phylogeography version 3.3 (BPP;
Yang and Rannala 2010; Yang and Rannala 2014; Flouri et al. 2018). BPP
evaluates speciation models using a reversal jump Markov Chain Monte
Carlo (rjMCMC) algorithm to determine whether to collapse or retain
nodes in the phylogeny, assuming no admixture following a speciation
event (Yang and Rannala 2010). BPP requires an input guide tree
representing the species phylogeny with all possible species (Leaché and
Fujita 2010). We used the topology derived from the analysis with the
98_33 matrix as guide tree, employing two different species assignment
criteria: 1) a morphospecies assignment following Aguilar-Velasco et al.
(2016), and 2) based on geographically congruent clades containing at
least two samples.
We randomly subsampled the
98_33 matrix to produce three different matrices with 200 loci and 10
with 50 loci. We examined three sets of parameters varying the ancestral
population size (θ) and root age (τ). The first assumed large ancestral
population sizes and deep divergences [θ G (1, 10) and τ0G(1, 10)],
the second small ancestral population sizes and shallow divergences
among species [θ G(2, 2000) and τ0G(2, 2000) 3], and the third large
ancestral populations sizes and relatively shallow divergences among
species [θ G(1, 10) τ0 G(2, 2000)]. We ran analyses for all models
without data to separately evaluate the effects of the data and
parameters. We subsequently ran seven replicates for one of the matrices
with 50 loci with each set of parameters to check whether the analyses
were run long enough, and then ran analyses for all matrices with five
replicates each starting from different seeds. A significant posterior
probability (PP ≥ 0.95) value was employed across all runs to retain a
given node (i.e., indicating lineage splitting). All analyses were run
for 500,000 generations (first 10,000 were burn-in), with a sampling
interval of 50.
We conducted a Bayes Factor species delimitation (BFD) analysis (Grummer
et al. 2014; Leaché et al. 2014) for the phased UCE 100% taxon
occupancy matrix calling a single variant SNP per locus. The BFD
approach compares candidate species delimitation models with different
numbers of species, estimating the marginal likelihood of each competing
species delimitation model, ranking models by marginal likelihood and
estimating Bayes factors to assess support for model rankings (Kass and
Raftery 1995).
We also conducted a BDF analysis following Leaché and Bouckaert’s (2018)
tutorial. This tutorial implements SNAPP (Bryant et al. 2012), available
in the BEAST2 platform (Bouckaert et al. 2014). SNAPP bypass the
necessity of having to explicitly integrate or sample gene trees at each
locus, and it codes SNP data as follows: individual homozygous for the
original state= “0,” heterozygous= “1,” homozygous for derived state
= “2”. We used a python function to extract biallelic SNPs directly
from allele Multiple Sequence Alignments (MSAs)
(snps_from_uce_alignments.py, available from:
github.com/tobiashofmann88/snp_extraction_from_alignments/; Andermann
et al. 2019). We compared species delimitation models differing in the
number of species. The base model had five species (E. ruidumspp. 1-4, 2x3), whereas the alternative ones were collapsed into one,
two, three and four species in different combinations. The path sampling
parameters, which are used to estimate marginal likelihood, were set to
12 and 24 steps with chainLength = 1,000,000.
All the analyses conducted using the 3RAD and UCE data sets were run on
the CIPRES Web Portal (Miller et al. 2010) or on the the Miztli
supercomputer owned by the Dirección General de Cómputo y de Tecnologías
de Información y Comunicación, Universidad Nacional Autónoma de México
(DGTIC, UNAM).