Genetic structure and species delineation analyses
We employed the program STRUCTURE version 2.3.4 (Pritchard et al. 2000) implemented in the ipyrad.analysis toolkit (https://ipyrad.readthedocs.io/analysis.html), to assess patterns of genetic structure and admixture among the examined populations using the 3RAD dataset. We used the 98_33 matrix (clustering threshold=98 and min_sample_locus = 33, outgroup excluded) to perform an individual-based Bayesian clustering analysis. We used an admixture model with correlated frequencies and assessed values of population differentiation (K) in 15 independent runs for each K from 2 to 8. All runs were conducted with the first 250,000 being discarded as burn-in. Figures were generated based on the iterations with the highest posterior probability. The optimal K value was determined based on the highest average likelihood value [LnP(D)] obtained (Evanno et al. 2005).
We carried out species delimitation analyses with the 3RAD data using the program Bayesian Phylogenetics & Phylogeography version 3.3 (BPP; Yang and Rannala 2010; Yang and Rannala 2014; Flouri et al. 2018). BPP evaluates speciation models using a reversal jump Markov Chain Monte Carlo (rjMCMC) algorithm to determine whether to collapse or retain nodes in the phylogeny, assuming no admixture following a speciation event (Yang and Rannala 2010). BPP requires an input guide tree representing the species phylogeny with all possible species (Leaché and Fujita 2010). We used the topology derived from the analysis with the 98_33 matrix as guide tree, employing two different species assignment criteria: 1) a morphospecies assignment following Aguilar-Velasco et al. (2016), and 2) based on geographically congruent clades containing at least two samples.
We randomly subsampled the 98_33 matrix to produce three different matrices with 200 loci and 10 with 50 loci. We examined three sets of parameters varying the ancestral population size (θ) and root age (τ). The first assumed large ancestral population sizes and deep divergences [θ G (1, 10) and τ0G(1, 10)], the second small ancestral population sizes and shallow divergences among species [θ G(2, 2000) and τ0G(2, 2000) 3], and the third large ancestral populations sizes and relatively shallow divergences among species [θ G(1, 10) τ0 G(2, 2000)]. We ran analyses for all models without data to separately evaluate the effects of the data and parameters. We subsequently ran seven replicates for one of the matrices with 50 loci with each set of parameters to check whether the analyses were run long enough, and then ran analyses for all matrices with five replicates each starting from different seeds. A significant posterior probability (PP ≥ 0.95) value was employed across all runs to retain a given node (i.e., indicating lineage splitting). All analyses were run for 500,000 generations (first 10,000 were burn-in), with a sampling interval of 50.
We conducted a Bayes Factor species delimitation (BFD) analysis (Grummer et al. 2014; Leaché et al. 2014) for the phased UCE 100% taxon occupancy matrix calling a single variant SNP per locus. The BFD approach compares candidate species delimitation models with different numbers of species, estimating the marginal likelihood of each competing species delimitation model, ranking models by marginal likelihood and estimating Bayes factors to assess support for model rankings (Kass and Raftery 1995).
We also conducted a BDF analysis following Leaché and Bouckaert’s (2018) tutorial. This tutorial implements SNAPP (Bryant et al. 2012), available in the BEAST2 platform (Bouckaert et al. 2014). SNAPP bypass the necessity of having to explicitly integrate or sample gene trees at each locus, and it codes SNP data as follows: individual homozygous for the original state= “0,” heterozygous= “1,” homozygous for derived state = “2”. We used a python function to extract biallelic SNPs directly from allele Multiple Sequence Alignments (MSAs) (snps_from_uce_alignments.py, available from: github.com/tobiashofmann88/snp_extraction_from_alignments/; Andermann et al. 2019). We compared species delimitation models differing in the number of species. The base model had five species (E. ruidumspp. 1-4, 2x3), whereas the alternative ones were collapsed into one, two, three and four species in different combinations. The path sampling parameters, which are used to estimate marginal likelihood, were set to 12 and 24 steps with chainLength = 1,000,000.
All the analyses conducted using the 3RAD and UCE data sets were run on the CIPRES Web Portal (Miller et al. 2010) or on the the Miztli supercomputer owned by the Dirección General de Cómputo y de Tecnologías de Información y Comunicación, Universidad Nacional Autónoma de México (DGTIC, UNAM).