Phylogenetic and population genetic analyses
Phylogenetic analyses for both the COI gene and the filtered genomic SNPs were conducted in IQ-TREE 1.3.10 (Nguyen et al.2015). Model testing, SH-aLRT branch testing, and 1000 replicates of ultrafast bootstrapping (Hoang et al. 2018) were conducted in the program.
Mitochondrial COI gene data was used to build a minimum spanning haplotype network (Bandelt et al. 1999) in the program PopART (Leigh & Bryant 2015), which outputs a visual representation of the population genetic relationships between COI haplotypes. We used Structure version 2.3.4 (Pritchard et al. 2000) and TESS version 2.3.1 (Chen et al. 2007) to infer population structure in the SNP dataset. While both programs take a similar Bayesian approach to population clustering based on changes in allele frequency, TESS differs from Structure by additionally incorporating a spatial component for inferring genetically disparate populations that may result from geographic discontinuities. This is particularly useful when genetic structure correlates to isolation by distance, which can contribute to population over-splitting in non-spatial programs (Chen et al.2007). Campbell et al. (2019) showed strong genetic sub-structuring within S. hesperis that corresponded to sets of populations sampled southwest and northeast of the Rocky Mountains; TESS and Structure were compared to clarify the extent that geography influenced these results.
Structure analyses were run using the admixture model without using sampling locations as a prior. We tested K = 1-10 with a burn-in period of 150,000 generations, 750,000 MCMC chains, and 10 replicate runs for each K value. We also conducted separate substructure analyses for S. zerene, S. atlantis, and the northern cluster of S. hesperis , testing K = 1-5 for each. We used CLUMPAK v. 1.1 (Kopelman et al. 2015) to average runs and determine the optimal K considering both ΔK (Evannoet al. 2005) and LnPr(K ) (Pritchard et al. 2000). We used TESS to infer K = 2-7, and incorporated weights on the Voronoi network by computing pairwise Euclidean distances between the geographic coordinates for each specimen. These weights correct for regions with irregular or unequal sampling during Voronoi neighbourhood estimation. We ran this analysis using the CAR admixture model (Durandet al. 2009) for 10 replicates per K , with a burn-in period of 50,000 and 200,000 sweeps (analogous to “generations” in Structure), and sampled the spatial interaction parameter and variance during the MCMC runs. Following program recommendations, we averaged the Deviance Information Criterion (DIC) score for the 10 runs of each value of K , and then identified the optimal K as the lowest value of K at which the DIC scores stabilized.