Genetic divergence and diversity estimation
To estimate absolute genetic divergences between populations of the three subspecies, we computed pairwise DXYfollowing the formula derived by Nei (Nei & Li, 1979). When calculatingDXY , two alleles at each SNP were interpreted as two haplotypes, and corresponding allele frequencies as haplotype frequencies. Each pairwise DXY values were summed over all SNPs and the sum was normalized by sequence length. To test subspecies distinctness, the obtained DXY matrix was used in multidimensional scaling using the ‘cmdscale’ package implemented in R (Figure 2), as well as neighbor-joining tree constructing using MEGA7 (Kumar, Stecher, & Tamura, 2016). We also performed Principal Component Analysis (PCA) on the SNP frequency matrix (summarizing the frequency of each SNP in each population) using the “prcomp” function in R (Venables & Ripley, 2002) to test whether the populations of three subspecies contain distinct polymorphisms. Finally, to assess the extent to which genetic polymorphisms were fixed,FST statistics between subspecies and between populations were computed following a method for a large number of SNPs (Nei & Miller, 1990; Willing, Dreyer, & van Oosterhout, 2012).
To test whether the three subspecies maintain different levels of genetic diversity, we computed statistics of π and Watterson’s θ. The π summarizes the average number of nucleotide differences between two sequences randomly sampled from a population (Nei, 1987), while Watterson’s θ estimates nucleotide polymorphism based on the number of observed segregating sites (Watterson, 1977). To correct the systematic errors of high-throughput sequencing, we computed the θ values following a published algorithm (He et al., 2013). These parameters, i.e.DXY , FST , θ and π, were calculated using in-house Perl scripts (available from the aforementioned GitHub repository).