Genetic divergence and diversity estimation
To estimate absolute genetic divergences between populations of the
three subspecies, we computed pairwise DXYfollowing the formula derived by Nei (Nei & Li, 1979). When calculatingDXY , two alleles at each SNP were interpreted as
two haplotypes, and corresponding allele frequencies as haplotype
frequencies. Each pairwise DXY values were summed
over all SNPs and the sum was normalized by sequence length. To test
subspecies distinctness, the obtained DXY matrix
was used in multidimensional scaling using the ‘cmdscale’ package
implemented in R (Figure 2), as well as neighbor-joining tree
constructing using MEGA7 (Kumar, Stecher, & Tamura, 2016). We also
performed Principal Component Analysis (PCA) on the SNP frequency matrix
(summarizing the frequency of each SNP in each population) using the
“prcomp” function in R (Venables & Ripley, 2002) to test whether the
populations of three subspecies contain distinct polymorphisms. Finally,
to assess the extent to which genetic polymorphisms were fixed,FST statistics between subspecies and between
populations were computed following a method for a large number of SNPs
(Nei & Miller, 1990; Willing, Dreyer, & van Oosterhout, 2012).
To test whether the three subspecies maintain different levels of
genetic diversity, we computed statistics of π and Watterson’s θ.
The
π summarizes the average number of nucleotide differences between two
sequences randomly sampled from a population (Nei, 1987), while
Watterson’s θ estimates nucleotide polymorphism based on the number of
observed segregating sites (Watterson, 1977). To correct the systematic
errors of high-throughput sequencing, we computed the θ values following
a published algorithm (He et al., 2013). These parameters, i.e.DXY , FST , θ and π, were
calculated using in-house Perl scripts (available from the
aforementioned GitHub repository).