2.3 | Genetic analyses
We classified cpDNA haplotypes solely based on nucleotide substitutions,
because there were numerous gaps due to large indels in therp L32–trn L(UAG) region. We described the frequency of
haplotypes in each of the 17 populations.
To verify the quality of ncEST-SSR loci, we estimated the median
frequency of null alleles at each locus using function
popgenreport(mk.null.all) (Brookfield 1996) in package PopGenReport
(Adamack and Gruber 2014) in R 4.2.2 (R_Core_Team 2019). We conducted
chi-square tests for deviation of the Hardy-Weinberg equilibrium at each
locus in each population using function popgenreport(mk.hwe). We removed
loci with null alleles at substantially high (> 0.15)
frequency and highly significant (P < 0.001) deviation
in several (> 5) populations and used remaining loci in the
following genetic analysis.
To evaluate genetic variation in ncEST-SSR genotypes, we conducted
principal component analysis (PCA) of individual genotypes in the 24
populations and allele frequencies in individual populations using
function dudi.pca in R package adegenet (Jombart 2008). We obtained the
contributions of the first to fifth principal components (PCs) to total
variation. We plotted the coordinates of the first and second PC values
of each individual and each population to categorize them into genetic
groups.
To examine genetic diversity in ncEST-SSR genotypes in each population,
we calculated the allelic richness (AR [32],
the number of alleles in the smallest sample size, 32 alleles) using
function popgenreport(mk.allel.rich) in R package PopGenReport as well
as the unbiased expected heterozygosity (H E) and
the inbreeding coefficient (F IS) using functions
Hs and Ho in R package hierfstat (Goudet 2005) in R 4.2.2. Significantly
positive or negative F IS in each population was
verified by 2.5 and 97.5 percentiles estimated from bootstrapping over
loci using function boot.ppfis. Differences inF IS between the genetic groups of Qc andQch (see Results 3.2) were verified by the Kruskal-Wallis
rank-sum test using function kruskal.test in R 4.2.2. We examined
latitudinal and elevational clines as well as differences between the
genetic groups in genetic diversity of populations. We applied a
regression model: y ~ α +
β1x 1 +
β2x 2 + γ, where y isAR [32] or H E, α is an
intercept, β1 is a coefficient ofx 1: latitude (˚N), β2 is a
coefficient of x 2: elevation (m), and γ is a
coefficient of the genetic groups, to data in the 24 populations using
function lm in R 4.2.2.
To evaluate genetic structure in ncEST-SSR genotypes, we conducted
model-based Bayesian clustering of individual genotypes in the 24
populations using STRUCTURE 2.3.3 (Pritchard et al. 2000). The admixture
model of ancestries and correlated allele frequencies were used. Twenty
replications were performed for each number of clusters (K ) from
1 and 5, with 50000 sampling iterations after 10000 burn-in iterations.
We evaluated increment in the log probability of data as K values
increased and its variation among replications at each K value.
We plotted the ancestry proportions of individuals at K values
that were suitable for describing genetic structure.
To examine genetic distance between populations in ncEST-SSR genotypes,
we calculated the Weir and Cockerham’s pairwiseF ST using function pairwise.WCfst in R package
hierfstat and the Nei’s distance D A using
function dist.genpop in R package adegenet. We constructed a
neighbor-joining (NJ) tree of the 24 populations based on theD A matrix using function nj in R package ape
(Paradis and Schliep 2019). To assess the robustness of the NJ tree, we
generated multiple NJ trees based on D A matrices
from jackknifing of loci (removing one out of available loci) and
calculated the frequency of generated trees at focal nodes that
contained consistent populations with the NJ tree constructed from all
the available loci.