2.3 | Genetic analyses
We classified cpDNA haplotypes solely based on nucleotide substitutions, because there were numerous gaps due to large indels in therp L32–trn L(UAG) region. We described the frequency of haplotypes in each of the 17 populations.
To verify the quality of ncEST-SSR loci, we estimated the median frequency of null alleles at each locus using function popgenreport(mk.null.all) (Brookfield 1996) in package PopGenReport (Adamack and Gruber 2014) in R 4.2.2 (R_Core_Team 2019). We conducted chi-square tests for deviation of the Hardy-Weinberg equilibrium at each locus in each population using function popgenreport(mk.hwe). We removed loci with null alleles at substantially high (> 0.15) frequency and highly significant (P < 0.001) deviation in several (> 5) populations and used remaining loci in the following genetic analysis.
To evaluate genetic variation in ncEST-SSR genotypes, we conducted principal component analysis (PCA) of individual genotypes in the 24 populations and allele frequencies in individual populations using function dudi.pca in R package adegenet (Jombart 2008). We obtained the contributions of the first to fifth principal components (PCs) to total variation. We plotted the coordinates of the first and second PC values of each individual and each population to categorize them into genetic groups.
To examine genetic diversity in ncEST-SSR genotypes in each population, we calculated the allelic richness (AR [32], the number of alleles in the smallest sample size, 32 alleles) using function popgenreport(mk.allel.rich) in R package PopGenReport as well as the unbiased expected heterozygosity (H E) and the inbreeding coefficient (F IS) using functions Hs and Ho in R package hierfstat (Goudet 2005) in R 4.2.2. Significantly positive or negative F IS in each population was verified by 2.5 and 97.5 percentiles estimated from bootstrapping over loci using function boot.ppfis. Differences inF IS between the genetic groups of Qc andQch (see Results 3.2) were verified by the Kruskal-Wallis rank-sum test using function kruskal.test in R 4.2.2. We examined latitudinal and elevational clines as well as differences between the genetic groups in genetic diversity of populations. We applied a regression model: y ~ α + β1x 1 + β2x 2 + γ, where y isAR [32] or H E, α is an intercept, β1 is a coefficient ofx 1: latitude (˚N), β2 is a coefficient of x 2: elevation (m), and γ is a coefficient of the genetic groups, to data in the 24 populations using function lm in R 4.2.2.
To evaluate genetic structure in ncEST-SSR genotypes, we conducted model-based Bayesian clustering of individual genotypes in the 24 populations using STRUCTURE 2.3.3 (Pritchard et al. 2000). The admixture model of ancestries and correlated allele frequencies were used. Twenty replications were performed for each number of clusters (K ) from 1 and 5, with 50000 sampling iterations after 10000 burn-in iterations. We evaluated increment in the log probability of data as K values increased and its variation among replications at each K value. We plotted the ancestry proportions of individuals at K values that were suitable for describing genetic structure.
To examine genetic distance between populations in ncEST-SSR genotypes, we calculated the Weir and Cockerham’s pairwiseF ST using function pairwise.WCfst in R package hierfstat and the Nei’s distance D A using function dist.genpop in R package adegenet. We constructed a neighbor-joining (NJ) tree of the 24 populations based on theD A matrix using function nj in R package ape (Paradis and Schliep 2019). To assess the robustness of the NJ tree, we generated multiple NJ trees based on D A matrices from jackknifing of loci (removing one out of available loci) and calculated the frequency of generated trees at focal nodes that contained consistent populations with the NJ tree constructed from all the available loci.