Fig. 1. Sampled sites of Opisopappus taihangensis andOpisopappus longilobus .
The sampled sites of OLa, OLb, OTc and OTd were distinguished by pink,
blue, yellow, and red respectively and visualized on the map.
2.2 SNPs screened and
acquisition
Genomic DNA from 160 individuals leaves of Opisthopappus was
extracted by the modified CTAB method (Papadopulos et al., 2014). The
quality of DNA was evaluated by 0.8% agarose gel electrophoresis and an
ultraviolet spectrophotometer.
In order to construct genome sequencing library to detect single
nucleotide polymorphism (SNPs), the extracted genome DNA was fully
enzymatically cut by using EcoRI+BfaI restrictiion endoenzyme.
Too large and too small fragments after enzyme digestion were removed
using VAHTSTM DNA Clean Beads, and
then were ligased with the P1
connector and P2 connector. PCR reactions were performed to enrich the
sequencing library template. PCR products were purified by VAHTSTM DNA
Clean Beads and then sequenced on Illumina HiSeq sequencer using 2×150bp
double-terminal sequencing by Shanghai Personalbio Technology CO. ,
LTD.. The quality of data was controlled using FastQC
(http://www.bioinformatics.babraham.ac.uk/projects/fastqc) (Schmidt et
al., 2021). Reads with Q20 ≥ 90% and Q30 ≥ 85% were screened from raw
data to obtain high quality data. Trimmed reads were mapped to theChrysanthemum genome version 2.0
(http://www.amwayabrc.com/index.html) using BWA and SAMtools
program“rmdup” to remove duplications. Only reads with at least 85 %
match to the reference genome were retained(Contaldi et al., 2021).
After sequencing, SNPs were detected by GATK v3.8 software (do Valle et
al., 2016; Peter et al., 2018). All SNPs were screened from 160
individuals and saved as VCF files. Subsequently, the SNPs with deletion
rates over 5% were removed by Plink 1.9 with Geno and MAF commands, and
that with 1% minor interference frequency was selected. Finally, the
SNPs with P<0.000001 were screened out with the hardy -
Weinberg balance test performed by hardy command (Purcell et al., 2007).
2.3 Genetic characteristics
estimated
The observed heterozygosity (H O) and
average heterozygosity
(H S) were calculated using STACKS of different
populations, groups, and species based on the screened SNP data
(Rochette et al., 2019). F IS value (inbreeding
coefficient) anomg populations, groups, and species was evaluated using
the “boot. ppfis” function of R package hierfstat with 999 bootstrap
values (Goudet et al., 2015). The mean allelic richness (i.e., mean
number of alleles per locus, Ar) was calculated with the function
“allel.rich” of R package PopGenReport (Adamack and Gruber, 2014). The
proportion of shared alleles between populations, groups and species was
estimated with the “pairwise.propShared” function of the R package
PopGenReport (Song et al., 2017). The IDW model of GIS was used to
visualized the results on the map (Yu et al., 2014).
The genetic variations (F ST) among populations,
among groups, and between species were estimated by Arlequin v3.5 with
default parameter (Excoffier and Lischer, 2010).F ST values of 0- 0.05, 0.05-0.15 and 0.15-0.25
indicated a little genetic differentiation, moderate genetic
differentiation, and extensive genetic differentiation among
populations, respectively; F ST >
0.25 indicated a substantial genetic differentiation among populations.
2.4 Gene exchange/flows and
migration
Bayesian method implemented in the STRUCTURE v.2.3.4 was used to
estimate the likely number of genetic clusters (K ) forOpisthopappus species (Papadopulos et al., 2014; Pritchard et
al., 2000). A range of K values was planned from K = 1 to
10. Each number of K clusters was assessed in 5 independent runs,
with a burn-in of 100,000 MCMC steps, followed by an additional 100,000
MCMC steps (Kunde et al., 2020; Morris‐Pocock et al., 2016). Then the
resulted files were uploaded to the ”STRUCTURE HARVESTER” website
(http://taylorO.biology.ucla.edu/struct_harvest/) (Earl and VonHoldt,
2012). And the structure cluster diagram was drawn by the ”clumpak”
website
(http://clumpak.tau.ac.il/)
(Furlan et al., 2013; Kopelman et al., 2015).
To analyze the gene exchange among populations, groups and species, the
effective migration rate was used in this study. The effective migration
rate is the migration rate (M) and was calculated using BayesAss 3.0
software (Wilson and Rannala, 2003). When running, the parameters were
set at 0.50 allele frequency, 0.50 inbred line number, 0.50 migration
rate, 100 seed, program 10000000 times, 1000000 number of iterations,
1000 sampling interval to ensure the reliability and accuracy of M value
(Morris‐Pocock et al., 2016).
EEMS (Estimate effective Migration Surface) uses the concept of
effective migration to model the relationship between genetics and
geography, and it can output an ”estimated effective migration surface”
(Petkova et al., 2016), namely, a visual representation of population
structure that can highlight potential regions of higher-than-average
and lower-than-average gene flow. Thus, the migration patterns ofO. taihangensis and O. longilobus , from the group,
population to individual, in the Taihang Mountains were analyzed through
the effective migration surface by EEMS software (Mahtani-Williams et
al., 2020).
Firstly, VCFtools software converted SNPs data into bed, fam, and bim
files (Danecek et al., 2011). The ”BED2DIFFs” program provided by the
EEMS model on the Ubuntu platform converted the genetic differentiation
matrix file from SNPs into diffs. Meanwhile, each population’s latitude
and longitude coordinates were arranged in the same order as SNPs data
and made a coord file.
In addition, the habitat polygon per dataset (coord file) was obtained
using Google Maps API v3 Tool
(http://www.birdtheme.org/useful/v3tool.html)
and plotted using the R package rEEMSplots (Petkova et al., 2016). A
distribution map of genetic characteristics of Opisthopappusspecies was also outputted by EEMS based on the above data
(Mahtani-Williams et al., 2020).
2.5 Landscape factors
extraction
Landscape factors required for data mainly came from the FAO
(http://www.fao.org/home/zh/) and Worldclim (https://worldclim.org/)
website (Mahtani-Williams et al., 2020).
For the Taihang Mountains, the climate data was (1970-2000) mainly from
WorldClim 2.1 (www.worldclim.org) and land data was from the United
Nations Food and Agriculture Organization (www.fao.org). Total 107
landscape factors were obtained, such as monthly mean temperature (℃),
maximum temperature (℃), rainfall (mm), wind speed (m/s), vapor pressure
(kpa), solar radiation (kJ /m2*d), and land development and utilization
(Support files 1).
For each sampled location, the climatic landscape factors were extracted
from the downloaded raster dataset and ASCII dataset using the ”rgdal”
and ”raster” packages in the R software (Bivand et al., 2015; Hijmans et
al., 2015) (Support files 1).
Because the multicollinearity of factors can lead to the faulty of
modeling, the highly correlated environmental predictors were removed.
And the variance inflation factor (VIF) was calculated by R package
”usdm” for each landscape factor to detect the current linearity, and
then the factors with VIF < 5 were kept (Mahtani-Williams et
al., 2020; Papadopulos et al., 2014; Ruan et al., 2021) (Support files
2).
The principal component analysis was performed with Origin 2022 software
(https://www.originlab.com/) to determine important landscape factors
that contributed to the O. longilobus and O. taihangensis .
Subsequently, the screened factors were also mapped by Origin 2022
software (Fig. 2).