Fig. 1 Sampling locations
These sequence data have been submitted to the European Nucleotide
Archive (ENA) under accession number PRJEB65323. (not yet published)
2.3 DNA extraction
The genomic DNA was extracted from 40 to 80 mg of dried needles after
grinding them for 2 minutes in a FastPrep®-24 instrument (MP
BIOMEDICALS) device using the Invisorb® Spin Plant Mini Kit (STRATEC
MOLECULAR), DNeasy® Plant Mini Kit (QIAGEN) or DNeasy® Plant Pro Kit
(QIAGEN), following the supplier’s protocol, but the buffers were heated
before use.
2.4 GBS, de novo assembly, and variant call
Genotyping by sequencing (GBS) was conducted following Wendler et al.
(2014) to obtain genome-wide single nucleotide polymorphisms (SNPs).
Genomic DNA (200 ng) was digested using restriction enzymes PstI-HF and
MspI. PstIHF is a rare-cutting enzyme (recognition site: CTGCA’G) and
the methylation-sensitive enzyme MspI (recognition site: C’CGG) is a
frequent cutting enzyme. Libraries were sequenced on an Illumina HiSeq
2000 and NovaSeq6000, generating single-end reads of 100-120 bp.
Barcoded Illumina reads were de-multiplexed using the Casava pipeline
1.8 (Illumina), trimmed, filtered, and de novo assembled with ipyrad v.
0.7.30 (Eaton and Overcast 2020).
A maximum of 5 low quality (Q<20) bases were allowed in a
read. Consensus base calling was also part of the ipyrad pipeline with a
minimum depth set at 6. A threshold of at least 85% sequence similarity
was set to identify homologous sequences, and thus cluster together. The
minimum number of samples that must have data at a given locus was set
to 60%. Heterozygous sites were allowed for a maximum of 50% of the
samples.
For additional details, the ipyrad parameter file is provided in
Appendix S2. The geno file was used to generate the input file for
DIYABC (Appendix S3).
2.5 GBS data analysis
The genetic structure among populations and between individuals has the
potential to unravel spatial distribution patterns. Individuals will
have more similar genotypes when they originate from the same
population. Thus, it is possible to evaluate how they cluster based on
their genotypes (Cornuet et al. 1999). Here we use the Bayesian
population assignment using the R package LEA to serve as an
illustration of the distribution of ancestry coefficients. It represents
the spatial predictions on a geographic map. The acquired SNP data are
linked to geographical information and patterns are inferred through the
analysis of ancestry coefficients and admixture rates (cluster analysis)
(Frichot and François 2015). Subsequently, the evaluation of potential
biogeographical dynamics of different larch populations in Russia will
be performed via the model-based approach Approximate Bayesian
Computation, implemented in the software DIYABC (Cornuet et al. 2014).
2.5.1 Spatial distribution of SNPs
We used optimized versions of principal component analysis (PCA) and
non-negative matrix factorization algorithms (sNMF) (Frichot et al.
2014) as implemented in the R package LEA version 2.8.0 (Frichot and
Francois 2015; Francois 2016). To define the optimal number of genetic
clusters (K) we tested K values from 1 to 6 based on knowledge about the
number of larch species and main hybrids in Siberia. The final number of
clusters (K) was selected by choosing the K-value with the lowest
cross-entropy. Only the K with the maximal local contribution to
ancestry is represented at each point of the geographic map.
2.5.2 Biogeographic inference - Approximate Bayesian Computation (ABC)
Linkage disequilibrium (LD), Hardy Weinberg equilibrium (HWE), minor
allele frequency (MAF), and data missingness (MISS) of the SNP dataset
were inferred using the software PLINK (Purcell et al. 2007). SNPs were
LD pruned with a window of 50 SNPs, a step size of 20 makers, and
r2 threshold 0.05. The SNPs showing severe distortion
of the HWE (p < 0.05), or MAF lower than 5%, as well as SNP
markers with missing data (MISS) above 20% were discarded from further
analysis.
We applied an Approximate Bayesian Computation (ABC) using the software
DIYABC v. 2.1.0 (Cornuet et al. 2014). The potential demographic history
was inferred via complex evolutionary scenarios. To keep the scenarios
in the ABC simple, four populations were defined based on the results of
the cluster analysis and the admixture plots: Pop1 (Western Siberia=
WSib), Pop2 (Western Yakutia= WYak), Pop3 (Eastern Yakutia= EYak), and
Pop4 (Chukotka= Chuk). Alternative scenarios including the predefined
four distinct genetic groups for the estimation of the population
demographic history were constructed. In these scenarios, t# represents
the time scale, measured by generation time, and N# represents the
effective population size of the corresponding populations. The 12
examined scenarios viewed backward in time, are as follows (Fig. 2):
Scenario 1: Pop4/ Chuk merged with Pop3/ EYak at t1, and then they
merged with Pop2/ WYak at t2. They merged with Pop1/ WSib at t3.
Scenario 2: Pop3/ EYak merged with Pop2/ WYak at t1, and then Pop4/ Chuk
merged with them at t2. They merged with Pop1/ WSib at t3.
Scenario 3: Pop3/ EYak was created by an admixture of Pop2/ Wyak and
Pop4/ Chuk at t1, and then Pop4/ Chuk merged with Pop2/ WYak at t2. Pop2
merged with Pop1/ WSib at t3.
Scenario 4: Three populations (Pop1/ WSib, Pop3/ EYak, Pop4/ Chuk) split
at the same time, namely t1, and then Pop2/ WYak merged with them at
t2.
Scenario 5: All four populations (Pop1/ WSib, Pop2/ WYak, Pop3/ EYak,
Pop4/ Chuk) split at the same time, namely t1.
Scenario 6: The order of the hierarchical splits is the same as in
scenario 1 but a recent bottleneck was modeled in all four populations
at t1-db (db: time scale to bottleneck before t1): population size
changed from N1 to N1b, from N2 to N2b, from N3 to N3b, and from N4 to
N4b. N4b merged with N3b at t2, and then they merged with N2b at t3.
They merged with N1b at t4.
Scenarios 7–12: These are the same as scenarios 1–6 but an ancient and
severe bottleneck was modeled: population size changed from N1 to N1a
(and from N1b to N1a for scenario 12) at t4-db for scenarios 7–9, at
t3-db for scenario 10, at t2-db for scenario 11, and at t5-db for
scenario 12.