TAXONOMY CLASSIFICATION
Botany, Genomics, Phylogenetics, Taxonomy
1 | Introduction
Plastids serve as crucial
organelles that carry out photosynthesis in green plants. Generally,
autotrophic plants are highly conserved in the chloroplast genomes
(Wicke, Schneeweiss, dePamphilis, Müller, & Quandt, 2011). By contrast,
heterotrophic plants including mycoheterotrophs and parasites exhibit
varying degrees of degeneration in their chloroplast genomes (Graham,
Lam, & Merckx, 2017). The family Orobanchaceae is recognized as the
largest single family of parasites, consisting of 101 genera and over
2100 species with autotrophic, hemiparasitic, and holoparasitic
lifestyles (Byng et al., 2016; Nickrent, 2020). Orobanchaceae is thus
considered as an excellent model to elaborate the parasitic plants
evolution (Westwood et al., 2012). Most previous studies have focused on
autotrophs and holoparasites (Xiaoqing Liu et al., 2019; Wicke &
Naumann, 2018; Zeng et al., 2017). However, limited investigations have
been done on hemiparasites (Xin Li et al., 2021; R. Zhang et al., 2020).
The hemiparasitic tribe Cymbarieae is widely distributed in Eurasia and
comprises approximately 20 species in five genera (Schneeweiss, 2013).
Cymbarieae was originally thought to be sister to various parasitic
lineages in the family Orobanchaceae (Bennett & Mathews, 2006; McNeal,
Bennett, Wolfe, & Mathews, 2013), while analysis of the tribe
Orobancheae suggests that this original classification merits
reconsideration (Xi Li, Feng, Randle, & Schneeweiss, 2019).
Furthermore, although chloroplast genomes of Schwalbea americana(Wicke et al., 2013) and Siphonostegia chinensis (Gao et al.,
2019; Jiang et al., 2022) have been previously published, no systematic
comparative analysis of chloroplast genomes of various groups within the
tribe Cymbarieae has been conducted to date.
Cymbaria L. sensu stricto (i.e., excludingCymbochasma Endl.) is the type genus of the tribe Cymbarieaea and
usually parasitizes on roots of Stipa (Poaceae) andCaragana (Fabaceae). It is now generally believed that this genus
includes only two facultative hemiparasites (Zhao, 1999). C.
mongolica Maxim. is endemic to the Loess Plateau of China, whileC. daurica L. is a characteristic species of Mongolian Plateau
steppe. According to the Chinese Pharmacopoeia and the Mongolian
Medicinal Materials Standard, C. daurica is the only original
plant species for elaborating the
traditional Mongolian medicine, which is referred to as “Kanba-Arong”
in Mongolian and “Xinba” in Chinese. It is widely used for treating
several diseases, including pruritus, psoriasis, fetotoxicity, impetigo,
and diabetes (Zhang et al., 2013). Its historical application dates back
to the 19th-century Mongolian pharmaceutical classic known asMengyaozhengdian (Fig. S1), which provides a systematic approach
to its usage. The dried whole plant is included in several classical
herbal formulations, such as “Siweixinbasan” and “Baweixinbasan.”
Previous studies have shown that C. daurica contains 177 chemical
components (Wu et al., 2020), and the most common are flavonoids (Z.-H.
Li et al., 2014) and iridoid glycosides (Q. Wang, Bao, Hao, & Han,
2018). This herb has received much recent attention owing to its diverse
bioactivities, including anti-diabetic (Gong et al., 2020),
anti-inflammatory (J.-J. Guo, Liu, Zhu, Ren, & Liang, 2017), and
anti-bacterial (Shi et al., 2020) activities. However, either accidental
or intentional, the adulteration and substitution of C. dauricafrequently occurs in China and Mongolia, due to its minor morphological
differences with C. mongolica (Hu, 2018; Liang, Liang, Yao, Liu,
& Li, 2016). Specifically, C. daurica has densely white
sericeous anther locules that are 4–4.5 mm and apically pilose, whereasC. mongolica has pilose anther locules that are 3–3.6 mm and
glabrous apically or occasionally with few hairs; thus, distinguishing
between these two species is a major challenge in the non-flowering
stage. This issue has an unforeseeable subsequent effect on the herb
“Xinba”, jeopardizing its clinical use, potency, and safety.
In recent years, researchers have shifted their focus from morphological
and chemical identification to using DNA-based molecular markers as a
precise method to assess the authenticity of medicinal herbs. DNA
barcodes has become particularly popular for identifying species of
Chinese medicinal herbs owing to its accuracy and speed (Zhu, Liu, Qiu,
Dai, & Gao, 2022). The universal barcodes include ITS, matK,rbc L, and trn H–psb A, either individually or in
combination (CBOL Plant Working Group et al., 2009; China Plant BOL
Group et al., 2011). However, these barcodes might be ineffective for
complex taxonomic groups, especially for radically evolved and closely
related taxa, because sufficient genetic variation is lacking (Z. F. Liu
et al., 2022). Therefore, more reliable specific DNA barcodes are needed
to distinguish the C. daurica from its adulterant C.
mongolica, and ensure the authenticity of the herb “Xinba”.
Here, we provide the chloroplast genome sequences of these twoCymbaria species and then conducted a comparative analysis of
these two newly sequenced chloroplast genomes with those of 52 other
Orobanchaceae species. Our specific objectives were to (1) characterizeCymbaria chloroplast genomes, (2) develop the specific molecular
markers as DNA barcodes, and (3) investigate the evolution and phylogeny
of Cymbaria . These findings will provide key insights on the
taxonomic identification, phylogenetic placement and reductive evolution
of hemiparasitic genus Cymbaria , and valuable genetic tools to
validate the authenticity of the traditional Mongolian medicine
“Xinba.”
2. | Materials and Methods
2.1. | Sampling, sequencing, assembly, and annotation
Wild C. mongolica and C. daurica plants were collected
from Binzhou, Shaanxi Province (35°14′38.63″N, 108°13′06.11″E) and
Xilingol, Inner Mongolia Autonomous Region (43°28′13.92″N,
116°47′07.76″E) in China. One sample of young fresh leaves per species
was collected in a liquid nitrogen tank. The habitat and altitude were
also recorded. Voucher specimens were stored in the Herbarium of Inner
Mongolia University. Total genomic DNA was extracted using the modified
CTAB method (Allen, Flores-Vergara, Krasynanski, Kumar, & Thompson,
2006). Illumina sequencing platform was used to generate raw reads. The
assembly was performed using clean data through SPAdes v. 3.14.0
(Bankevich et al., 2012) and GetOrganelle (Jin et al., 2020). The
graphic visualization and assembly quality was evaluated by Bandage
(Wick, Schultz, Zobel, & Holt, 2015) and GetOrganelle (Jin et al.,
2020), respectively. Annotations were performed using GeSeq (Tillich et
al., 2017) and manually adjusted in Geneious (Kearse et al., 2012).
After identifying the boundaries by BLAST, the sequences were submitted
to GenBank and visualized as a single circular maps using the Organellar
Genome DRAW tool (Greiner, Lehwark, & Bock, 2019).
2.2. | Comparative analysis of chloroplast genomes
A comparative analysis of Orobanchaceae chloroplast genomes was
conducted including eight
autotrophs, 21 hemiparasites, and 25 holoparasites (Table S1). The
genome sizes, GC contents, and intact genes were investigated using
PhyloSuite (D. Zhang et al., 2020) and drawn with RadarMap package in R
software (Team, 2015). A heatmap of genes in the chloroplast genome was
generated using TBtools (Chen et al., 2020). The PAML package v4.0 was
used to calculate the nonsynonymous (Ka) and synonymous (Ks)
substitution rates and their ratio (ω = Ka/Ks) (Yang, 2007).
Contractions and expansions of the boundaries were detected using
Irscope (Amiryousefi, Hyvönen, & Poczai, 2018). The Mauve program was
used to align sequences against the Schwalbea americana reference
sequence (Darling, Mau, & Perna, 2010). CodonW
(http://codonw.sourceforge.net/) was used to calculate RSCU. REPuter
(Kurtz et al., 2001) was employed to detect repeat sequences and simple
sequence repeats (SSRs) were identified by MISA (Beier, Thiel, Münch,
Scholz, & Mascher, 2017).
2.3. | Development and validation of DNA barcodes
Structural comparisons were
conducted using mVISTA (Frazer, Pachter, Poliakov, Rubin, & Dubchak,
2004) with C. mongolica as a reference. Sliding window analysis
was conducted to identify hypervariable regions using DnaSP (Librado &
Rozas, 2009). Primer3web v 4.1.0 (https://primer3.ut.ee/ ) was
used to design DNA barcoding primers based on the hypervariable regions,
and these were verified using seven individuals from different regions
of each species (Table S4). PCR was conducted in 25 µL reactions with
12.5 µL of 2×EasyTaq PCR SuperMix, 1.0 µL of each primer (0.4 µM), 1 µL
of template DNA, and 9.5 µL of ddH2O. A SimpliAmp™
Thermal Cycler (Applied Biosystems, Carlsbad, CA, USA) was used to
conduct all PCR reactions with the following thermal cycling conditions:
94 ˚C for 5 min; 30 cycles of 94 ˚C for 30 s, a specific annealing
temperature (Tm) for 30 s, and 72 ˚C for 30 s; and 72 ˚C for 10 min.
Agarose gel electrophoresis (1.5%) was used to visualize PCR products.
The DNA fragments were purified and sequenced by Biomarker Technologies
Co., Ltd.
2.4. | Phylogenetic analyses and divergence time estimation
To identify early divergence
events within Orobanchaceae, phylogeny was inferred from 54
Orobanchaceae species with the two Paulowniaceae species Tectona
grandis (GenBank accession number: NC_020098) and Paulownia
tomentosa (GenBank accession number: MK875778) as outgroups.
Phylogenetic trees were generated
using three datasets: (1) complete chloroplast genome sequences; (2)
coding DNA sequences (CDSs); and (3) shared CDSs (rps 2,rps 7, rps 8, rps 14, rpl 2, rpl 16,rpl 36, and mat K). The standard phylogenetic workflow in
Phylosuite v1.2.2 (D. Zhang et al., 2020) was used as follows: alignment
of multiple sequences with MAFFT v7.313 (Katoh & Standley, 2013),
optimization of aligned regions with Gblocks v0.91b (Castresana, 2000),
concatenation of datasets with the Concatenate Sequence function,
selection of the optimal partitioning scheme with PartitionFinder2, and
phylogenetic inference with IQ-TREE v1.6.8 (Nguyen, Schmidt, Von
Haeseler, & Minh, 2015) and MrBayes v3.2.6 (Drummond, Suchard, Xie, &
Rambaut, 2012). Phylogeny was inferred using Maximum likelihood and
Bayesian inference methods. Phylogenetic trees visualized using iTOL v6
(Letunic & Bork, 2021).
BEAST v1.7 (Drummond et al.,
2012) was used to estimate divergence times based on the shared
concatenated CDSs. As no reliable fossil record is available, the crown
age for Orobanchaceae (C) was constrained as 56 ± 10 Mya using a normal
model in TimeTree (http://timetree.org/). The results of PartitionFinder
were used to determine the nucleotide substitution model of the unlinked
subsets. The following parameters were used in BEAUti software:
“Lognormal relaxed clock (Uncorrelated)” for “Clock model” and
“Speciation: Yule Process” for “Tree model.” Three independent MCMC
chains were carried out under the same parameters. Each MCMC was run for
50,000,000 generations, and sampling was conducted every 5,000
generations. Convergence was evaluated using Tracer v.1.6
(http://beast.bio.ed.ac.uk/Tracer). The LogCombiner program was used to
combine the three log files. The TreeAnnotator program was used to
generate the maximum clade credibility (MCC) tree. The effective sample
sizes in the combined three log outputs were greater than 200, which
indicated that the MCMC sampling was adequate. FigTree v1.4.3 (Rambaut,
2017) was used to visualize the MCC tree with 95% highest posterior
density intervals.
3. | Results
3.1. | Characteristics of Cymbaria chloroplast genomes
Illumina sequencing yielded more than 20 million bp clean reads. The
chloroplast genomes of C. mongolica and C. dauricadisplayed a typical quadripartite structure with sizes of 149,431 bp
(38.0% GC content) and 151,545 bp (38.2% GC content), respectively.
Each chloroplast genome comprised an LSC region (86,595 bp and 87,376
bp), an SSC region (16,962 bp and 17,825 bp), and two IR regions (22,937
bp and 23,172 bp). The GC content in the IR region (44.1%) was higher
than that in the LSC (35.9–36.2%) and SSC (32.2–32.7%) regions.