TAXONOMY CLASSIFICATION
Botany, Genomics, Phylogenetics, Taxonomy
1 | Introduction
Plastids serve as crucial organelles that carry out photosynthesis in green plants. Generally, autotrophic plants are highly conserved in the chloroplast genomes (Wicke, Schneeweiss, dePamphilis, Müller, & Quandt, 2011). By contrast, heterotrophic plants including mycoheterotrophs and parasites exhibit varying degrees of degeneration in their chloroplast genomes (Graham, Lam, & Merckx, 2017). The family Orobanchaceae is recognized as the largest single family of parasites, consisting of 101 genera and over 2100 species with autotrophic, hemiparasitic, and holoparasitic lifestyles (Byng et al., 2016; Nickrent, 2020). Orobanchaceae is thus considered as an excellent model to elaborate the parasitic plants evolution (Westwood et al., 2012). Most previous studies have focused on autotrophs and holoparasites (Xiaoqing Liu et al., 2019; Wicke & Naumann, 2018; Zeng et al., 2017). However, limited investigations have been done on hemiparasites (Xin Li et al., 2021; R. Zhang et al., 2020). The hemiparasitic tribe Cymbarieae is widely distributed in Eurasia and comprises approximately 20 species in five genera (Schneeweiss, 2013). Cymbarieae was originally thought to be sister to various parasitic lineages in the family Orobanchaceae (Bennett & Mathews, 2006; McNeal, Bennett, Wolfe, & Mathews, 2013), while analysis of the tribe Orobancheae suggests that this original classification merits reconsideration (Xi Li, Feng, Randle, & Schneeweiss, 2019). Furthermore, although chloroplast genomes of Schwalbea americana(Wicke et al., 2013) and Siphonostegia chinensis (Gao et al., 2019; Jiang et al., 2022) have been previously published, no systematic comparative analysis of chloroplast genomes of various groups within the tribe Cymbarieae has been conducted to date.
Cymbaria L. sensu stricto (i.e., excludingCymbochasma Endl.) is the type genus of the tribe Cymbarieaea and usually parasitizes on roots of Stipa (Poaceae) andCaragana (Fabaceae). It is now generally believed that this genus includes only two facultative hemiparasites (Zhao, 1999). C. mongolica Maxim. is endemic to the Loess Plateau of China, whileC. daurica L. is a characteristic species of Mongolian Plateau steppe. According to the Chinese Pharmacopoeia and the Mongolian Medicinal Materials Standard, C. daurica is the only original plant species for elaborating the traditional Mongolian medicine, which is referred to as “Kanba-Arong” in Mongolian and “Xinba” in Chinese. It is widely used for treating several diseases, including pruritus, psoriasis, fetotoxicity, impetigo, and diabetes (Zhang et al., 2013). Its historical application dates back to the 19th-century Mongolian pharmaceutical classic known asMengyaozhengdian (Fig. S1), which provides a systematic approach to its usage. The dried whole plant is included in several classical herbal formulations, such as “Siweixinbasan” and “Baweixinbasan.” Previous studies have shown that C. daurica contains 177 chemical components (Wu et al., 2020), and the most common are flavonoids (Z.-H. Li et al., 2014) and iridoid glycosides (Q. Wang, Bao, Hao, & Han, 2018). This herb has received much recent attention owing to its diverse bioactivities, including anti-diabetic (Gong et al., 2020), anti-inflammatory (J.-J. Guo, Liu, Zhu, Ren, & Liang, 2017), and anti-bacterial (Shi et al., 2020) activities. However, either accidental or intentional, the adulteration and substitution of C. dauricafrequently occurs in China and Mongolia, due to its minor morphological differences with C. mongolica (Hu, 2018; Liang, Liang, Yao, Liu, & Li, 2016). Specifically, C. daurica has densely white sericeous anther locules that are 4–4.5 mm and apically pilose, whereasC. mongolica has pilose anther locules that are 3–3.6 mm and glabrous apically or occasionally with few hairs; thus, distinguishing between these two species is a major challenge in the non-flowering stage. This issue has an unforeseeable subsequent effect on the herb “Xinba”, jeopardizing its clinical use, potency, and safety.
In recent years, researchers have shifted their focus from morphological and chemical identification to using DNA-based molecular markers as a precise method to assess the authenticity of medicinal herbs. DNA barcodes has become particularly popular for identifying species of Chinese medicinal herbs owing to its accuracy and speed (Zhu, Liu, Qiu, Dai, & Gao, 2022). The universal barcodes include ITS, matK,rbc L, and trn H–psb A, either individually or in combination (CBOL Plant Working Group et al., 2009; China Plant BOL Group et al., 2011). However, these barcodes might be ineffective for complex taxonomic groups, especially for radically evolved and closely related taxa, because sufficient genetic variation is lacking (Z. F. Liu et al., 2022). Therefore, more reliable specific DNA barcodes are needed to distinguish the C. daurica from its adulterant C. mongolica, and ensure the authenticity of the herb “Xinba”.
Here, we provide the chloroplast genome sequences of these twoCymbaria species and then conducted a comparative analysis of these two newly sequenced chloroplast genomes with those of 52 other Orobanchaceae species. Our specific objectives were to (1) characterizeCymbaria chloroplast genomes, (2) develop the specific molecular markers as DNA barcodes, and (3) investigate the evolution and phylogeny of Cymbaria . These findings will provide key insights on the taxonomic identification, phylogenetic placement and reductive evolution of hemiparasitic genus Cymbaria , and valuable genetic tools to validate the authenticity of the traditional Mongolian medicine “Xinba.”
2. | Materials and Methods
2.1. | Sampling, sequencing, assembly, and annotation
Wild C. mongolica and C. daurica plants were collected from Binzhou, Shaanxi Province (35°14′38.63″N, 108°13′06.11″E) and Xilingol, Inner Mongolia Autonomous Region (43°28′13.92″N, 116°47′07.76″E) in China. One sample of young fresh leaves per species was collected in a liquid nitrogen tank. The habitat and altitude were also recorded. Voucher specimens were stored in the Herbarium of Inner Mongolia University. Total genomic DNA was extracted using the modified CTAB method (Allen, Flores-Vergara, Krasynanski, Kumar, & Thompson, 2006). Illumina sequencing platform was used to generate raw reads. The assembly was performed using clean data through SPAdes v. 3.14.0 (Bankevich et al., 2012) and GetOrganelle (Jin et al., 2020). The graphic visualization and assembly quality was evaluated by Bandage (Wick, Schultz, Zobel, & Holt, 2015) and GetOrganelle (Jin et al., 2020), respectively. Annotations were performed using GeSeq (Tillich et al., 2017) and manually adjusted in Geneious (Kearse et al., 2012). After identifying the boundaries by BLAST, the sequences were submitted to GenBank and visualized as a single circular maps using the Organellar Genome DRAW tool (Greiner, Lehwark, & Bock, 2019).
2.2. | Comparative analysis of chloroplast genomes
A comparative analysis of Orobanchaceae chloroplast genomes was conducted including eight autotrophs, 21 hemiparasites, and 25 holoparasites (Table S1). The genome sizes, GC contents, and intact genes were investigated using PhyloSuite (D. Zhang et al., 2020) and drawn with RadarMap package in R software (Team, 2015). A heatmap of genes in the chloroplast genome was generated using TBtools (Chen et al., 2020). The PAML package v4.0 was used to calculate the nonsynonymous (Ka) and synonymous (Ks) substitution rates and their ratio (ω = Ka/Ks) (Yang, 2007). Contractions and expansions of the boundaries were detected using Irscope (Amiryousefi, Hyvönen, & Poczai, 2018). The Mauve program was used to align sequences against the Schwalbea americana reference sequence (Darling, Mau, & Perna, 2010). CodonW (http://codonw.sourceforge.net/) was used to calculate RSCU. REPuter (Kurtz et al., 2001) was employed to detect repeat sequences and simple sequence repeats (SSRs) were identified by MISA (Beier, Thiel, Münch, Scholz, & Mascher, 2017).
2.3. | Development and validation of DNA barcodes
Structural comparisons were conducted using mVISTA (Frazer, Pachter, Poliakov, Rubin, & Dubchak, 2004) with C. mongolica as a reference. Sliding window analysis was conducted to identify hypervariable regions using DnaSP (Librado & Rozas, 2009). Primer3web v 4.1.0 (https://primer3.ut.ee/ ) was used to design DNA barcoding primers based on the hypervariable regions, and these were verified using seven individuals from different regions of each species (Table S4). PCR was conducted in 25 µL reactions with 12.5 µL of 2×EasyTaq PCR SuperMix, 1.0 µL of each primer (0.4 µM), 1 µL of template DNA, and 9.5 µL of ddH2O. A SimpliAmp™ Thermal Cycler (Applied Biosystems, Carlsbad, CA, USA) was used to conduct all PCR reactions with the following thermal cycling conditions: 94 ˚C for 5 min; 30 cycles of 94 ˚C for 30 s, a specific annealing temperature (Tm) for 30 s, and 72 ˚C for 30 s; and 72 ˚C for 10 min. Agarose gel electrophoresis (1.5%) was used to visualize PCR products. The DNA fragments were purified and sequenced by Biomarker Technologies Co., Ltd.
2.4. | Phylogenetic analyses and divergence time estimation
To identify early divergence events within Orobanchaceae, phylogeny was inferred from 54 Orobanchaceae species with the two Paulowniaceae species Tectona grandis (GenBank accession number: NC_020098) and Paulownia tomentosa (GenBank accession number: MK875778) as outgroups. Phylogenetic trees were generated using three datasets: (1) complete chloroplast genome sequences; (2) coding DNA sequences (CDSs); and (3) shared CDSs (rps 2,rps 7, rps 8, rps 14, rpl 2, rpl 16,rpl 36, and mat K). The standard phylogenetic workflow in Phylosuite v1.2.2 (D. Zhang et al., 2020) was used as follows: alignment of multiple sequences with MAFFT v7.313 (Katoh & Standley, 2013), optimization of aligned regions with Gblocks v0.91b (Castresana, 2000), concatenation of datasets with the Concatenate Sequence function, selection of the optimal partitioning scheme with PartitionFinder2, and phylogenetic inference with IQ-TREE v1.6.8 (Nguyen, Schmidt, Von Haeseler, & Minh, 2015) and MrBayes v3.2.6 (Drummond, Suchard, Xie, & Rambaut, 2012). Phylogeny was inferred using Maximum likelihood and Bayesian inference methods. Phylogenetic trees visualized using iTOL v6 (Letunic & Bork, 2021).
BEAST v1.7 (Drummond et al., 2012) was used to estimate divergence times based on the shared concatenated CDSs. As no reliable fossil record is available, the crown age for Orobanchaceae (C) was constrained as 56 ± 10 Mya using a normal model in TimeTree (http://timetree.org/). The results of PartitionFinder were used to determine the nucleotide substitution model of the unlinked subsets. The following parameters were used in BEAUti software: “Lognormal relaxed clock (Uncorrelated)” for “Clock model” and “Speciation: Yule Process” for “Tree model.” Three independent MCMC chains were carried out under the same parameters. Each MCMC was run for 50,000,000 generations, and sampling was conducted every 5,000 generations. Convergence was evaluated using Tracer v.1.6 (http://beast.bio.ed.ac.uk/Tracer). The LogCombiner program was used to combine the three log files. The TreeAnnotator program was used to generate the maximum clade credibility (MCC) tree. The effective sample sizes in the combined three log outputs were greater than 200, which indicated that the MCMC sampling was adequate. FigTree v1.4.3 (Rambaut, 2017) was used to visualize the MCC tree with 95% highest posterior density intervals.
3. | Results
3.1. | Characteristics of Cymbaria chloroplast genomes
Illumina sequencing yielded more than 20 million bp clean reads. The chloroplast genomes of C. mongolica and C. dauricadisplayed a typical quadripartite structure with sizes of 149,431 bp (38.0% GC content) and 151,545 bp (38.2% GC content), respectively. Each chloroplast genome comprised an LSC region (86,595 bp and 87,376 bp), an SSC region (16,962 bp and 17,825 bp), and two IR regions (22,937 bp and 23,172 bp). The GC content in the IR region (44.1%) was higher than that in the LSC (35.9–36.2%) and SSC (32.2–32.7%) regions.