Materials and Methods
Plant materials. Stem cuttings of C. rotundifolia were collected from Endau hill, Kitui County, Kenya (01°17’49” S, 038°31’59” E). Voucher specimens (JAJIT-MU0128) were deposited in the Wuhan Botanical Garden herbarium (HIB). Young leaves of C. rotundifoliawere collected for genome size evaluation and DNA isolation. Root tips were used for chromosome number determination. Matured leaves, stems, and young roots were collected for RNA isolation and tissue-specific transcriptome analysis.
C. rotundifolia individuals were grown in a glasshouse under artificial conditions (16 h light / 8 h dark, 25 °C) in 3 L pots. After well watering, plants were cultivated for one week and transferred to an incubator with 12 h light (from 6:00 to 18:00) at 30 °C and 12 h dark (18:00 to 6:00) at 20 °C (30 % humidity). After three days, leaves were collected from 18:00 on 9 January 2020 to 18:00 on 10 January 2020 at three-hour intervals. We selected nine time points including 18:00, 21:00, 24:00, 3:00, 6:00, 9:00, 12:00, 15:00, and 18:00 for the study. Three biological replicates were collected for each sample and immediately frozen in liquid nitrogen. Samples were stored at -80 °C for RNA extraction and titratable acid measurement.
Estimation of genome size. About 28.31 Gb Illumina reads were used to evaluate the genome size of C. rotundifolia byK-mer analysis. The K-mer frequency distribution was calculated using Jellyfish (v2.2.6) (Marcais and Kingsford, 2011) with the K-mer length of 19. Genomic heterozygosity was estimated by GenomeScope (v2.0) (Ranallo-Benavidez et al., 2020). Flow cytometry was used to determine the nuclear DNA content using hand-chopped materials as described by Galbraith et al. (1983), with minor modifications. Woody plant buffer (WPB) was used instead of Tris-MgCl2 buffer to isolate the nuclei (Loureiro et al., 2007). Raphanus sativuscv. Saxa was used as the reference standard. Young leaves of C. rotundifolia and R. sativus were collected, and the protocols were described in detail by Gichuki et al., 2019.
Chromosome count. Root tips were treated with a saturated solution of 1-Bromonaphthalene for 3 h at room temperature (25–28 °C) to halt cell division. Microscopic slides were prepared from the treated root tips using the protocol developed by Kirov et al. (2014) and Gichuki et al. (2019). The prepared microscopic slides were stained with 4’, -6-diamidino-2-phenylindole (DAPI), and images were captured using a fluorescence microscope (Leica DMi8) fitted with a camera (Leica DFC 550).
Genome sequencing. Genomic DNA of C. rotundifolia was extracted using CTAB-based protocol as described by Doyle & Doyle (1987), with minor modifications. Briefly, a washing step was included before CTAB extraction to exclude secondary metabolites. The washing buffer contained 50 mM Tris-HCL, 5 mM EDTA-2Na, 0.35 M D-sorbitol, 1% (W/V) polyvinyl pyrrolidone (PVP-K 30), and 1 % 2-hydroxy-1-ethanethiol. Illumina libraries with 450 bp insertions were constructed according to the Illumina standard protocol, and the paired-end libraries were sequenced to about 77 × coverage on the Illumina Hiseq platform. Approximate 20 kb SMRTbell libraries were prepared and sequenced on the PacBio Sequel system following standard protocols (Berry Genomics Corporation, Beijing, China).
Hi-C library construction and sequencing. Young leaves from the same C. rotundifolia plant used for genome sequencing were collected for Hi-C analysis. The Hi-C library construction and sequencing for chromosome-level assembly were implemented by Biomarker Technologies Corporation (Beijing, China) as a previously published method (Xie et al., 2015). Briefly, the shoot tips of C. rotundifolia plants were covered with a black box, and the etiolated leaves were fixed with formaldehyde and lysed. The cross-linked DNA was digested overnight using HindIII. Digested fragments were biotinylated and ligated to form chimeric junctions that were enriched, sheared, and processed. Then the libraries were produced on the Illumina Hiseq platform. The paired-end Hi-C reads were uniquely mapped onto the contigs using Juicer (Durand et al., 2016b), and the non-duplicate mapped results were used as the input for the 3D-DNA pipeline (Dudchenko et al., 2017) to construct the genome sequence. Two rounds (-r 2) provided the best results with an assembly of 12 pseudo-chromosomes and N50 of ~28 Mb. Fine-tuning assembled genome in a graphic and inter-action matrix was drawn in Juicebox (Durand et al., 2016a).
Genome assembly. The contig-level assembly of C. rotundifolia genome was processed by combining ~100 X PacBio long reads and ~76 X Illumina short reads. The raw PacBio reads were initially corrected and trimmed by CANU-1.8 (Koren et al., 2017) with the parameters of genomeSize = 600m, useGrid = false, maxMemory = 200g, ovsMemory = 16G, ovbConcurrency = 15, ovsConcurrency = 15 and ”batOptions = -dg 3 -db 3 -dr 1 -ca 500 -cp 50”. Then, the corrected PacBio reads were assembled by two widely used PacBio assemblers, CANU-1.8 (Koren et al. , 2017), and WTDBG2 (Ruan and Li, 2019). We used N50 and the genome size of each assembler to inspect assembly quality. The Canu assembled results were adopted in the end. Illumina reads were used to further polish PacBio assembly using Pilon (Wang et al., 2014) program with parameters: -verbose -mindepth 4 -fix snps,indels -vcf. Then, the redundancy of the assembled sequences was removed to improve the continuity of the assembled contigs using Redundans (Pryszcz and Gabaldón, 2016) with parameters: -identity 0.5 -overlap 0.66 and Purge Haplotigs (Roach et al., 2018) with parameters: -l 15 -m 60 -h 190.
Evaluation of the genome assembly. The genome assembly quality of the C. rotundifolia was evaluated using two methods. Firstly, the error rate considering the homozygous mutation was estimated by mapping the 28.31 Gb whole genome sequence (WGS) reads onto this assembly by BWA software. Compared to eudicotyledons_odb10 database, the single copy orthologs in the assembled genome were identified and completeness of the assembly was evaluated using Benchmarking Universal Single-Copy Orthologs v2 (Simão et al., 2015) with the -long and default parameter, respectively.
Repeat sequences annotation. Repeat structures were analyzed by a combined strategy of the de novo prediction and homology-based prediction. A de novo repeat library of C. rotundifolia genome was built by RepeatModeler (v1.0.11, http://www.repeatmasker.org/RepeatModeler/) with the -engine ncbi parameter. Using this library, we processed repetitive sequences to annotate, classify, and mark by RepeatMasker (v4.0.7,http://www.repeatmasker.org/). Two built libraries were combined with Repbase (v20170127, https://www.girinst.org/) (Jurka et al., 2005) and Dfam (v20170127,http://www.dfam.org/) (Hubley et al., 2016) databases with default parameters. SSRs were identified using MISA (http://pgrc.ipk-gatersleben.de/misa/misa.html) (Thiel et al., 2003), with the unit lengths ranging from 1 to 7 and the min-length set to 10 bp. LTR were identified by LTR_retriever according to the method of Ou & Jiang (Ou and Jiang, 2018).
Gene model prediction. Three approaches were combined to annotate protein-coding genes. Firstly, Illumina RNA-seq data from three representative tissues were assembled with two different strategies (de novo or genome-guided assembly) by Trinity (v2.2.0) (Haas et al., 2013). The assembled RNA-seq data were then aligned to the assembled genome and evidence-based prediction by PASA (v2.0.2) (Haas et al., 2003). Secondly, the ab initio methods, AUGUSTUS (v3.3) (Stanke et al., 2006), SNAP (Korf, 2004), and GeneMarkHMM (Lukashin and Borodovsky, 1998) (v 4.32) with default parameters were used to predict gene models with the training of the best candidate genes obtained from PASA (Haas et al., 2003). Thirdly, protein sequences from closely related species, including V. vinifera (PN40024) and other Cissusspecies downloaded from NCBI, were used to annotate protein homologs ofC. rotundifolia by GenomeThreader (https://genomethreader.org/). Finally, the annotation results generated from evidence-based prediction, ab initio prediction, and homologous mapping were combined by EVM (v 1.1.1) (Haas et al., 2008) to integrate the consensus gene model, and genes were renamed according to their position in the genome sequence with the prefix of CRGY (C. rotundifolia genome).
Phylogenomic tree construction and gene family analyses. The protein-coding sequences from C. rotundifolia and 13 other representative species were used to identify orthologous groups, including those of Arabidopsis thaliana , Oryza sativa ,Vitis vinifera , Actinidia chinensis , Coffeaarabica, Solanum lycopersicum , Populous thichocarpa ,Theobroma cacao , Carica papaya , Citrus sinensis ,Fragaria ananassa , Malus domestica , and Prunus pesica . All-vs-all BLASTP (Camacho et al., 2009) with an e-value cutoff of 1e-05 was performed, and orthologous genes were clustered using OrthoMCL (Li, 2003). Single copy genes were extracted from the clustering results and performed multiple sequence alignments using MUSCLE (v3.8.31) (Edgar, 2004). After removing low-quality alignment or divergent regions by Gblocks, high-quality aligned protein sequences remained. All aligned sequences were concatenated to one long sequence for each species, and these sequences were used to construct a phylogenetic tree by RAxML (v2.5.1) (Stamatakis, 2006) with PROTGAMMAJTTF model and bootstrap of 1000. MCMCtree (4.8a) from the PAML package (Yang, 2007) was adopted to estimate the species divergence time according to TimeTree (http://www.timetree.org). Four divergence times were used in this analysis, including Coffea arabica andSolanum lycopersicum , Arabidopsis thaliana andCarica papaya , Prunus persica and Malus domestica ,Fragaria × ananassa and Prunus persica . And the divergence times of Vitis vinifera and C. rotundifolia in Timetree were also referred in this study. The Markov chain Monte Carlo (MCMC) process analysis was set for 50,000 generations and 50,000 burn-in iterations. The OrthMCL results and time divergence tree were used as the input for CAFÉ (v3.1) program (Han et al., 2013), which was used to identify expansions and contractions of gene families across 15 plant genomes. The family expansion and contraction were analyzed by Count, and the methods and parameters were according to the study of theAmborella genome (Albert et al., 2013). Multi-species orthologous clusters with gene numbers greater than 0 in V. vinifera andC. rotundifolia were considered orthologous groups between these two species. Expanded orthogroups were defined according to theirp -value less than 0.05 and the gene number greater than the average value of multi-species. Dot plot representation of orthologous groups was performed with the R package ggplot2 (http://ggplot2.org/) (Wickham, 2011).
Tandem duplication analysis. Four additional typical succulent species, including Ananas comosus, Hylocereus undatus, Kalanchoe fedtschenkoi, and Kalanchoe laxiflora were added to the gene family clustering (Table S1). Multi-species orthologous clusters with a gene number greater than 0 in V. vinifera andC. rotundifolia were used to identify the lineage-specific expansion. Expanded orthogroups were identified with a p -value less than 0.05 and the gene number greater than the average value of multi-species. The tandem genes were identified by MCScanX (Wang et al., 2012), which was consistent with the method described in our WGD analysis. The gained TD genes of 18 species were obtained from the Count results and were further categorized into either co-expanded or lineage-specific expanded ones. The TD genes were GO termed by agriGO database (http://systemsbiology.cau.edu.cn/agriGOv2/index.php). Further, four succulent plants were annotated both by GO database and agriGO database.
Synteny analyses. All-vs-all BLASTP (Camacho et al., 2009) (e-value 1e-05) and MCScanX (Wang et al., 2012) was used to predict the collinear relationships and positional features between C. rotundifolia and V. vinifera (PN40024). Blocks larger than ten genes and gaps less than five genes were obtained. The synteny map and dotplot were processed by MCScan and drawn by the python scripts in MCScan packages (Tang et al., 2008).
The segment duplication events were predicted using self-vs-self BLASTP (Camacho et al., 2009) (e-value 1e-05) and MCScanX among the C. rotundifolia genome, requiring at least five genes per collinear block. Subsequently, the pairwise sequences from the synteny blocks and segment duplication pairs were processed by ParaAT (v2.0) (Zhang et al., 2012). The values of nonsynonymous mutation rate (Ka ) and synonymous mutation rate (Ks ) were calculated using the NG estimation method in Kaks_Calculator (v2.0) (Wang et al., 2010). The visualization plots of the Ks distribution were made using a custom R script. Additionally, whole-genome duplication (WGD) events were determined by the distribution of Ks of segment duplication pairs and identified by comparisons with the events of V. vinifera(PN40024). The AEK was inferred from the genomes of eudicot species with the smallest numbers of historical polyploidization events, including grape, cacao, and peach. Further, the AEK was refined as a post-τ AMK with ten protochromosomes and 13,916 ordered protogenes, a pre-τ AMK with five protochromosomes and 6,707 ordered protogenes (Murat et al., 2017). In the current study, the reconstruction of karyotype of theV. vinifera and C. rotundifolia were advised by a previous study by Florent Murat (Murat et al., 2017), and the genes and gene orders were used to construct the seven chromosomes and 21 chromosomes of AEK. To cover as many genes as possible, we used version 2.1 of the grape assembly, which anchored 32,424 coding genes (Table S1). MCScanX (Wang et al., 2012), in a BLASTP and dotplot-based approach, was used to detect the syntenic blocks between C. rotundifolia vs. AEK andV. vinifera vs. AEK with default parameters. The protein of the pre-γAEK and post-γAEK compared to Cissus and grape by BLASTP. The syntenic blocks were ordered according to the gene order of C. rotundifolia and V. vinifera. Some small syntenic blocks and small gaps were abandoned or closed to make the syntenic segments more complete. On the base of dotplot illustrations of the synteny between these two species, the karyotypic structures of the ancestral eudicots were explained by taking into account the fewest number of genomic rearrangements, which may have occurred between the AEK and modern eudicot genomes (Figure S5, Tables S2 and S3).
Detection of significant expansion and contraction in succulent plants. To investigate the significant expansion or contraction of gene families, we divided 18 species into two categories, including five succulent plants (S5) and 13 non-succulent/others plants (O13). Five succulent plants, including Cissus rotundifolia , Ananas comosus , Kalanchoe laxiflora , Hylocereus undatus ,Kalanchoe fedtschenkoi and other 13 plants were described in Table S1. The average number of genes per orthogroup between two categories was available to evaluate the significant events. For S5 plants, a binomial test with a probability of success of p(W) = 5/18 was used. The criteria of significant expansion or contraction are as follows: 1, A statistically false discovery rate-adjusted p -value < 0.05 from the initial set of 97,344 orthogroups; 2, The minimal contribution of about three for S5 and seven for O13 species; 3, Contribute gene per orthogroup on average satisfied with (S5n/5) / (O13n/13) > 1. We found that 88 orthogroups were expanded (corresponding to 5,696 genes), and 178 were contracted in succulent plants relative to non-succulent plants.
Measurement of titratable acidity. The diurnal changes of titratable acid in leaves of C. rotundifolia were measured as described by Chen et al. (1983). The samples were collected as mentioned above. A total of 0.5 g leaves of each sample were cut into pieces, placed in centrifugal tubes, and boiled for 30 min after adding 10 mL CO2-free distilled water. The supernatant after centrifugation was reserved. Additional 10 mL CO2-free water was added to the pellet to extract and centrifuge again. Total supernatants obtained by the two-stage extraction process were titrated to pH 8.3 with 0.01 mol/L NaOH, and the acidity of the leaf was represented as μeq of acid per gram fresh weight (μeq g-1 FW).
RNA extraction and RNA-Seq library preparation. Total RNA was extracted from the samples using the Universal Plant Total RNA Fast Extraction Kit (BioTeke Corporation, Beijing, China). RNase-free DNase I was used to remove DNA from the extracted RNA. The purity and concentration of RNA were determined by a Nano Drop and Agilent 2100 bioanalyzer (Thermo Fisher Scientific, MA, USA). Subsequently, mRNA enriched by Oligo (dT) - attached magnetic beads was randomly fragmented into short pieces with an additional fragmentation buffer. Then, first-strand cDNA was synthesized by random hexamer-primed reverse transcription, followed by second-strand cDNA synthesis. A-Tailing Mix and RNA Index Adapters were added by incubating to end repair. The obtained cDNA fragments were amplified by PCR, and then products were purified by Ampure XP Beads. Agilent Technologies 2100 bioanalyzer was used for quality control of products. Finally, the cDNA library was constructed, and the MGISEQ-2000 platform was used for paired-end sequencing (2 x 150 bp). Approximately 40 million bp were generated for each sample.
Transcriptome analysis. The quality of paired-end raw transcriptome data was checked by FastQC v0.11.8 and trimmed using Trimmomatic (v0.36) (Bolger et al., 2014). Then the trimmed reads were mapped onto C. rotundifolia latest assembled genome through TopHat (v2.1.1) (Trapnell et al., 2012). Using the gene model ofC. rotundifolia , the expression levels of genes represented by FPKM (Fragments per Kilobase Million) for each sample were calculated by Cufflinks (v2.2.1) (Trapnell et al., 2012) with default parameters. The genes involved in the stomatal movement process and CAM pathway were picked to show their expression patterns by ‘pheatmap’ package in R.
Identification of CAM pathway and stomatal movement process-related genes. The genomes of Ananas comosus , V. vinifera (PN40024), O. sativa , Zea mays , andPhalaenopsis equestris were downloaded from Pineapple Genomics Database (PGD, http://pineapple.angiosperms.org/pineapple/html/index.html) (Xu et al., 2018), Phytozome database (https://phytozome.jgi.doe.gov/pz/portal.html) (Goodstein et al., 2012), Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/) (Kawahara et al., 2013), MaizeGDB (https://maizegdb.org/) (Portwood et al., 2019), and NCBI (https://www.ncbi.nlm.nih.gov/), respectively. Further, the CAM gene list of Kalanchoë was obtained from a supplemental table of its genome (Yang et al., 2017). The list of gene families, which includedcarbonic anhydrase (CA ), phosphoenolpyrvuate carboxylase (PEPC ), phosphoenolpyruvate carboxylase kinase (PEPCK ), malate dehydrogenase (MDH ),malic enzyme (ME ), phosphoenolpyruvate carboxykinase (PPCK ), and pyruvate phosphate dikinase regulatory protein (PPDKRP ), was obtained from PGD (Xu et al., 2018). All given gene sequences in each family from pineapple, O. sativa , and Z. mays , Phalaenopsis equestris andKalanchoë were used as queries to search corresponding family members in C. rotundifolia and V. vinifera (PN40024) by BLASTP. The genes with alignment length > 100bp and e-value < 1e-05 were considered as potential members. Then online software CD-search (https://www.ncbi.nlm.nih.gov/cdd) (Marchler-Bauer and Bryant, 2004) and PFAM (https://pfam.xfam.org/) (El-Gebali et al., 2019) were used to detect the specific domain. The genes without a unique domain of gene family were abandoned. Then the remaining genes were defined as candidate members and used for further analyses. Diel expression dataset of Arabidopsis C3 leaf (Mockler et al., 2007) and pineapple CAM leaf were used to compare with CAM-genes shown in Figure 4b in C. rotundifolia , whose orthologs were identified by BLASTP based on sequence similarity, and then, gene-pairs between two species were used to calculate their relationship (Pearson and Spearman) of transcript expression (Table S4). On the base of satisfying two correlation coefficients (Pearson and Spearman), genes (Rcr-at < 0.5) were determined as not correlative expression patterns during a day/night cycle betweenCissus and Arabidopsis . Gene pairs (Rcr-at< 0.5 and Rcr-ac > 0.8) were defined strongly CAM genes. The genes for stomatal movement were identified using BLASTP with an Evalue cutoff of 1e-5 based on orthology in Arabidopsis as described by Chen et al. (2020).
Co-expression network and cluster analysis . Transcripts with average FPKM > 1 (calculated from three biological replicates) in at least one of the nine samples were used to construct a weighted gene co-expression network by R package WGCNA. The transcript expression was log2 transformed. Modules were constructed using the following parameters: power  = 16, networkType  =  ”signe”, mergeCutHeight  =  0.18, corType  = ”bicor”, minModuleSize = 30. All the nine time point transcripts with three replicates were used to perform cluster analysis by maSigPro package. The parameters were as following: degree = 3, counts = F, MT.adjust = ”BH”. Transcripts were marked as influential by the T.fit() function. Genes with “non-flat” significantly changed across the nine time points. Nine clusters were displayed using the ”see.genes” function with cluster.method=”hclust”, k=9 in maSigPro. The network of each cluster was constructed by ARACNE algorithm with ‘Discovery’ mode and ‘Naïve bayes’ mutual information (MI) algorithm type in Cytoscape software. The p -value was calculated based on MI, in which, less than 0.05 were selected in each cluster. One percent of genes with at least ten edges in each network were selected by cytoHubba, and CAM genes also were chosen based on a minimum of ten directed edges.
Cis-element annotation and enrichment analysis of CAM related genes. Promoter sequences in 2kb upstream of genes involved in CAM were extracted from the C. rotundifolia genome. Of all the promoter sequences, the cis -element enrichment of light, circadian, temperature, and drought in CAM-related and stomatal movement-related promoters were implemented by FIMO (Grant et al., 2011) program with p -value < 0.0002 in MEME. Enrichment analysis of about five known cis -elements including the morning element (CCACAC), the evening element (AAAATATCT), the CCA1-binding site (AAAAATCT), the TCP15 element (NGGNCCCAC), and the G-box element (G-box; CACGTG) (Michael and McClung, 2002) were performed by FIMO (Grant et al., 2011) program.