Materials and Methods
Plant materials. Stem cuttings of C. rotundifolia were
collected from Endau hill, Kitui County, Kenya (01°17’49” S, 038°31’59”
E). Voucher specimens (JAJIT-MU0128) were deposited in the Wuhan
Botanical Garden herbarium (HIB). Young leaves of C. rotundifoliawere collected for genome size evaluation and DNA isolation. Root tips
were used for chromosome number determination. Matured leaves, stems,
and young roots were collected for RNA isolation and tissue-specific
transcriptome analysis.
C. rotundifolia individuals were grown in a glasshouse under
artificial conditions (16 h light / 8 h dark,
25 °C) in 3 L pots. After well
watering, plants were cultivated for one week and transferred to an
incubator with 12 h light (from 6:00 to 18:00) at 30 °C and 12 h dark
(18:00 to 6:00) at 20 °C (30 % humidity). After three days, leaves were
collected from 18:00 on 9 January 2020 to 18:00 on 10 January 2020 at
three-hour intervals. We selected nine time points including 18:00,
21:00, 24:00, 3:00, 6:00, 9:00, 12:00, 15:00, and 18:00 for the study.
Three biological replicates were collected for each sample and
immediately frozen in liquid nitrogen. Samples were stored at -80 °C for
RNA extraction and titratable acid measurement.
Estimation of genome size. About 28.31 Gb Illumina reads were
used to evaluate the genome size of C. rotundifolia byK-mer analysis. The K-mer frequency distribution was
calculated using Jellyfish (v2.2.6) (Marcais and Kingsford, 2011) with
the K-mer length of 19. Genomic heterozygosity was estimated by
GenomeScope (v2.0) (Ranallo-Benavidez et al., 2020). Flow cytometry was
used to determine the nuclear DNA content using hand-chopped materials
as described by Galbraith et al. (1983), with minor modifications. Woody
plant buffer (WPB) was used instead of Tris-MgCl2 buffer
to isolate the nuclei (Loureiro et al., 2007). Raphanus sativuscv. Saxa was used as the reference standard. Young leaves of C.
rotundifolia and R. sativus were collected, and the protocols
were described in detail by Gichuki et al., 2019.
Chromosome count. Root tips were treated with a saturated
solution of 1-Bromonaphthalene for 3 h at room temperature (25–28 °C)
to halt cell division. Microscopic slides were prepared from the treated
root tips using the protocol developed by Kirov et al. (2014) and
Gichuki et al. (2019). The prepared microscopic slides were stained with
4’, -6-diamidino-2-phenylindole (DAPI), and images were captured using a
fluorescence microscope (Leica DMi8) fitted with a camera (Leica DFC
550).
Genome sequencing. Genomic DNA of C. rotundifolia was
extracted using CTAB-based protocol as described by Doyle & Doyle
(1987), with minor modifications. Briefly, a washing step was included
before CTAB extraction to exclude secondary metabolites. The washing
buffer contained 50 mM Tris-HCL, 5 mM EDTA-2Na, 0.35 M D-sorbitol, 1%
(W/V) polyvinyl pyrrolidone (PVP-K 30), and 1 %
2-hydroxy-1-ethanethiol. Illumina libraries with 450 bp insertions were
constructed according to the Illumina standard protocol, and the
paired-end libraries were sequenced to about 77 × coverage on the
Illumina Hiseq platform. Approximate 20 kb SMRTbell libraries were
prepared and sequenced on the PacBio Sequel system following standard
protocols (Berry Genomics Corporation, Beijing, China).
Hi-C library construction and sequencing. Young leaves from the
same C. rotundifolia plant used for genome sequencing were
collected for Hi-C analysis. The Hi-C library construction and
sequencing for chromosome-level assembly were implemented by Biomarker
Technologies Corporation (Beijing, China) as a previously published
method (Xie et al., 2015). Briefly, the shoot tips of C.
rotundifolia plants were covered with a black box, and the etiolated
leaves were fixed with formaldehyde and lysed. The cross-linked DNA was
digested overnight using HindIII. Digested fragments were biotinylated
and ligated to form chimeric junctions that were enriched, sheared, and
processed. Then the libraries were produced on the Illumina Hiseq
platform. The paired-end Hi-C reads were uniquely mapped onto the
contigs using Juicer (Durand et al., 2016b), and the non-duplicate
mapped results were used as the input for the 3D-DNA pipeline (Dudchenko
et al., 2017) to construct the genome sequence. Two rounds (-r 2)
provided the best results with an assembly of 12 pseudo-chromosomes and
N50 of ~28 Mb. Fine-tuning assembled genome in a graphic
and inter-action matrix was drawn in Juicebox (Durand et al., 2016a).
Genome assembly. The contig-level assembly of C.
rotundifolia genome was processed by combining ~100 X
PacBio long reads and ~76 X Illumina short reads. The
raw PacBio reads were initially corrected and trimmed by CANU-1.8 (Koren
et al., 2017) with the parameters of genomeSize = 600m, useGrid = false,
maxMemory = 200g, ovsMemory = 16G, ovbConcurrency = 15, ovsConcurrency =
15 and ”batOptions = -dg 3 -db 3 -dr 1 -ca 500 -cp 50”. Then, the
corrected PacBio reads were assembled by two widely used PacBio
assemblers, CANU-1.8 (Koren et al. , 2017), and WTDBG2 (Ruan and
Li, 2019). We used N50 and the genome size of each assembler to inspect
assembly quality. The Canu assembled results were adopted in the end.
Illumina reads were used to further polish PacBio assembly using Pilon
(Wang et al., 2014) program with parameters: -verbose -mindepth 4 -fix
snps,indels -vcf. Then, the redundancy of the assembled sequences was
removed to improve the continuity of the assembled contigs using
Redundans (Pryszcz and Gabaldón, 2016) with parameters:
-identity 0.5 -overlap 0.66 and Purge Haplotigs (Roach et al., 2018)
with parameters: -l 15 -m 60 -h 190.
Evaluation of the genome assembly. The genome assembly quality
of the C. rotundifolia was evaluated using two methods. Firstly,
the error rate considering the homozygous mutation was estimated by
mapping the 28.31 Gb whole genome sequence (WGS) reads onto this
assembly by BWA software. Compared to eudicotyledons_odb10 database,
the single copy orthologs in the assembled genome were identified and
completeness of the assembly was evaluated using Benchmarking Universal
Single-Copy Orthologs v2 (Simão et al., 2015) with the -long and default
parameter, respectively.
Repeat sequences annotation. Repeat structures were analyzed by
a combined strategy of the de novo prediction and homology-based
prediction. A de novo repeat library of C. rotundifolia genome
was built by RepeatModeler (v1.0.11,
http://www.repeatmasker.org/RepeatModeler/) with the -engine ncbi
parameter. Using this library, we processed repetitive sequences to
annotate, classify, and mark by RepeatMasker (v4.0.7,http://www.repeatmasker.org/).
Two built libraries were combined with Repbase (v20170127,
https://www.girinst.org/) (Jurka et al., 2005) and Dfam (v20170127,http://www.dfam.org/) (Hubley et
al., 2016) databases with default parameters. SSRs were identified using
MISA
(http://pgrc.ipk-gatersleben.de/misa/misa.html)
(Thiel et al., 2003), with the unit lengths ranging from 1 to 7 and the
min-length set to 10 bp. LTR were identified by LTR_retriever according
to the method of Ou & Jiang (Ou and Jiang, 2018).
Gene model prediction. Three approaches were combined to
annotate protein-coding genes. Firstly, Illumina RNA-seq data from three
representative tissues were assembled with two different strategies (de
novo or genome-guided assembly) by Trinity (v2.2.0) (Haas et al., 2013).
The assembled RNA-seq data were then aligned to the assembled genome and
evidence-based prediction by PASA (v2.0.2) (Haas et al., 2003).
Secondly, the ab initio methods, AUGUSTUS (v3.3) (Stanke et al., 2006),
SNAP (Korf, 2004), and GeneMarkHMM (Lukashin and Borodovsky, 1998) (v
4.32) with default parameters were used to predict gene models with the
training of the best candidate genes obtained from PASA (Haas et
al., 2003). Thirdly, protein sequences from closely related
species, including V. vinifera (PN40024) and other Cissusspecies downloaded from NCBI, were used to annotate protein homologs ofC. rotundifolia by GenomeThreader (https://genomethreader.org/).
Finally, the annotation results generated from evidence-based
prediction, ab initio prediction, and homologous mapping were combined
by EVM (v 1.1.1) (Haas et al., 2008) to integrate the consensus gene
model, and genes were renamed according to their position in the genome
sequence with the prefix of CRGY (C. rotundifolia genome).
Phylogenomic tree construction and gene family analyses. The
protein-coding sequences from C. rotundifolia and 13 other
representative species were used to identify orthologous groups,
including those of Arabidopsis thaliana , Oryza sativa ,Vitis vinifera , Actinidia chinensis , Coffeaarabica, Solanum lycopersicum , Populous thichocarpa ,Theobroma cacao , Carica papaya , Citrus sinensis ,Fragaria ananassa , Malus domestica , and Prunus
pesica . All-vs-all BLASTP (Camacho et al., 2009) with an e-value cutoff
of 1e-05 was performed, and orthologous genes were clustered using
OrthoMCL (Li, 2003). Single copy genes were extracted from the
clustering results and performed multiple sequence alignments using
MUSCLE (v3.8.31) (Edgar, 2004). After removing low-quality alignment or
divergent regions by Gblocks, high-quality aligned protein sequences
remained. All aligned sequences were concatenated to one long sequence
for each species, and these sequences were used to construct a
phylogenetic tree by RAxML (v2.5.1) (Stamatakis, 2006) with
PROTGAMMAJTTF model and bootstrap of 1000. MCMCtree (4.8a) from the PAML
package (Yang, 2007) was adopted to estimate the species divergence time
according to TimeTree (http://www.timetree.org). Four divergence times
were used in this analysis, including Coffea arabica andSolanum lycopersicum , Arabidopsis thaliana andCarica papaya , Prunus persica and Malus domestica ,Fragaria × ananassa and Prunus persica . And the divergence
times of Vitis vinifera and C. rotundifolia in Timetree
were also referred in this study. The Markov chain Monte Carlo (MCMC)
process analysis was set for 50,000 generations and 50,000 burn-in
iterations. The OrthMCL results and time divergence tree were used as
the input for CAFÉ (v3.1) program (Han et al., 2013), which was used to
identify expansions and contractions of gene families across 15 plant
genomes. The family expansion and contraction were analyzed by Count,
and the methods and parameters were according to the study of theAmborella genome (Albert et al., 2013). Multi-species orthologous
clusters with gene numbers greater than 0 in V. vinifera andC. rotundifolia were considered orthologous groups between these
two species. Expanded orthogroups were defined according to theirp -value less than 0.05 and the gene number greater than the
average value of multi-species. Dot plot representation of orthologous
groups was performed with the R package ggplot2
(http://ggplot2.org/) (Wickham,
2011).
Tandem duplication analysis. Four additional typical succulent
species, including Ananas comosus, Hylocereus undatus,
Kalanchoe fedtschenkoi, and Kalanchoe laxiflora were added to
the gene family clustering (Table S1). Multi-species orthologous
clusters with a gene number greater than 0 in V. vinifera andC. rotundifolia were used to identify the lineage-specific
expansion. Expanded orthogroups were identified with a p -value
less than 0.05 and the gene number greater than the average value of
multi-species. The tandem genes were identified by MCScanX (Wang et al.,
2012), which was consistent with the method described in our WGD
analysis. The gained TD genes of 18 species were obtained from the Count
results and were further categorized into either co-expanded or
lineage-specific expanded ones. The TD genes were GO termed by agriGO
database
(http://systemsbiology.cau.edu.cn/agriGOv2/index.php).
Further, four succulent plants were annotated both by GO database and
agriGO database.
Synteny analyses. All-vs-all BLASTP (Camacho et al., 2009)
(e-value 1e-05) and MCScanX (Wang et al., 2012) was used to predict the
collinear relationships and positional features between C.
rotundifolia and V. vinifera (PN40024). Blocks larger
than ten genes and gaps less than five genes were obtained. The synteny
map and dotplot were processed by MCScan and drawn by the python scripts
in MCScan packages (Tang et al., 2008).
The segment duplication events were predicted using self-vs-self BLASTP
(Camacho et al., 2009) (e-value 1e-05) and MCScanX among the C.
rotundifolia genome, requiring at least five genes per collinear block.
Subsequently, the pairwise sequences from the synteny blocks and segment
duplication pairs were processed by ParaAT (v2.0) (Zhang et al., 2012).
The values of nonsynonymous mutation rate (Ka ) and synonymous
mutation rate (Ks ) were calculated using the NG estimation method
in Kaks_Calculator (v2.0) (Wang et al., 2010). The visualization plots
of the Ks distribution were made using a custom R script.
Additionally, whole-genome duplication (WGD) events were determined by
the distribution of Ks of segment duplication pairs and
identified by comparisons with the events of V. vinifera(PN40024). The AEK was inferred from the genomes of eudicot species with
the smallest numbers of historical polyploidization events, including
grape, cacao, and peach. Further, the AEK was refined as a post-τ AMK
with ten protochromosomes and 13,916 ordered protogenes, a pre-τ AMK
with five protochromosomes and 6,707 ordered protogenes (Murat et al.,
2017). In the current study, the reconstruction of karyotype of theV. vinifera and C. rotundifolia were advised by a previous
study by Florent Murat (Murat et al., 2017), and the genes and gene
orders were used to construct the seven chromosomes and 21 chromosomes
of AEK. To cover as many genes as possible, we used version 2.1 of the
grape assembly, which anchored 32,424 coding genes (Table S1). MCScanX
(Wang et al., 2012), in a BLASTP and dotplot-based approach, was used to
detect the syntenic blocks between C. rotundifolia vs. AEK andV. vinifera vs. AEK with default parameters. The protein of the
pre-γAEK and post-γAEK compared to Cissus and grape by BLASTP.
The syntenic blocks were ordered according to the gene order of C.
rotundifolia and V. vinifera. Some small syntenic blocks and
small gaps were abandoned or closed to make the syntenic segments more
complete. On the base of dotplot illustrations of the synteny between
these two species, the karyotypic structures of the ancestral eudicots
were explained by taking into account the fewest number of genomic
rearrangements, which may have occurred between the AEK and modern
eudicot genomes (Figure S5, Tables S2 and S3).
Detection of significant expansion and contraction in succulent
plants. To investigate the significant expansion or contraction of gene
families, we divided 18 species into two categories, including five
succulent plants (S5) and 13 non-succulent/others plants (O13). Five
succulent plants, including Cissus rotundifolia , Ananas
comosus , Kalanchoe laxiflora , Hylocereus undatus ,Kalanchoe fedtschenkoi and other 13 plants were described in
Table S1. The average number of genes per orthogroup between two
categories was available to evaluate the significant events. For S5
plants, a binomial test with a probability of success of p(W) = 5/18 was
used. The criteria of significant expansion or contraction are as
follows: 1, A statistically false discovery rate-adjusted p -value
< 0.05 from the initial set of 97,344 orthogroups; 2, The
minimal contribution of about three for S5 and seven for O13 species; 3,
Contribute gene per orthogroup on average satisfied with (S5n/5) /
(O13n/13) > 1. We found that 88 orthogroups were expanded
(corresponding to 5,696 genes), and 178 were contracted in succulent
plants relative to non-succulent plants.
Measurement of titratable acidity. The diurnal changes of
titratable acid in leaves of C. rotundifolia were measured as
described by Chen et al. (1983). The samples were collected as mentioned
above. A total of 0.5 g leaves of each sample were cut into pieces,
placed in centrifugal tubes, and boiled for 30 min after adding 10 mL
CO2-free distilled water. The supernatant after
centrifugation was reserved. Additional 10 mL CO2-free
water was added to the pellet to extract and centrifuge again. Total
supernatants obtained by the two-stage extraction process were titrated
to pH 8.3 with 0.01 mol/L NaOH, and the acidity of the leaf was
represented as μeq of acid per gram fresh weight (μeq
g-1 FW).
RNA extraction and RNA-Seq library preparation. Total RNA was
extracted from the samples using the Universal Plant Total RNA Fast
Extraction Kit (BioTeke Corporation, Beijing, China). RNase-free DNase I
was used to remove DNA from the extracted RNA. The purity and
concentration of RNA were determined by a Nano Drop and Agilent 2100
bioanalyzer (Thermo Fisher Scientific, MA, USA). Subsequently, mRNA
enriched by Oligo (dT) - attached magnetic beads was randomly fragmented
into short pieces with an additional fragmentation buffer. Then,
first-strand cDNA was synthesized by random hexamer-primed reverse
transcription, followed by second-strand cDNA synthesis. A-Tailing Mix
and RNA Index Adapters were added by incubating to end repair. The
obtained cDNA fragments were amplified by PCR, and then products were
purified by Ampure XP Beads. Agilent Technologies 2100 bioanalyzer was
used for quality control of products. Finally, the cDNA library was
constructed, and the MGISEQ-2000 platform was used for paired-end
sequencing (2 x 150 bp). Approximately 40 million bp were generated for
each sample.
Transcriptome analysis. The quality of paired-end raw
transcriptome data was checked by FastQC v0.11.8 and trimmed using
Trimmomatic (v0.36) (Bolger et al., 2014). Then the trimmed reads were
mapped onto C. rotundifolia latest assembled genome through
TopHat (v2.1.1) (Trapnell et al., 2012). Using the gene model ofC. rotundifolia , the expression levels of genes represented by
FPKM (Fragments per Kilobase Million) for each sample were calculated by
Cufflinks (v2.2.1) (Trapnell et al., 2012) with default parameters. The
genes involved in the stomatal movement process and CAM pathway were
picked to show their expression patterns by ‘pheatmap’ package in R.
Identification of CAM pathway and stomatal movement
process-related genes. The genomes of Ananas comosus , V.
vinifera (PN40024), O. sativa , Zea mays , andPhalaenopsis equestris were downloaded from Pineapple Genomics
Database (PGD,
http://pineapple.angiosperms.org/pineapple/html/index.html) (Xu et
al., 2018), Phytozome database
(https://phytozome.jgi.doe.gov/pz/portal.html) (Goodstein et al.,
2012), Rice Genome Annotation Project
(http://rice.plantbiology.msu.edu/) (Kawahara et al., 2013),
MaizeGDB (https://maizegdb.org/) (Portwood et al., 2019), and NCBI
(https://www.ncbi.nlm.nih.gov/), respectively. Further, the CAM gene
list of Kalanchoë was obtained from a supplemental table of its
genome (Yang et al., 2017). The list of gene families, which includedcarbonic anhydrase (CA ), phosphoenolpyrvuate
carboxylase (PEPC ), phosphoenolpyruvate carboxylase
kinase (PEPCK ), malate dehydrogenase (MDH ),malic enzyme (ME ), phosphoenolpyruvate
carboxykinase (PPCK ), and pyruvate phosphate dikinase
regulatory protein (PPDKRP ), was obtained from PGD (Xu et al.,
2018). All given gene sequences in each family from pineapple, O.
sativa , and Z. mays , Phalaenopsis equestris andKalanchoë were used as queries to search corresponding family
members in C. rotundifolia and V. vinifera (PN40024) by
BLASTP. The genes with alignment length > 100bp and e-value
< 1e-05 were considered as potential members. Then online
software CD-search
(https://www.ncbi.nlm.nih.gov/cdd)
(Marchler-Bauer and Bryant, 2004) and PFAM
(https://pfam.xfam.org/)
(El-Gebali et al., 2019) were used to detect the specific domain. The
genes without a unique domain of gene family were abandoned. Then the
remaining genes were defined as candidate members and used for further
analyses. Diel expression dataset of Arabidopsis C3 leaf (Mockler
et al., 2007) and pineapple CAM leaf were used to compare with CAM-genes
shown in Figure 4b in C. rotundifolia , whose orthologs were
identified by BLASTP based on sequence similarity, and then, gene-pairs
between two species were used to calculate their relationship (Pearson
and Spearman) of transcript expression (Table S4). On the base of
satisfying two correlation coefficients (Pearson and Spearman), genes
(Rcr-at < 0.5) were determined as not
correlative expression patterns during a day/night cycle betweenCissus and Arabidopsis . Gene pairs (Rcr-at< 0.5 and Rcr-ac > 0.8) were
defined strongly CAM genes. The genes for stomatal movement were
identified using BLASTP with an Evalue cutoff of 1e-5 based on orthology
in Arabidopsis as described by Chen et al. (2020).
Co-expression network and cluster analysis . Transcripts with
average FPKM > 1 (calculated from three biological
replicates) in at least one of the nine samples were used to construct a
weighted gene co-expression network by R package WGCNA. The transcript
expression was log2 transformed. Modules were constructed using the
following parameters: power = 16, networkType = ”signe”,
mergeCutHeight = 0.18, corType = ”bicor”, minModuleSize = 30. All the
nine time point transcripts with three replicates were used to perform
cluster analysis by maSigPro package. The parameters were as following:
degree = 3, counts = F, MT.adjust = ”BH”. Transcripts were marked as
influential by the T.fit() function. Genes with “non-flat”
significantly changed across the nine time points. Nine clusters were
displayed using the ”see.genes” function with cluster.method=”hclust”,
k=9 in maSigPro. The network of each cluster was constructed by ARACNE
algorithm with ‘Discovery’ mode and ‘Naïve bayes’ mutual information
(MI) algorithm type in Cytoscape software. The p -value was
calculated based on MI, in which, less than 0.05 were selected in each
cluster. One percent of genes with at least ten edges in each network
were selected by cytoHubba, and CAM genes also were chosen based on a
minimum of ten directed edges.
Cis-element annotation and enrichment analysis of CAM
related genes. Promoter sequences in 2kb upstream of genes involved in
CAM were extracted from the C. rotundifolia genome. Of all the
promoter sequences, the cis -element enrichment of light,
circadian, temperature, and drought in CAM-related and stomatal
movement-related promoters were implemented by FIMO (Grant et al., 2011)
program with p -value < 0.0002 in MEME. Enrichment
analysis of about five known cis -elements including the morning
element (CCACAC), the evening element (AAAATATCT), the CCA1-binding site
(AAAAATCT), the TCP15 element (NGGNCCCAC), and the G-box element (G-box;
CACGTG) (Michael and McClung, 2002) were performed by FIMO (Grant et
al., 2011) program.