Genome assembly, annotation, and repetitive sequence characterization
We assembled the highly heterozygous (1.19 %) genome of C. rotundifolia
by combining 39.38 gigabases (Gb) of PacBio Sequel sequences (~ 106 ×)
and 28.31 Gb of Illumina paired-end reads (~ 77 ×) (Figure S1,
Table S5). We arranged 3,289
contigs (contig N50 = 186 Kb) based on the spatial relationship deduced
from 130.44 Gb of Hi-C assay data (~ 362 ×) (Table S6).
Scaffolds totaling 350.69 Mb were ordered and anchored onto 12
pseudo-chromosomes, with a scaffold N50 reaching 27.6 Mb, covering
94.53 % of the assembled genome (Figure 1c, Figure S1, Table S7). We
identified 169,723 homozygous mutation bases, representing 0.045 % of
the assembled genome (about one error per 2.22 Kb).
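The summary statistics quoted above (N50, fold depth, and bases per error) follow from simple arithmetic. The sketch below is a minimal illustration and not the pipeline used in this study; the function names `n50`, `implied_genome_size_mb`, and `bases_per_error` are hypothetical helpers introduced here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def implied_genome_size_mb(yield_gb, depth):
    """Genome size (Mb) implied by a sequencing yield (Gb) and a fold depth."""
    return yield_gb * 1000 / depth


def bases_per_error(error_percent):
    """Average number of bases per error for an error rate given in percent."""
    return 100 / error_percent


if __name__ == "__main__":
    # Toy contig lengths (hypothetical, not from the assembly).
    print(n50([2, 3, 4, 5, 6]))
    # Values reported in the text: 39.38 Gb of PacBio data at ~106x.
    print(round(implied_genome_size_mb(39.38, 106), 1))
    # 0.045 % of bases flagged as errors.
    print(round(bases_per_error(0.045)))
```

With the reported figures, the PacBio yield and depth imply a genome of roughly 371 Mb, and an error rate of 0.045 % corresponds to about one error per 2.22 Kb, consistent with the values stated in the text.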
A total of 30,824 protein-coding genes were predicted using a
combination of ab initio, transcript-evidence, and homology-based
methods. Approximately 82.15 % of the coding genes were annotated using
the Swiss-Prot, NCBI, GO, KEGG, and eggNOG databases (Table S8).
Moreover, Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis
suggested that 92.4 % of the genes could be recovered (Table S9). In
addition, we identified 692 transfer RNAs, 128 microRNAs, 232 ribosomal
RNAs (18S, 28S, 5.8S, and 5S), and 971 small nucleolar RNAs (Figure S2).
Repetitive sequences accounted for 47.41 % of the genome, of which
31.07 % were long terminal repeat (LTR) elements (Table S10).
Divergence-time estimates between the adjacent 5′ and 3′ LTRs of the
same retrotransposon suggested a very recent burst of activity, less
than 90.77 thousand years ago (kya), and a much more severe invasion
than in grape (Figure 1d, Table S10). Further, we found 584,679 simple
sequence repeats (SSRs; 12.90 Mb), with six as the most abundant unit
size, slightly fewer than in V. vinifera (PN40024; 930,680 SSRs,
23.05 Mb) (Table S11).
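LTR insertion times of the kind reported above are commonly estimated as T = K / (2r), where K is the divergence between the 5′ and 3′ LTRs of one element and r is the substitution rate per site per year. The sketch below illustrates that calculation under stated assumptions: the Jukes-Cantor correction and the rate passed as `rate` are assumptions made here, since the text does not state the model or rate used, and the function names are hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites for multiple hits.

    Assumption: the Jukes-Cantor model; the source does not name a model.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ltr_insertion_time(ltr5, ltr3, rate):
    """Estimate insertion time (years) from two aligned LTR sequences.

    ltr5, ltr3: pre-aligned sequences of equal length ('-' marks a gap).
    rate: substitutions per site per year (an assumed, user-supplied value).
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    k = jukes_cantor_distance(mismatches / compared)
    # Both LTRs accumulate substitutions independently, hence the factor 2.
    return k / (2 * rate)


if __name__ == "__main__":
    # Toy alignment with one mismatch in 100 sites; the rate is illustrative only.
    five_prime = "A" * 100
    three_prime = "A" * 99 + "G"
    print(ltr_insertion_time(five_prime, three_prime, rate=1e-8))
```

Because the two LTRs of an element are identical at insertion, a lower divergence K implies a younger insertion, which is why near-identical LTR pairs point to a recent burst of activity.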