Genome sequencing and assembly
DNA for sequencing the genome of S. rosmarinus was extracted from
a single plant maintained at Zhejiang Sci-Tech University, and
sequencing was performed using both Illumina and PacBio technologies.
The initial genome assembly was generated from 107 Gb (84.92x) of Illumina
reads and 60.5 Gb (48.01x) of PacBio reads, and the genome size was estimated
at 1263.45 Mb by k-mer analysis (Figure S1, Tables S1, S2).
After iterative error correction of PacBio reads with Canu (v2.1.1) and
assembly of primary contigs with Falcon (v0.0.3),
the genome was phased and polished using Pilon. After assembly of the
PacBio long reads and error correction with Illumina short reads, the
final predicted genome size was 1239.46 Mb, with a scaffold N50 of
109.261 Kb (Table S3). The genome was further refined using an 88.10x
coverage Hi-C library (111 Gb), resulting in 19,878 scaffolds that were
placed in 12 pseudochromosomes (ranging from 71.21 Mb to 144.54 Mb), which
together contained approximately 94.08% of the assembled sequences (Tables S1,
S4). The final rosemary assembly reached 1.24 Gb, with a scaffold
N50 of 107.45 Mb. To assess the completeness of the assembly, RNA-seq
reads from three different tissues were mapped to the genome, resulting
in mapping rates between 67.95% and 92.40% (Table S14).
Additionally, BUSCO (Benchmarking Universal Single-Copy Orthologs; Simão
et al., 2015) analysis showed a high level of completeness of
both the genome assembly (96.8%) and annotation (88.5%) (Tables S5 and
S6), supporting the high quality of the S. rosmarinus genome
assembly (Figure 1).
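
For readers less familiar with the summary statistics quoted above, the following minimal Python sketch illustrates how sequencing depth and N50 are conventionally computed. It is not the pipeline used in this study. The function names are hypothetical, the contig lengths are toy values, and the 1.26 Gb genome size passed to the depth calculation is an assumption (the k-mer estimate rounded to two decimals), with which the reported depths appear roughly consistent.

```python
from typing import Iterable


def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average depth: sequenced bases divided by genome size (same units)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Assumed genome size of 1.26 Gb (rounded k-mer estimate), not a value
    # stated explicitly in the text as the denominator.
    assumed_genome_size = 1.26e9
    print(round(sequencing_depth(107e9, assumed_genome_size), 2))  # Illumina
    print(round(sequencing_depth(111e9, assumed_genome_size), 2))  # Hi-C

    # Toy scaffold lengths, for illustration only.
    print(n50([80, 70, 50, 40, 30, 20]))
```

Because N50 is weighted by sequence length, anchoring most of the assembly into a small number of pseudochromosomes raises it sharply, which is why the scaffold N50 moves from the kilobase range to the megabase range after Hi-C scaffolding.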