Genome sequencing and assembly
DNA for sequencing the genome of S. rosmarinus was extracted from a single plant maintained at Zhejiang Sci-Tech University, and sequencing was performed using both Illumina and PacBio technologies. The initial genome assembly was generated from 107 Gb (84.92x) of Illumina reads and 60.5 Gb (48.01x) of PacBio reads, and k-mer analysis estimated the genome size at 1263.45 Mb (Figure S1, Tables S1, S2). After iterative error correction of the PacBio reads with Canu (v2.1.1) and assembly of primary contigs with Falcon (v0.0.3), the genome was phased and polished using Pilon. Following assembly of the PacBio long reads and error correction with Illumina short reads, the assembled genome size was 1239.46 Mb, with a scaffold N50 of 109.261 Kb (Table S3). The genome was further refined using a Hi-C library with 88.10x coverage (111 Gb), yielding 19,878 scaffolds that were anchored to 12 pseudochromosomes (ranging from 71.21 Mb to 144.54 Mb) containing approximately 94.08% of the assembled sequence (Tables S1, S4). The final rosemary assembly reached 1.24 Gb, with a scaffold N50 of 107.45 Mb. To assess the completeness of the assembly, RNA-seq reads from three different tissues were mapped to the genome, giving mapping rates between 67.95% and 92.40% (Table S14). Additionally, Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis (Simão et al., 2015) showed a high level of completeness for both the genome assembly (96.8%) and the annotation (88.5%) (Tables S5 and S6), supporting the high quality of the S. rosmarinus genome assembly (Figure 1).
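For readers less familiar with the summary statistics quoted above, the following is a minimal sketch of how sequencing depth and N50 are conventionally computed. It is not the pipeline used in this study: the function names and the toy scaffold lengths are hypothetical, and the 1.26 Gb denominator is an assumption chosen because it approximately reproduces the reported Illumina depth. The text does not state which genome size was used for the depth calculation.

```python
def sequencing_depth(total_bases, genome_size):
    """Average depth (x) = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Assumed denominator of 1.26 Gb (not stated in the text).
illumina_depth = sequencing_depth(107e9, 1.26e9)

# Hypothetical toy scaffold lengths, for illustration only.
toy_n50 = n50([10, 8, 6, 4, 2])
```

In the toy example, the total length is 30, so N50 is the length of the sequence at which the running sum, taken from the longest sequence downward, first reaches 15. The large jump in scaffold N50 after Hi-C scaffolding reflects this definition: once most of the sequence sits in a few chromosome-scale scaffolds, the half-way point falls inside one of them.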