Genome assembly and quality assessment
To estimate the genome size of Salvia rosmarinus, Illumina genomic reads were supplied to Jellyfish (v1.1.10) to obtain the k-mer frequency distribution. Genome size was then predicted to be about 1.2 Gb using GenomeScope (v2.0) (Vurture et al., 2017) with a k-mer length of 31.
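The idea behind estimating genome size from a k-mer spectrum can be illustrated with a deliberately simplified sketch. It is not the model GenomeScope fits (GenomeScope fits a mixture model to the whole spectrum); it only divides the total number of k-mers by the depth of the main peak. The function name, the `min_depth` cutoff for discarding error k-mers, and the toy histogram are assumptions made for illustration. In a highly heterozygous genome the tallest peak may be the heterozygous one at half coverage, in which case this naive estimate would be inflated.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Naive genome size estimate from a k-mer histogram.

    histogram: iterable of (depth, count) pairs, i.e. how many distinct
    k-mers were observed `depth` times (the two columns of a k-mer
    histogram file).
    min_depth: hypothetical cutoff below which k-mers are treated as
    sequencing errors and ignored.
    """
    usable = [(depth, count) for depth, count in histogram if depth >= min_depth]
    if not usable:
        raise ValueError("no k-mers at or above min_depth")

    # Depth of the tallest peak, taken as the per-copy k-mer coverage.
    peak_depth = max(usable, key=lambda pair: pair[1])[0]

    # Total number of k-mer occurrences in the retained part of the spectrum.
    total_kmers = sum(depth * count for depth, count in usable)

    return total_kmers / peak_depth


# Toy histogram (made-up numbers): an error tail at depth 1-2 and a peak at depth 10.
toy_histogram = [(1, 1000), (2, 500), (9, 10), (10, 100), (11, 10)]
toy_size = estimate_genome_size(toy_histogram)
```

With the toy values, the retained k-mers sum to 1200 occurrences and the peak depth is 10, so the estimate is 120 bases.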
Because GenomeScope indicated a high heterozygosity of 1.7%, the genome was assembled by combining accurate short reads with long reads to improve assembly performance. The PacBio reads were assembled using Canu (v2.0) (Koren et al., 2017) and FALCON-Unzip (v0.4.0) (Chin et al., 2016), which generated the best primary contigs. We derived a reference genome assembly by selecting the best Canu assembly and a phased genome assembly by selecting the best FALCON-Unzip assembly. Primary contigs were then minced into haplotig pairs, and haplotypes were collapsed. The phased genome assembly and Hi-C libraries were provided to FALCON-Phase (Kronenberg et al., 2021) to obtain a normalized contact matrix, which was used to phase the genome into haplotigs. To extend the assembly from contig level to scaffold level, the Canu assembly was used as the reference genome and the Hi-C reads were aligned to it with BWA in strict mode (-n 0) to improve linkage quality; read pairs whose two mates aligned to different contigs were used to join those contigs into scaffolds. Ultimately, we obtained a phased chromosome-level genome assembly of S. rosmarinus.
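The role of Hi-C read pairs in scaffolding can be sketched as follows: each pair whose two mates align to different contigs counts as one link between those contigs, and contig pairs with many links are candidates for being adjacent. This is a minimal illustration of that counting step only, not the scaffolding software used in this study; the function name and the toy contig names are assumptions.

```python
from collections import Counter


def count_inter_contig_links(mate_contigs):
    """Count Hi-C links between distinct contigs.

    mate_contigs: iterable of (contig_of_mate1, contig_of_mate2) tuples,
    one per aligned read pair (hypothetical input format).
    Returns a Counter keyed by an alphabetically sorted contig pair.
    """
    links = Counter()
    for contig_a, contig_b in mate_contigs:
        if contig_a == contig_b:
            # Both mates on the same contig: no scaffolding information.
            continue
        # Sort so that (c1, c2) and (c2, c1) are counted as the same link.
        links[tuple(sorted((contig_a, contig_b)))] += 1
    return links


# Toy alignments (made-up contig names).
toy_pairs = [("ctg1", "ctg2"), ("ctg2", "ctg1"), ("ctg1", "ctg1"), ("ctg2", "ctg3")]
toy_links = count_inter_contig_links(toy_pairs)
```

In practice, link counts are usually normalized (for example by contig length or restriction-site count) before contigs are grouped and ordered; that step is omitted here.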
To assess the quality of the assembly, short reads were mapped to the S. rosmarinus genome assembly using BWA (v0.7.12) (H. Li & Durbin, 2010), and low-quality reads (Phred score < 30) were filtered out. Annotations of S. rosmarinus and S. baicalensis were added to the Allele table, using BLASTN thresholds of identity < 60% and coverage < 80% to filter out noisy signals. All contigs were assigned to 12 pseudochromosomes by partitioning and rescuing. After ordering and format conversion, the final rosemary genome assembly was obtained. The Benchmarking Universal Single-Copy Orthologs (BUSCO) (v5.1.2) (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015) pipeline was used to independently assess assembly quality.
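For readers unfamiliar with the quality scale, a Phred score Q corresponds to an error probability of 10^(-Q/10), so a threshold of 30 means an error probability of at most 1 in 1000. The short sketch below shows this conversion and a threshold filter. The text does not state whether the threshold was applied to base quality or to mapping quality (both are Phred-scaled), so the record format and function names here are assumptions for illustration only.

```python
def phred_to_error_probability(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def filter_by_phred(records, min_q=30):
    """Keep records whose Phred-scaled quality is at least min_q.

    records: iterable of (read_name, quality) tuples (hypothetical format).
    """
    return [(name, q) for name, q in records if q >= min_q]


# Toy records (made-up names and scores).
toy_records = [("read1", 42), ("read2", 12), ("read3", 30)]
kept = filter_by_phred(toy_records)
```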