Genome assembly and quality assessment
To estimate the genome size of Salvia rosmarinus, the Illumina
genomic reads were used as input to Jellyfish (v1.1.10) to obtain the
k-mer frequency distribution. Genome size was then predicted to be about
1.2 Gb using GenomeScope (v2.0) (Vurture et al., 2017), with a k-mer
length of 31.
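The idea behind a k-mer based size estimate can be illustrated with a minimal sketch. It is not the GenomeScope model, which fits a mixture distribution to the histogram; it uses the simpler approximation that genome size is roughly the total number of counted k-mers divided by the depth of the main peak. The function name, the `min_depth` cutoff for discarding likely error k-mers, and the toy histogram are assumptions made for illustration. In a highly heterozygous genome the tallest peak may be the heterozygous one, so this shortcut can be misleading.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Rough genome size from a k-mer histogram.

    histogram: iterable of (depth, count) pairs, i.e. how many distinct
    k-mers were observed `depth` times (the two columns of a k-mer
    histogram file).
    min_depth: hypothetical cutoff; k-mers seen fewer times are treated
    as sequencing errors and ignored.
    """
    usable = [(d, c) for d, c in histogram if d >= min_depth]
    if not usable:
        raise ValueError("no histogram bins at or above min_depth")

    # Depth at which the most distinct k-mers occur (the main peak).
    peak_depth = max(usable, key=lambda dc: dc[1])[0]

    # Total k-mer occurrences, errors excluded.
    total_kmers = sum(d * c for d, c in usable)

    return total_kmers / peak_depth


# Toy histogram (made-up numbers, not data from this study).
toy = [(1, 1000), (2, 300), (9, 50), (10, 100), (11, 50)]
size = estimate_genome_size(toy)
```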
Because GenomeScope indicated a high heterozygosity of 1.7%, the genome
was assembled by combining accurate short reads with long reads to
improve assembly performance. The PacBio reads were assembled
using Canu (v2.0) (Koren et al., 2017) and FALCON-Unzip (v0.4.0) (Chin
et al., 2016), which generated the best primary contigs. The best Canu
assembly was selected as the reference genome assembly, and the best
FALCON-Unzip assembly as the phased genome assembly. Primary contigs
were then minced into haplotig pairs, and the haplotypes were collapsed.
The phased genome assembly and Hi-C libraries were provided to
FALCON-Phase (Kronenberg et al., 2021) to
obtain a normalized contact matrix, which was used to phase the genome
into haplotigs. To extend the assembly from contig level to scaffold
level, the Canu assembly was used as the reference genome and the Hi-C
reads were aligned to it with BWA in strict mode (-n 0) to improve
linkage quality. Read pairs whose two ends aligned to different contigs
were used to join those contigs into scaffolds.
Ultimately, we obtained a phased chromosome-level genome assembly of S. rosmarinus.
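The scaffolding step relies on Hi-C read pairs whose two ends align to different contigs. The sketch below shows one way such linkage evidence could be tallied once the alignments have been reduced to a contig name per read end. It is an illustration under assumptions, not the pipeline used here: the function name and the input format are hypothetical, and real scaffolders also normalize link counts (for example by contig length) before ordering contigs.

```python
from collections import Counter


def count_inter_contig_links(pairs):
    """Count Hi-C read pairs that connect two different contigs.

    pairs: iterable of (contig_of_read1, contig_of_read2) tuples, one per
    aligned read pair (hypothetical pre-parsed input).
    Returns a Counter keyed by an order-independent contig pair.
    """
    links = Counter()
    for contig_a, contig_b in pairs:
        if contig_a == contig_b:
            # Both ends on the same contig: no scaffolding information.
            continue
        key = tuple(sorted((contig_a, contig_b)))
        links[key] += 1
    return links


# Toy input (made-up contig names).
toy_pairs = [("ctg1", "ctg2"), ("ctg2", "ctg1"), ("ctg1", "ctg1"), ("ctg2", "ctg3")]
toy_links = count_inter_contig_links(toy_pairs)
strongest = toy_links.most_common(1)[0]  # (("ctg1", "ctg2"), 2)
```

Contig pairs with many supporting read pairs are candidates for being adjacent in a scaffold.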
To assess the quality of the assembly, short reads were mapped to the
S. rosmarinus genome assembly using BWA (v0.7.12) (H. Li & Durbin,
2010), and low-quality reads (Phred score < 30) were filtered out.
Annotations of S. rosmarinus and S. baicalensis were added to the
allele table, using BLASTN cutoffs of 60% identity and 80% coverage
to filter out noisy signals.
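A filter based on the identity and coverage cutoffs could look like the following sketch. It assumes that a hit is kept only when it reaches both cutoffs, and that coverage means alignment length as a percentage of query length. Neither detail is spelled out in the text, so both are assumptions, as are the function names and the dictionary keys.

```python
def passes_cutoffs(hit, min_identity=60.0, min_coverage=80.0):
    """Return True if a BLASTN hit reaches both cutoffs.

    hit: dict with hypothetical keys
      "identity"  - percent identity of the alignment
      "align_len" - alignment length in bases
      "query_len" - full query length in bases
    """
    # Assumed definition: coverage = aligned fraction of the query.
    coverage = 100.0 * hit["align_len"] / hit["query_len"]
    return hit["identity"] >= min_identity and coverage >= min_coverage


def filter_hits(hits, min_identity=60.0, min_coverage=80.0):
    """Keep only hits that pass; the rest are treated as noise."""
    return [h for h in hits if passes_cutoffs(h, min_identity, min_coverage)]


# Toy hits (made-up values).
toy_hits = [
    {"identity": 95.0, "align_len": 900, "query_len": 1000},  # kept
    {"identity": 55.0, "align_len": 900, "query_len": 1000},  # low identity
    {"identity": 90.0, "align_len": 500, "query_len": 1000},  # low coverage
]
kept = filter_hits(toy_hits)
```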
All contigs were assigned to 12 pseudochromosomes by partitioning and
rescuing. After ordering and format conversion, the final rosemary
genome assembly was obtained. The Benchmarking Universal Single-Copy
Orthologs (BUSCO) (v5.1.2) (Simão, Waterhouse, Ioannidis, Kriventseva, &
Zdobnov, 2015) pipeline was used to independently assess the assembly
quality.
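BUSCO classifies each expected ortholog as complete and single-copy, complete and duplicated, fragmented, or missing, and completeness is reported as the share of complete genes. The short sketch below shows that arithmetic. The function name is hypothetical and the counts are made-up values for illustration, not results from this assembly.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Convert BUSCO category counts into percentages.

    Complete = single-copy + duplicated; the total is the number of
    orthologs searched (sum of all four categories).
    """
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("no BUSCO genes counted")

    def pct(n):
        return round(100.0 * n / total, 1)

    return {
        "complete_pct": pct(single + duplicated),
        "single_pct": pct(single),
        "duplicated_pct": pct(duplicated),
        "fragmented_pct": pct(fragmented),
        "missing_pct": pct(missing),
        "total": total,
    }


# Made-up counts for illustration only.
summary = busco_summary(single=900, duplicated=50, fragmented=30, missing=20)
```

A high duplicated percentage in a phased or heterozygous assembly can indicate retained haplotigs, so it is worth reading alongside the completeness figure.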