Genome Annotation
The draft genome was annotated using de novo predictions, homology-based
predictions, and transcriptome data from RNA-seq of leaves, stems, and
roots. In total, 61,716 genes were annotated, with 88.03% of them
(54,326 genes) supported by transcriptome data (Table S9). The average
gene length was 3,903 bp, with 5.57 exons per gene (Table S9). We
submitted all gene models to five protein databases for annotation: NR
(56,281, 92.74%), SwissProt (43,566, 71.79%), GO (24,080, 39.68%),
KEGG (54,939, 90.53%), and Pfam (40,976, 67.52%). Overall, 60,687 genes
(98.33%) were functionally annotated in at least one database
(Figure S4, Table S8). We named the genes according to the nomenclature
used for Arabidopsis (Arabidopsis Genome Initiative, 2000)
to indicate the relative positions of genes on the pseudochromosomes.
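The annotation percentages above can be cross-checked with simple arithmetic. The sketch below uses only the counts quoted in the text; the function and variable names are illustrative assumptions, not part of the annotation pipeline. It suggests that the transcriptome-support and overall-annotation percentages are relative to all 61,716 genes, whereas the per-database percentages appear to be relative to the 60,687 functionally annotated genes.

```python
# Counts as reported in the text (Tables S8 and S9).
TOTAL_GENES = 61_716
TRANSCRIPT_SUPPORTED = 54_326
FUNCTIONALLY_ANNOTATED = 60_687

# Genes with a hit in each database, as reported in the text.
DATABASE_HITS = {
    "NR": 56_281,
    "SwissProt": 43_566,
    "GO": 24_080,
    "KEGG": 54_939,
    "Pfam": 40_976,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def database_percentages(denominator):
    """Per-database annotation rate for a chosen denominator (an assumption to test)."""
    return {db: percent(hits, denominator) for db, hits in DATABASE_HITS.items()}


if __name__ == "__main__":
    print("Transcriptome support:", percent(TRANSCRIPT_SUPPORTED, TOTAL_GENES))
    print("Annotated in >=1 database:", percent(FUNCTIONALLY_ANNOTATED, TOTAL_GENES))
    print("Relative to annotated genes:", database_percentages(FUNCTIONALLY_ANNOTATED))
    print("Relative to all genes:", database_percentages(TOTAL_GENES))
```

Comparing the two denominators shows which one reproduces the reported values, which is useful when reconciling the text with Table S8.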
The draft rosemary genome contained 68.46% repetitive sequences, with
interspersed repeats making up 67.26% of the genome. Long terminal
repeat (LTR) retroelements comprised 34.16% of the genome, and LTR/Gypsy
(24.28%) and LTR/Copia (9.54%) elements were the predominant LTR
classes. DNA transposons accounted for 3.01% of the rosemary genome
(Table S7). We detected noncoding RNAs (ncRNAs) using tRNAscan-SE and
RNAmmer, identifying 413 microRNAs (miRNAs), 1,629 transfer RNAs
(tRNAs), and 362 ribosomal RNAs (rRNAs) (Table S7). Figure 1 provides an
overview of gene, repeat, and ncRNA densities, together with all
detected segmental duplications.
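The repeat fractions reported above can be broken down further by subtraction. The following is a minimal sketch, assuming the categories nest as the text implies (Gypsy and Copia within LTR retroelements, and LTR retroelements and DNA transposons within interspersed repeats); the key names are illustrative, and the remainders are derived values, not figures taken from Table S7.

```python
from decimal import Decimal

# Genome fractions in percent, as reported in the text.
# Decimal avoids binary floating-point noise in the subtractions.
REPEATS = {
    "total_repeats": Decimal("68.46"),
    "interspersed": Decimal("67.26"),
    "ltr": Decimal("34.16"),
    "ltr_gypsy": Decimal("24.28"),
    "ltr_copia": Decimal("9.54"),
    "dna_transposons": Decimal("3.01"),
}


def remainders(r):
    """Derive the unreported remainders, assuming the categories are nested."""
    return {
        # Repeats that are not interspersed (assumed: total minus interspersed).
        "non_interspersed": r["total_repeats"] - r["interspersed"],
        # LTR retroelements that are neither Gypsy nor Copia.
        "other_ltr": r["ltr"] - r["ltr_gypsy"] - r["ltr_copia"],
        # Interspersed repeats that are neither LTR nor DNA transposons.
        "other_interspersed": r["interspersed"] - r["ltr"] - r["dna_transposons"],
    }


if __name__ == "__main__":
    for name, value in remainders(REPEATS).items():
        print(f"{name}: {value}% of the genome")
```

A negative remainder would indicate that the nesting assumption, or one of the transcribed values, is wrong.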