Genome Annotation
The draft genome was annotated using de novo predictions, homology-based predictions, and transcriptome data from RNA-seq of leaves, stems, and roots. In total, 61,716 genes were annotated, 88.03% of which (54,326 genes) were supported by transcriptome data (Table S9). The average gene length was 3,903 bp, with 5.57 exons per gene (Table S9). We searched all gene models against five protein databases for functional annotation: NR (56,281, 92.74%), SwissProt (43,566, 71.79%), GO (24,080, 39.68%), KEGG (54,939, 90.53%), and Pfam (40,976, 67.52%). Overall, 98.33% of the genes (60,687) were functionally annotated in at least one database (Figure S4, Table S8). We named the genes according to the nomenclature used for Arabidopsis (Arabidopsis Genome Initiative, 2000) to indicate the relative positions of genes on the pseudochromosomes.
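The reported percentages can be cross-checked from the gene counts alone. The following is a minimal sketch, not part of the annotation pipeline; the function and variable names are illustrative assumptions. It recomputes each figure and suggests that the per-database percentages appear to be expressed relative to the 60,687 functionally annotated genes rather than to all 61,716 gene models.

```python
# Counts as reported in the text (Tables S8 and S9).
TOTAL_GENES = 61_716           # all annotated gene models
TRANSCRIPT_SUPPORTED = 54_326  # gene models with RNA-seq support
ANNOTATED_ANY = 60_687         # gene models with a hit in at least one database
DB_HITS = {
    "NR": 56_281,
    "SwissProt": 43_566,
    "GO": 24_080,
    "KEGG": 54_939,
    "Pfam": 40_976,
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def database_percentages(denominator):
    """Percentage of `denominator` genes with a hit in each database."""
    return {db: pct(hits, denominator) for db, hits in DB_HITS.items()}


if __name__ == "__main__":
    print("Transcriptome support:", pct(TRANSCRIPT_SUPPORTED, TOTAL_GENES))
    print("Annotated in >= 1 database:", pct(ANNOTATED_ANY, TOTAL_GENES))
    # Assumption being tested: which denominator reproduces the reported values?
    print("Relative to annotated genes:", database_percentages(ANNOTATED_ANY))
    print("Relative to all genes:", database_percentages(TOTAL_GENES))
```

With the annotated-gene denominator the sketch reproduces the values given in the text (for example, 92.74% for NR), whereas dividing by all gene models gives lower values (91.19% for NR).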
The draft rosemary genome contained 68.46% repetitive sequences, with interspersed repeats making up 67.26% of the genome. Long terminal repeat (LTR) retroelements comprised 34.16% of the genome, predominantly LTR/Gypsy (24.28%) and LTR/Copia (9.54%) elements. DNA transposons accounted for 3.01% of the rosemary genome (Table S7). We detected noncoding RNAs (ncRNAs) using tRNAscan-SE and RNAmmer, identifying 413 microRNAs (miRNAs), 1,629 transfer RNAs (tRNAs), and 362 ribosomal RNAs (rRNAs) (Table S7). Figure 1 provides an overview of gene, repeat, and noncoding RNA densities, along with all detected segmental duplications.
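The repeat fractions above can be related to one another by simple subtraction. The sketch below is illustrative only and assumes the categories are nested in the usual way for a repeat annotation summary: LTR retroelements and DNA transposons are subsets of interspersed repeats, and Gypsy and Copia are subsets of LTR retroelements. The derived remainders are not values reported in the text, and the helper name is hypothetical.

```python
# Genome percentages as reported in the text (Table S7).
TOTAL_REPEATS = 68.46
INTERSPERSED = 67.26
LTR_TOTAL = 34.16
LTR_GYPSY = 24.28
LTR_COPIA = 9.54
DNA_TRANSPOSONS = 3.01


def remainder(whole, *parts):
    """Percentage of `whole` not covered by `parts`, rounded to two decimals."""
    return round(whole - sum(parts), 2)


# Repeats that are not interspersed (e.g. tandem or simple repeats).
non_interspersed = remainder(TOTAL_REPEATS, INTERSPERSED)

# LTR retroelements that are neither Gypsy nor Copia.
other_ltr = remainder(LTR_TOTAL, LTR_GYPSY, LTR_COPIA)

# Interspersed repeats other than LTR retroelements and DNA transposons
# (assumed nesting; these classes are not itemized in the text).
other_interspersed = remainder(INTERSPERSED, LTR_TOTAL, DNA_TRANSPOSONS)

if __name__ == "__main__":
    print("Non-interspersed repeats (%):", non_interspersed)
    print("Other LTR elements (%):", other_ltr)
    print("Other interspersed repeats (%):", other_interspersed)
```

Under these assumptions, Gypsy and Copia together account for nearly all LTR retroelements, which is consistent with their description as the predominant elements.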