Genome annotation and gene prediction
A total of ~327 Mb (58.81%) of the O. kokonorica genome assembly was identified as repetitive elements. The largest class of repeats was LTR retrotransposons, which accounted for 36.02% of the genome, including approximately 28.71% Gypsy and 7.30% Copia retrotransposons (Table S5). This LTR proportion is higher than that in the related species C. songorica (26%). Analysis of the dynamic evolution of LTRs indicated that the LTRs of O. kokonorica were younger than those of C. songorica, reflecting a more recent expansion in O. kokonorica that peaked at 0.8 Ma (Figure S3). DNA transposons, long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs) accounted for 8.92%, 3.56%, and 0.22% of the genome assembly, respectively (Table S5).
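LTR insertion ages such as the 0.8 Ma peak are commonly estimated from the divergence K between the two LTRs of each intact element, using T = K / (2r). The section does not state which method or substitution rate was used. The sketch below therefore assumes Jukes-Cantor-corrected divergence and a hypothetical default rate of r = 1.3e-8 substitutions per site per year, a value often applied to grasses. Both are illustrative choices, not the authors' settings.

```python
import math


def jc_distance(ltr5: str, ltr3: str) -> float:
    """Jukes-Cantor corrected distance between two aligned LTR sequences.

    Alignment columns containing a gap ('-') in either sequence are skipped.
    """
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned sites to compare")
    p = sum(a != b for a, b in pairs) / len(pairs)  # raw proportion of differing sites
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time_ma(ltr5: str, ltr3: str, rate: float = 1.3e-8) -> float:
    """Insertion time in million years, T = K / (2r).

    `rate` is an ASSUMED substitution rate (per site per year), not taken from the source.
    """
    k = jc_distance(ltr5, ltr3)
    return k / (2 * rate) / 1e6
```

For example, two 1,000-bp LTRs differing at 21 sites give K ≈ 0.0213 and an estimated insertion time of roughly 0.82 Ma under the assumed rate. Both LTRs are identical at insertion, so divergence between them accumulates only after the element is inserted. This is why the two LTRs of a single element serve as a molecular clock.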
In total, a high-confidence set of 48,598 protein-coding genes was predicted using a combination of de novo, homology-based, and transcriptome-based approaches, of which 48,521 (99.84%) were anchored to the 20 pseudochromosomes (Table 1; Table S6). Consistent with other Gramineae species, protein-coding genes in O. kokonorica had an average length of 4,327 bp and contained 6.18 exons on average. Exon and intron lengths were highly conserved across all five investigated plant genomes (Figure S4; Table S7), further supporting the reliability of the annotation. In addition, BUSCO analysis of the protein set showed that the annotation contained 90.60% of BUSCOs (Table S8), indicating good completeness of the protein-coding gene set. Approximately 90.59% of O. kokonorica genes could be functionally annotated against the SWISS-PROT Protein Sequence Database, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of orthologous groups for eukaryotic complete genomes (KOG), and the Non-Redundant Protein Sequence Database (NR) (Table S9). We also identified 231 microRNA (miRNA), 1,012 small nuclear RNA (snRNA), 903 transfer RNA (tRNA), and 183 ribosomal RNA (rRNA) genes in the genome sequence (Table S10). The characteristics and features of the O. kokonorica genome are shown in Figure 1B. LTR density was inversely correlated with gene density: these transposable elements were mainly distributed across pericentric regions, whereas genes were enriched in the more distal chromosomal regions.
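The mean gene length, mean exon count, and anchoring rate reported above are simple summaries over a gene-model annotation. Below is a minimal sketch of how such figures can be computed from a GFF3 file. It assumes several things that the section does not specify. Exons reference their mRNA through a single `Parent`, and mRNAs reference their gene the same way. The transcript with the most exons stands in for each gene's representative isoform. Pseudochromosomes are recognized by a hypothetical `Chr` sequence-name prefix. `gene_stats` and `parse_attributes` are illustrative helpers, not part of the authors' pipeline.

```python
from collections import defaultdict
from typing import Callable, Iterable


def parse_attributes(field: str) -> dict:
    """Parse a GFF3 column-9 string such as 'ID=g1;Parent=x' into a dict."""
    return dict(kv.split("=", 1) for kv in field.strip().split(";") if "=" in kv)


def gene_stats(gff_lines: Iterable[str],
               is_anchored: Callable[[str], bool] = lambda seqid: seqid.startswith("Chr")) -> dict:
    """Summarize gene models; `is_anchored` uses a HYPOTHETICAL 'Chr' naming convention."""
    gene_length, gene_seqid, mrna_to_gene = {}, {}, {}
    exons_per_mrna = defaultdict(int)

    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        seqid, _src, ftype, start, end, _score, _strand, _phase, attrs = cols
        attr = parse_attributes(attrs)
        if ftype == "gene":
            # GFF3 coordinates are 1-based and inclusive
            gene_length[attr["ID"]] = int(end) - int(start) + 1
            gene_seqid[attr["ID"]] = seqid
        elif ftype == "mRNA":
            mrna_to_gene[attr["ID"]] = attr["Parent"]
        elif ftype == "exon":
            exons_per_mrna[attr["Parent"]] += 1

    # ASSUMPTION: the transcript with the most exons represents each gene.
    exons_per_gene = defaultdict(int)
    for mrna, gene in mrna_to_gene.items():
        exons_per_gene[gene] = max(exons_per_gene[gene], exons_per_mrna[mrna])

    n = len(gene_length)
    if n == 0:
        raise ValueError("no gene features found")
    return {
        "n_genes": n,
        "mean_gene_length": sum(gene_length.values()) / n,
        "mean_exons_per_gene": sum(exons_per_gene[g] for g in gene_length) / n,
        "anchored_fraction": sum(is_anchored(gene_seqid[g]) for g in gene_length) / n,
    }
```

Applied to the counts in the text, the anchoring rate is 48,521 / 48,598 ≈ 0.9984, matching the reported 99.84%.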