Gene prediction and functional annotation
The repeat-masked genome was then used to predict protein-coding genes
with a combination of three complementary methods: de novo, homology-based, and transcriptome-based prediction.
Augustus v. 3.3.2 (Stanke et al., 2004), GlimmerHMM v. 3.0.4
(Majoros et al., 2004) and Genscan (Burge & Karlin, 1997) were used for de novo predictions. GeMoMa v. 1.3.1 (Keilwagen et al., 2019)
was used for homology-based predictions, with protein sequences from Arabidopsis thaliana, Eragrostis curvula, Eragrostis tef, O.
thomaeum, Oryza sativa, Prunus persica, Sorghum bicolor, Triticum
aestivum, and Zea mays. For transcriptome-based predictions, we first
sequenced the RNA libraries generated from five tissues (i.e., root, rhizome, rhizome tip, young leaf and mature leaf) and assembled
the RNA-seq reads into transcripts using Trinity v. 2.1.1 (Grabherr et
al., 2011) with default parameters. We also used all the RNA-seq reads
to assess genome assembly quality by mapping to the final assembled
genome using PASA v. 2.1.0 (Haas et al., 2003). Finally, all
predictions of gene models yielded by the above approaches were
integrated using EVidenceModeler (EVM) v. 1.1.1 (Haas et al., 2008) to
generate a consensus gene set.
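
Before integrating predictions from several tools, it can help to check how many gene models each evidence source contributes. The following is a minimal Python sketch of such a check, not part of the pipeline described above. It assumes the predictions are available as standard nine-column GFF3 records with the tool name in the source column; the function name and the example records are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_genes_by_source(gff3_lines: Iterable[str],
                          feature_type: str = "gene") -> Counter:
    """Count features of one type per GFF3 source (column 2)."""
    counts: Counter = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip malformed rows
        source, ftype = fields[1], fields[2]
        if ftype == feature_type:
            counts[source] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        "##gff-version 3",
        "chr1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tID=g1",
        "chr1\tAUGUSTUS\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
        "chr1\tGeMoMa\tgene\t120\t880\t.\t+\t.\tID=h1",
        "chr1\tAUGUSTUS\tgene\t2000\t3500\t.\t-\t.\tID=g2",
    ]
    for source, n in sorted(count_genes_by_source(example).items()):
        print(f"{source}\t{n}")
```

A large imbalance between sources is one thing to keep in mind when choosing per-source weights for an integration tool.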
The predicted protein-coding genes were functionally annotated by
searching against several public databases. We ran InterProScan v. 5.36
(Apweiler et al., 2000), which reports Gene Ontology (GO) annotations,
protein motifs and domains, functional classifications, protein family
membership, transmembrane topology, and predicted signal peptides,
to obtain a comprehensive annotation of the predicted protein-coding
genes. A custom Perl script was used to extract the annotation information.
Then, KOBAS (http://kobas.cbi.pku.edu.cn/annotate/) was used to search
the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa &
Goto, 2000) for orthologs. Finally, we used BLASTP to search against the
Swiss-Prot (Bairoch & Apweiler, 2000), NR (Pruitt et al., 2007), and
KOG databases with an E-value cutoff of 1e-5. The best hits from
these database searches were integrated to produce the final functional
annotation.
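
The filtering and integration steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' script: it assumes BLASTP results in the standard 12-column tabular format (E-value in column 11, bit score in column 12), takes the "best hit" to be the hit with the highest bit score, and assumes InterProScan TSV output with GO terms in column 14. All function names and example identifiers are hypothetical.

```python
import re
from typing import Dict, Iterable, Set, Tuple

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the text
GO_PATTERN = re.compile(r"GO:\d{7}")

Hit = Tuple[str, float, float]  # (subject ID, E-value, bit score)


def best_blast_hits(lines: Iterable[str],
                    cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Hit]:
    """Keep, per query, the highest-scoring hit with E-value <= cutoff."""
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) < 12:
            continue  # not a 12-column tabular row
        query, subject = f[0], f[1]
        evalue, bitscore = float(f[10]), float(f[11])
        if evalue > cutoff:
            continue
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


def go_terms_by_protein(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect GO terms per protein from InterProScan TSV rows."""
    terms: Dict[str, Set[str]] = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 14:
            continue  # row carries no GO column
        found = GO_PATTERN.findall(f[13])
        if found:
            terms.setdefault(f[0], set()).update(found)
    return terms


def merge_annotations(blast_by_db: Dict[str, Dict[str, Hit]],
                      go_terms: Dict[str, Set[str]]) -> Dict[str, Dict[str, str]]:
    """Combine best hits from each database and GO terms per gene."""
    merged: Dict[str, Dict[str, str]] = {}
    for db, hits in blast_by_db.items():
        for query, (subject, _evalue, _bits) in hits.items():
            merged.setdefault(query, {})[db] = subject
    for protein, terms in go_terms.items():
        merged.setdefault(protein, {})["GO"] = ",".join(sorted(terms))
    return merged


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    blast = [
        "g1\tsp|P1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
        "g1\tsp|P2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-60\t250",
        "g2\tsp|P3\t30.0\t50\t35\t0\t1\t50\t1\t50\t1e-3\t40",
    ]
    ipr = ["\t".join(["g1", "md5", "300", "Pfam", "PF00069", "Pkinase",
                      "10", "250", "1e-20", "T", "01-01-2020",
                      "IPR000719", "Protein kinase", "GO:0004672|GO:0005524"])]
    table = merge_annotations({"Swiss-Prot": best_blast_hits(blast)},
                              go_terms_by_protein(ipr))
    for gene, annotation in sorted(table.items()):
        print(gene, annotation)
```

In this sketch a gene with no hit passing the cutoff in any database is simply absent from the merged table; whether such genes should instead be reported as unannotated is a design choice left open here.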