Gene prediction and functional annotation
BLASTP (https://blast.ncbi.nlm.nih.gov/Blast.cgi) searches with an E-value cutoff of 1e-05 were first performed between the predicted protein sequences and the entries in public protein sequence databases, including NR and Swiss-Prot, to obtain functional annotations. InterProScan (Jones et al., 2014) was then used to compare the predicted proteins against protein domain and functional site databases to further identify protein function. Gene Ontology (GO) terms were derived from the corresponding InterPro or Pfam entries. Pathway reconstruction was carried out using KOBAS and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg/).
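As an illustration of the E-value filtering step, the following minimal sketch parses BLASTP results and keeps the highest-scoring hit per query. It assumes the results were written in the standard 12-column tabular format (BLAST+ `-outfmt 6`), which the text does not specify. The function name `best_hits` and the example records are hypothetical and are not taken from the original pipeline.

```python
import io

EVALUE_CUTOFF = 1e-5  # cutoff stated in the text

def best_hits(handle, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)} for the best hit per query.

    Assumes 12-column BLAST tabular output: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue  # hit does not pass the E-value cutoff
        # Keep the hit with the highest bit score for each query
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Hypothetical example records (tab-separated), for illustration only
example = (
    "gene1\tsp|P12345|X\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "gene1\tsp|Q99999|Y\t40.0\t150\t90\t2\t5\t150\t3\t148\t1e-10\t80.5\n"
    "gene2\tsp|P00001|Z\t30.0\t50\t35\t1\t1\t50\t1\t50\t0.5\t25.0\n"
)
hits = best_hits(io.StringIO(example))
```

In this example, `gene1` retains its higher-scoring Swiss-Prot-style hit, while `gene2` receives no annotation because its only hit exceeds the cutoff.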
Repeat sequences were annotated using a combined strategy. We first used LTR_FINDER (http://tlife.fudan.edu.cn/ltr_finder/) to search the Repbase database with default settings, and then constructed a de novo library using RepeatModeler (http://www.repeatmasker.org/RepeatModeler/html/). Known repeat sequences were identified using RepeatMasker (http://www.repeatmasker.org) based on the Repbase-derived RepeatMasker library and the de novo library. We predicted rRNAs using RNAmmer (http://www.cbs.dtu.dk/services/RNAmmer/) and tRNAs using tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/). In addition, we identified other types of ncRNAs, including miRNAs and snRNAs, by searching the Rfam database with INFERNAL (http://infernal.janelia.org/).
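When repeat annotations from several tools are combined, the same region is often reported more than once. The text does not describe how overlapping annotations were reconciled, so the following is only a minimal sketch of one common approach: merging overlapping intervals to obtain a non-redundant count of repeat-covered bases. The function names and coordinates are hypothetical, and intervals are assumed to be 1-based and inclusive.

```python
from collections import defaultdict

def merge_intervals(intervals):
    """Merge overlapping or directly adjacent intervals per sequence.

    `intervals` is an iterable of (seq_id, start, end) tuples with
    1-based inclusive coordinates. Returns {seq_id: [(start, end), ...]}.
    """
    by_seq = defaultdict(list)
    for seq, start, end in intervals:
        by_seq[seq].append((start, end))

    merged = {}
    for seq, spans in by_seq.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1] + 1:
                # Overlaps or touches the previous span: extend it
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[seq] = [tuple(span) for span in out]
    return merged

def masked_bases(merged):
    """Total number of bases covered by the merged intervals."""
    return sum(end - start + 1
               for spans in merged.values()
               for start, end in spans)

# Hypothetical annotations from two tools, for illustration only
ltr_calls = [("chr1", 100, 200), ("chr1", 500, 600)]
masker_calls = [("chr1", 150, 300), ("chr2", 1, 50)]

merged = merge_intervals(ltr_calls + masker_calls)
total = masked_bases(merged)
```

Dividing `total` by the assembly length would give the non-redundant repeat fraction of the genome.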