Gene prediction and functional annotation
To obtain functional annotations, the predicted protein sequences were
first compared against public protein databases, including NR and
Swiss-Prot, using BLASTP (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
with an E-value cutoff of 1e-05.
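To illustrate the E-value filter, the following minimal Python sketch keeps, for each predicted protein, the highest-scoring BLASTP hit whose E-value is at most 1e-05. It assumes BLASTP was run with tabular output (`-outfmt 6`, default 12 columns, with the E-value in column 11 and the bit score in column 12). The best-hit-per-query rule and the name `best_hits` are illustrative assumptions, not the procedure reported here.

```python
"""Sketch: filter BLASTP tabular hits by E-value and keep the best hit per query.

Assumes -outfmt 6 with the default 12 columns:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
"""
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


EVALUE_CUTOFF = 1e-5  # cutoff stated in the methods


def best_hits(lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Dict[str, Hit]:
    """Return the highest-bitscore hit per query among hits with E-value <= cutoff."""
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hit = Hit(cols[0], cols[1], float(cols[10]), float(cols[11]))
        if hit.evalue > cutoff:
            continue  # fails the E-value cutoff
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit  # keep the higher-scoring hit
    return best
```

Passing the lines of a tabular BLASTP result file to `best_hits` returns one retained hit per annotated protein; proteins with no hit below the cutoff are simply absent from the result.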
InterProScan (Jones et al., 2014) was then used to search protein
domain and functional site databases to further characterize protein
function. Gene Ontology (GO) terms were derived from the corresponding
InterPro or Pfam entries. Pathway reconstruction was carried out using
KOBAS and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database
(http://www.genome.jp/kegg/).
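Deriving GO terms from InterPro or Pfam entries is essentially a table lookup. The sketch below assumes InterProScan TSV output (protein ID in column 1, member-database signature such as a Pfam accession in column 5, InterPro accession in column 12) and a mapping file in the GO external2go format used by interpro2go and pfam2go (lines such as `InterPro:IPRxxxxxx name > GO:term name ; GO:xxxxxxx`). The function names are hypothetical, and this is not necessarily how the GO terms in this study were assigned.

```python
"""Sketch: derive GO terms for genes from their InterPro/Pfam entries.

Assumptions (illustrative, not the published pipeline):
  * InterProScan TSV: col 1 = protein ID, col 5 = signature accession,
    col 12 = InterPro accession ("-" or absent if none).
  * Mapping lines in external2go format:
    "InterPro:IPRxxxxxx name > GO:term name ; GO:xxxxxxx".
"""
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_external2go(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map entry accessions (e.g. IPR..., PF...) to sets of GO IDs."""
    entry2go: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # "!" marks header/comment lines
        left, sep, right = line.partition(" > ")
        if not sep:
            continue  # not a mapping line
        db_entry = left.split(" ", 1)[0]             # e.g. "InterPro:IPRxxxxxx"
        entry = db_entry.split(":", 1)[-1]           # drop the database prefix
        go_id = right.rsplit(" ; ", 1)[-1].strip()   # trailing "GO:xxxxxxx"
        entry2go[entry].add(go_id)
    return dict(entry2go)


def entries_from_interproscan(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect signature and InterPro accessions per protein from TSV rows."""
    gene2entries: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        gene2entries[cols[0]].add(cols[4])           # e.g. a Pfam PFxxxxx accession
        if len(cols) >= 12 and cols[11] not in ("", "-"):
            gene2entries[cols[0]].add(cols[11])      # integrated InterPro accession
    return dict(gene2entries)


def assign_go(gene2entries: Dict[str, Set[str]],
              entry2go: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Union of the GO IDs mapped to every entry a gene hits."""
    return {
        gene: set().union(*(entry2go.get(e, set()) for e in entries))
        for gene, entries in gene2entries.items()
    }
```

A gene whose InterProScan rows list both a Pfam signature and an InterPro entry receives the union of the GO IDs mapped to both; genes whose entries have no mapping end up with an empty set.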
Repeat sequences were annotated using a combined strategy. We first used
LTR_FINDER (http://tlife.fudan.edu.cn/ltr_finder/) with default settings
to search the Repbase database, and then constructed a de novo repeat
library using RepeatModeler
(http://www.repeatmasker.org/RepeatModeler/html/). Known repeat
sequences were identified using RepeatMasker (http://www.repeatmasker.org)
based on the Repbase-derived RepeatMasker library and the de novo library.
We predicted rRNAs using RNAmmer
(http://www.cbs.dtu.dk/services/RNAmmer/) and tRNAs using tRNAscan-SE
(http://lowelab.ucsc.edu/tRNAscan-SE/). In addition, other ncRNAs,
including miRNAs and snRNAs, were identified by searching the Rfam
database with INFERNAL (http://infernal.janelia.org/).
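Because the Repbase-derived and de novo libraries can match the same genomic region, simply summing RepeatMasker hit lengths would count overlapping bases twice. One common way to report non-redundant repeat content is to merge overlapping intervals per sequence, as in the minimal sketch below. It assumes RepeatMasker's `.out` table layout (query sequence, begin and end in columns 5 to 7, 1-based inclusive coordinates); the parser and function names are illustrative assumptions, not part of the reported pipeline.

```python
"""Sketch: non-redundant repeat length from overlapping RepeatMasker hits.

Assumptions (illustrative): RepeatMasker .out layout with query sequence,
begin and end in columns 5-7 and 1-based inclusive coordinates; hits from
the Repbase-derived and de novo libraries are pooled before merging.
"""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Interval = Tuple[int, int]


def parse_repeatmasker_out(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (sequence, begin, end) from RepeatMasker .out rows, skipping headers."""
    for line in lines:
        parts = line.split()
        # Data rows start with an integer Smith-Waterman score; header rows do not.
        if len(parts) < 7 or not parts[0].isdigit():
            continue
        yield parts[4], int(parts[5]), int(parts[6])


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_bp(hits: Iterable[Tuple[str, int, int]]) -> int:
    """Total bases covered by at least one hit, each position counted once."""
    by_seq: Dict[str, List[Interval]] = defaultdict(list)
    for seq_id, begin, end in hits:
        by_seq[seq_id].append((min(begin, end), max(begin, end)))
    return sum(
        end - start + 1
        for intervals in by_seq.values()
        for start, end in merge_intervals(intervals)
    )
```

Merging adjacent (not just overlapping) intervals leaves the total length unchanged; it only keeps the merged list compact.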