2.5 | Protein-coding gene prediction and functional
annotation
A combined strategy of de novo gene prediction, homology-based search,
and RNA sequencing-aided annotation was used to perform gene
prediction. For homology-based annotation, we selected the
protein-coding sequences of five homologous species (Brugia
malayi, C. elegans, Pristionchus pacificus, Steinernema carpocapsae, and T. canis) from NCBI
(https://www.ncbi.nlm.nih.gov/). For RNA-based prediction, male and
female transcriptome sequences were aligned to the genome for assembly
using a TopHat (v2.1.0) (Trapnell, Pachter, & Salzberg) plus Trinity
(v2.0.6) (Haas et al.) strategy. PASApipeline (v.2.1.0) was applied to
predict gene structures, after which the inferred gene structures were
used in AUGUSTUS (v.3.2.3) (Mario et al., 2006) to train gene models
based on transcript evidence. In addition, the genome sequence was analyzed
with GeneMark (v1.0) (John & Mark, 2005), using
unsupervised training to build a hidden Markov model. Consistent
gene sets were generated by combining all of the above evidence using MAKER
(v.2.31.8) (Campbell, Law, Holt, Stein, & Yandell, 2013). All gene
evidence was then merged to form a comprehensive and non-redundant gene set
using EvidenceModeler (v1.1.1, EVM) (Haas et al., 2008).
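
To illustrate what merging evidence into a non-redundant gene set means in practice, the following minimal Python sketch keeps, for each group of overlapping gene models on the same sequence and strand, only the highest-scoring model. It is not the EVM algorithm, which computes a weighted consensus from the evidence; the `GeneModel` class, the `non_redundant` function, the scores, and the scaffold names are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """A simplified gene model (hypothetical structure, for illustration only)."""
    seqid: str    # scaffold or contig name
    start: int    # 1-based, inclusive
    end: int      # inclusive
    strand: str   # "+" or "-"
    source: str   # evidence type or predictor that produced the model
    score: float  # assumed confidence score; higher is better


def non_redundant(models):
    """Greedily keep the best-scoring model among overlapping models.

    Models are compared only within the same (seqid, strand). They are
    visited from highest to lowest score, and a model is kept only if it
    does not overlap any model already kept.
    """
    by_locus = defaultdict(list)
    for m in models:
        by_locus[(m.seqid, m.strand)].append(m)

    kept = []
    for group in by_locus.values():
        chosen = []
        for m in sorted(group, key=lambda g: g.score, reverse=True):
            if all(m.end < c.start or m.start > c.end for c in chosen):
                chosen.append(m)
        kept.extend(chosen)
    return sorted(kept, key=lambda g: (g.seqid, g.start, g.strand))


# Toy input: two overlapping models on the "+" strand, one distinct locus,
# and one model on the "-" strand (all values are made up).
models = [
    GeneModel("scaffold1", 100, 900, "+", "augustus", 0.80),
    GeneModel("scaffold1", 150, 950, "+", "genemark", 0.60),
    GeneModel("scaffold1", 2000, 2600, "+", "homology", 0.70),
    GeneModel("scaffold1", 120, 880, "-", "augustus", 0.50),
]
result = non_redundant(models)
for m in result:
    print(m.seqid, m.start, m.end, m.strand, m.source)
```

In this toy input the lower-scoring overlapping model on the "+" strand is dropped, while the models at the distinct locus and on the opposite strand are retained.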
To perform functional gene annotation, we aligned the above gene
sets against several known databases, including SwissProt, TrEMBL, KEGG,
COG, and NR. GO information was obtained through Blast2GO (v.2.5.0) (Conesa
et al., 2005). Furthermore, the mitochondrial genome was assembled by
BLAST alignment against the B. schroederi mtDNA sequence from the NCBI
database (NC_015927.1) (Xie et al., 2011). The mitochondrial genome was
annotated with the GeSeq online tool
(https://chlorobox.mpimp-golm.mpg.de/geseq.html)
using homologous gene alignment (Michael et al., 2017). Four types of
non-coding RNA (ncRNA; tRNA, snRNA, miRNA, and rRNA) were
predicted. tRNAscan-SE (v1.3.1) (Lowe & Eddy, 1997) was used to predict
tRNAs. We aligned the B. schroederi genome against the Rfam
(v12.0) (Kalvari et al., 2018) database to predict snRNA and miRNA, and
against an invertebrate rRNA database to predict rRNA.
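
The database alignment step used for functional annotation can be sketched as follows, assuming the alignments are available in the standard 12-column BLAST tabular format (`-outfmt 6`). The sketch keeps the highest-bitscore hit per gene below an E-value cutoff. The cutoff of 1e-5, the function name `best_hits`, and the gene and accession identifiers are assumptions for illustration; the text above does not state the thresholds or selection rule actually used.

```python
import csv
import io


def best_hits(handle, max_evalue=1e-5):
    """Return the best hit per query from BLAST tabular (outfmt 6) output.

    Columns are assumed to be: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    The E-value cutoff is an assumed default, not a value from the text.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit is not significant under the assumed cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy alignments with made-up gene and accession identifiers.
rows = [
    ["gene1", "P12345", "85.0", "200", "30", "0", "1", "200", "1", "200", "1e-50", "300"],
    ["gene1", "Q99999", "60.0", "180", "72", "0", "5", "184", "3", "182", "1e-20", "120"],
    ["gene2", "A0A000", "30.0", "50", "35", "0", "1", "50", "1", "50", "0.5", "25"],
    ["gene3", "O11111", "95.0", "400", "20", "0", "1", "400", "1", "400", "0.0", "800"],
]
text = "\n".join("\t".join(r) for r in rows) + "\n"
hits = best_hits(io.StringIO(text))
for gene, (subject, evalue, bitscore) in sorted(hits.items()):
    print(gene, subject, evalue, bitscore)
```

Genes without a hit below the cutoff, such as `gene2` in the toy data, are simply absent from the result and would remain unannotated for that database.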