Database sources
We used the UniProt database to retrieve seed sequences and construct the core database (The UniProt Consortium, 2017). The orthology databases used for database merging and homologous gene identification in this study included arCOG (Makarova et al., 2015), COG (Galperin et al., 2021), eggNOG (Huerta-Cepas et al., 2019) and KEGG (Kanehisa et al., 2016).
The microbial NCBI RefSeq database (O’Leary et al., 2016) was used to enrich AsgeneDB and to taxonomically classify the microbial communities involved in As metabolism.
Metagenomic profiling of As metabolic genes
To make the workflow easier to use, we provide an R package (Asgene) for metagenomic alignment (of nucleic acid or protein sequences), subsequent gene family abundance statistics, and sample abundance normalization. The Asgene package is available on GitHub (https://github.com/XinweiSong/Asgene).
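The gene family counting step can be illustrated with a minimal Python sketch. It assumes that the search tool wrote BLAST-style tabular output (the 12-column format produced by BLAST `-outfmt 6` or DIAMOND `--outfmt 6`), keeps only the best-scoring hit for each read, and maps reference sequence IDs to gene families through a lookup table. The function name `count_gene_families`, the mapping `ref_to_family`, and the identity and e-value cutoffs are hypothetical illustrations, not the Asgene package's actual API or default parameters.

```python
import csv
import io
from collections import Counter

# Standard 12 columns of BLAST/DIAMOND tabular output (format 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def count_gene_families(tabular_text, ref_to_family,
                        min_identity=80.0, max_evalue=1e-5):
    """Count reads per gene family, keeping only each read's best-scoring hit.

    ref_to_family is a hypothetical dict mapping reference IDs to gene family
    names; the cutoffs are illustrative, not Asgene defaults.
    """
    best = {}  # read ID -> (bitscore, reference ID) of its best passing hit
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        rec = dict(zip(COLUMNS, row))
        # Discard hits below the identity cutoff or above the e-value cutoff
        if float(rec["pident"]) < min_identity or float(rec["evalue"]) > max_evalue:
            continue
        score = float(rec["bitscore"])
        read = rec["qseqid"]
        if read not in best or score > best[read][0]:
            best[read] = (score, rec["sseqid"])

    counts = Counter()
    for _, ref_id in best.values():
        family = ref_to_family.get(ref_id)
        if family is not None:  # ignore references without a family label
            counts[family] += 1
    return counts


if __name__ == "__main__":
    # Toy hits; arsC and aioA are used only as example gene family labels
    hits = ("read1\trefA\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t180\n"
            "read1\trefB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t150\n"
            "read2\trefB\t85.0\t90\t13\t0\t1\t90\t1\t90\t1e-15\t120\n")
    print(count_gene_families(hits, {"refA": "arsC", "refB": "aioA"}))
```

Keeping one best hit per read avoids counting a read several times when it aligns to multiple homologous references.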
Users only need to choose a database search tool that suits their needs (e.g., USEARCH, BLAST or DIAMOND) and provide a few parameters (e.g., working path, search parameters of the tool and file type); the package then computes and outputs the statistical results automatically. If users select gene abundance statistics (Option: abundance), read counts are normalized as reads per kilobase per million reads (RPKM), which removes differences in sequencing depth between samples and in reference sequence length between genes. In addition, if users select functional species statistics (Option: taxonomy), the package automatically generates statistics on the species driving each As metabolism gene at different taxonomic levels in each sample (Figure 1c).
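For reference, RPKM for a gene in a sample is commonly computed as (reads mapped to the gene × 10^9) / (reference length in bp × total reads in the sample), which is the same as dividing the read count by the length in kilobases and by the total read count in millions. The Python sketch below applies that formula. The function names and the use of total sequenced reads as the denominator are assumptions for illustration; Asgene's exact implementation (for example, how protein reference lengths are converted) may differ.

```python
def rpkm(read_count, gene_length_bp, total_reads):
    """Reads per kilobase of reference sequence per million reads in the sample."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    # count / (length in kb) / (total reads in millions) == count * 1e9 / (length_bp * total)
    return read_count * 1e9 / (gene_length_bp * total_reads)


def rpkm_table(counts, gene_lengths, total_reads):
    """Normalize a {gene: read count} table for one sample (hypothetical helper)."""
    return {gene: rpkm(n, gene_lengths[gene], total_reads)
            for gene, n in counts.items()}


if __name__ == "__main__":
    # 200 reads on a 1,000 bp reference in a sample of 10 million reads
    print(rpkm(200, 1000, 10_000_000))
```

Dividing by reference length corrects for longer genes attracting more reads, and dividing by the total read count puts samples sequenced at different depths on a comparable scale.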
Our work can be used to analyze metagenomic data, providing functional profiles at the gene family level and the composition of functional microbial communities at various taxonomic levels across different environments.
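The taxonomy option can be pictured as a simple aggregation: each read assigned to an As metabolism gene also carries a lineage, and reads are counted per taxon at a chosen rank. The minimal Python sketch below assumes lineages are given as lists ordered from domain to species; the rank list, the `taxa_per_gene` name and the "Unclassified" label are hypothetical choices for illustration, not Asgene's actual output format.

```python
from collections import defaultdict

# Assumed rank order for lineage lists (domain first, species last)
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def taxa_per_gene(assignments, rank):
    """Count reads per taxon at `rank` for each gene family.

    assignments: iterable of (gene_family, lineage) pairs, where lineage is a
    list ordered as RANKS and may be truncated for poorly classified reads.
    """
    idx = RANKS.index(rank)  # raises ValueError for an unknown rank
    table = defaultdict(lambda: defaultdict(int))
    for gene, lineage in assignments:
        # Reads without a name at the requested rank are pooled as unclassified
        taxon = lineage[idx] if idx < len(lineage) and lineage[idx] else "Unclassified"
        table[gene][taxon] += 1
    return {gene: dict(taxa) for gene, taxa in table.items()}


if __name__ == "__main__":
    reads = [
        ("arsC", ["Bacteria", "Pseudomonadota", "Gammaproteobacteria",
                  "Enterobacterales", "Enterobacteriaceae", "Escherichia",
                  "Escherichia coli"]),
        ("aioA", ["Bacteria", "Pseudomonadota"]),
    ]
    print(taxa_per_gene(reads, "phylum"))
    print(taxa_per_gene(reads, "genus"))
```

Running the same assignments at several ranks gives the per-level community composition for each gene, and the resulting counts could be normalized in the same way as the gene abundances.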