AsgeneDB: A curated orthology arsenic metabolism gene database and
computational tool for metagenome annotation
Abstract
Arsenic (As) is the most ubiquitous toxic metalloid in nature.
Microbe-mediated As metabolism plays an important role in global As
biogeochemical processes, greatly altering As toxicity and
bioavailability. While metagenomic sequencing may advance our
understanding of the As metabolism capacity of microbial communities in
different environments, accurate metagenomic profiling of As metabolism
remains challenging because public orthology databases offer low
coverage and inaccurate definitions of As metabolism gene families. Here we
developed a manually curated As metabolism gene database (AsgeneDB)
comprising 414,773 representative sequences from 59 As metabolism gene
families, which are affiliated with 1,653 microbial genera from 46
phyla. We then applied AsgeneDB for functional and taxonomic profiling
of As metabolism in metagenomes from various habitats (freshwater, hot
spring, marine sediment, and soil). Compared with other databases,
AsgeneDB substantially improved the mapping ratio of short reads in
metagenomes from various environments. Our results indicate that the
diversity and importance of microbial arsenic metabolism in the
environment remain to be explored. In addition, we developed an R
package, Asgene, to facilitate the statistical analysis of metagenomic
data. AsgeneDB and the associated R package Asgene will greatly promote
the study of arsenic metabolism in microbial communities in various
environments.