Introduction
Arsenic (As), known as both “the king of poisons” and “the poison of
kings”, is classified as a Group 1 carcinogen by the International
Agency for Research on Cancer (Zheng, 2020). Arsenic has therefore been
a prime focus of ecology and environmental science (S.-Y. Zhang et al.,
2017; Zheng, 2020). Once elemental As is released from mineral deposits
by geological, agricultural, and industrial processes, its toxicity and
mobility can be greatly altered by microbial metabolism (Achour, Bauda,
& Billard, 2007; Oremland & Stolz, 2003).
These metabolic processes play a major role in the global As cycle
through microbial oxidation, respiration, reduction, and methylation
(Mukhopadhyay, Rosen, Phung, & Silver, 2002) and are mediated by a
variety of genes.
It has been reported that almost all microorganisms carry As resistance
and metabolism genes (Zhu, Xue, Kappler, Rosen, & Meharg, 2017). For
example, the As redox genes encoding cytoplasmic arsenate [As(V)]
reductase (arsC), periplasmic As(V) respiratory reductase (arrAB), and
arsenite [As(III)] oxidase (aioAB/arxA) mediate the interconversion of
As(V) and As(III) [7–9], while As(III) S-adenosylmethionine
methyltransferase (arsM) and a nonheme iron-dependent dioxygenase with
C-As lyase activity (arsI) catalyze As methylation and demethylation,
respectively (Jia et al., 2013a; Yoshinaga & Rosen, 2014).
Mechanisms involved in As metabolism can also be co-opted from other
processes: As(III) and As(V) act as analogues of glycerol and
phosphate, respectively, allowing microbial uptake through glycerol
transporters (GlpF) and phosphate transporters (Pit/Pst) (Borgnia,
Nielsen, Engel, & Agre, 1999; Wysocki et al., 2001).
Because these processes greatly change the toxicity and bioavailability
of As, studying microbial As metabolism genes is of great importance
for understanding environmental As metabolism and the potential for
microbial remediation.
Although the mechanisms of microbial As metabolism are well documented
and characterized, the distribution and diversity of As metabolic genes
in microbial communities remain unclear because of the large proportion
of uncultured microorganisms in environmental samples. Previous studies
of the distribution and diversity of individual genes have typically
used targeted primer sets in analyses such as polymerase chain reaction
(PCR), cloning, denaturing gradient gel electrophoresis (DGGE),
microarray-based metagenomic techniques (e.g., GeoChip), and
quantitative PCR (qPCR) (Achour et al., 2007; Cai, Liu, Rensing, & Wang,
2009; H.-T. Wang et al., 2019; C. Zhang et al., 2021). These methods are
limited by their low throughput, as they target only one or a few
specific genes, and by nonspecific amplification introduced by the
primers. In addition, because primers cannot be designed for unknown
nucleic acid sequences, the inability to detect unknown microorganisms
is the greatest limitation of
this kind of technology. In the current metagenomic era, characterizing
microbially mediated As metabolism at gene- and species-level
resolution has become an important way to better understand it. In
contrast to primer-based methods, high-throughput sequencing techniques
target all genes and do not rely on the specificity and coverage of
primers. Shotgun metagenomic sequencing can probe the functions of
uncharacterized microbiomes and provide a detailed understanding of As
metabolism in complex communities, so that microbial metabolism can be
harnessed to address environmental issues (Xiao et al., 2016; S.-Y.
Zhang et al., 2017).
However, metagenomic data analysis requires comprehensive and reliable
orthology databases for accurate profiling of functional gene families,
and the results of metagenomic analysis are substantially affected by
the choice of orthology database (Nayfach & Pollard, 2016).
Orthology databases such as arCOG (Archaeal Clusters of Orthologous
Genes) (Makarova, Wolf, & Koonin, 2015), COG (Clusters of Orthologous
Groups) (Galperin et al., 2021), eggNOG (Huerta-Cepas et al., 2019), and
KEGG (Kanehisa, Sato, Kawashima, Furumichi, & Tanabe, 2016) are widely
used for functional annotation in both genomic and metagenomic studies.
These databases differ in their design concepts: arCOG targets archaeal
annotation (Makarova et al., 2015), COG and eggNOG annotate orthologous
groups (Galperin et al., 2021; Huerta-Cepas et al., 2019), and KEGG
links genes with pathways (Kanehisa et al., 2016).
For As metabolism specifically, these databases suffer from low
coverage of As metabolic genes, difficulty in distinguishing homologous
genes, and long database search times (Tu, Lin, Cheng, Deng, & He, 2019;
Yu et al., 2021). Therefore, a comprehensive and accurate database of As
metabolism genes is essential for efficient analysis of As metabolism
functions in microbial communities.
In this study, to characterize the microbial communities involved in As
metabolism in the environment, we developed a manually curated As
metabolism gene database (AsgeneDB), which covers five As metabolic
pathways (transport, respiration, reduction, oxidation, and
methylation/demethylation), 59 As metabolism gene families, and 414,773
representative sequences. AsgeneDB integrates multiple orthology
databases and covers 46 phyla and 1,653 genera of bacteria, archaea, and
fungi.
AsgeneDB enables researchers to directly study newly discovered arsenic
metabolic pathways and gene families with high specificity,
comprehensiveness, representativeness, and accuracy. To facilitate
comparison and statistical analysis of metagenomic data, we developed
an R package, Asgene, that automatically reports gene family abundance
and functional community composition at different taxonomic levels in
different environments. AsgeneDB was compared with five other orthology
databases by analyzing metagenomic sequencing data from four habitats
(freshwater, hot spring, marine sediment, and soil).
Our results show that AsgeneDB detected more As metabolism genes, and
at higher abundance, in environmental microorganisms than the other
databases. In addition, the abundance of As functional genes and the
species driving these functions differed significantly among habitats.
Arsenic transport genes were more abundant in the microbial communities
than As(V) respiration, methylation, and demethylation genes.
These results demonstrate the vast diversity and importance of
microbial As metabolism functions in the environment, much of which
remains to be explored. AsgeneDB and the Asgene package therefore
provide a convenient tool for comprehensive and accurate metagenomic
analysis of arsenic metabolism and should greatly promote research in
this area.
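The gene family abundance statistics described above can be illustrated
with a minimal sketch. This is not the Asgene package's implementation
(the package itself is written in R); it only shows the general idea of
mapping read-level database hits to gene families and normalizing by
sequencing depth. The function name, the input layout, the sequence IDs,
and the counts-per-million normalization are all assumptions made for
illustration.

```python
from collections import Counter


def gene_family_abundance(hits, seq_to_family, total_reads):
    """Return gene family abundance as hits per million reads.

    hits          -- iterable of (read_id, subject_id) pairs, ordered so
                     that the first pair for each read is its best hit
                     (hypothetical input, e.g. parsed from tabular
                     alignment output)
    seq_to_family -- dict mapping database sequence IDs to gene family
                     names (hypothetical mapping)
    total_reads   -- total number of reads in the sample
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")

    counts = Counter()
    seen_reads = set()
    for read_id, subject_id in hits:
        if read_id in seen_reads:
            continue  # count each read once, using its first (best) hit
        seen_reads.add(read_id)
        family = seq_to_family.get(subject_id)
        if family is not None:  # ignore hits to unmapped sequences
            counts[family] += 1

    # Normalize to counts per million reads so that samples with
    # different sequencing depths can be compared.
    return {fam: n * 1e6 / total_reads for fam, n in counts.items()}


# Toy example with made-up sequence IDs.
example_hits = [
    ("read1", "seqA"),
    ("read2", "seqB"),
    ("read2", "seqA"),  # secondary hit for read2, ignored
    ("read3", "seqC"),
    ("read4", "seqX"),  # not in the mapping, ignored
]
example_map = {"seqA": "arsC", "seqB": "arsM", "seqC": "arsC"}
abundance = gene_family_abundance(example_hits, example_map, 1_000_000)
print(abundance)
```

Normalizing by the total read count is only one of several possible
choices; a real analysis might also account for gene length or use a
different scaling.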