As(V) Reduction Pathway
Gene families such as arsC, acr2 and GstB are included for this pathway, with 100,357 sequences and 84 homologous orthology groups (Figure 3; Table S5). Nearly every extant microbe has ArsB or Acr3 efflux permeases for As(III) detoxification (Zhu et al., 2017). When As(V) became the predominant soluble species, all cells had to do was reduce As(V) to As(III), the substrate of ArsB or Acr3, and they would become resistant to As(V) (Mukhopadhyay & Rosen, 2002). Accordingly, ArsC, Acr2, GstB and related reductases located in the cytoplasm reduce As(V), and the resulting As(III) is then excreted through the ArsB or Acr3 efflux pump (Bhattacharjee, Sheng, Ajees, Mukhopadhyay, & Rosen, 2010; Chrysostomou, Quandt, Marshall, Stone, & Georgiou, 2015). The transcriptional repressor ArsR controls these ars operons (J. Chen, Nadar, & Rosen, 2017; Qin et al., 2007).
As(III) Oxidation Pathway
There are 15 gene families responsible for As(III) oxidation, with a total of 92,183 sequences and 39 homologous orthology groups (Figure 3; Table S5). As(III)-oxidizing microorganisms exist widely in nature and include both heterotrophic and chemo/photosynthetic autotrophic microorganisms (Hamamura et al., 2009). During early life, As(III) oxidation by anaerobes would have produced As(V) in the absence of an oxygen-containing atmosphere, which opened a niche for As(V)-respiring microbes prior to the Great Oxidation Event (GOE) (Kulp, 2014).
As(III) oxidation is catalyzed by the enzyme As(III) oxidase. This enzyme is composed of two subunits: a large subunit (α) containing molybdopterin and a [3Fe-4S] cluster (AioA), and a smaller subunit (β) incorporating a Rieske-type [2Fe-2S] cluster (AioB) (Hamamura et al., 2009). Both aioS/aroS/aoxS (sensor histidine kinase) and aioR/aroR/aoxR (transcriptional regulator) regulate expression of the aio genes by recognizing As(III) (Sardiwal, Santini, Osborne, & Djordjevic, 2010). The operon sometimes contains an aioX/arxX gene, which encodes an As(III)-binding protein involved in As(III)-based signaling and regulation of As(III) oxidation, or a moeA gene encoding the MoeA protein, which synthesizes the molybdenum cofactor of the AioAB oxidase (Sardiwal et al., 2010).
A new type of As(III) oxidase (arxA) has been discovered with both As(V) reductase and As(III) oxidase activities in vitro (Zargar, Hoeft, Oremland, & Saltikov, 2010).
In addition to arxA, the genes arxB, arxC, arxD and arxH are involved in As(III) oxidation coupled to photosynthesis (Zargar et al., 2012).
An adjacent and divergent gene cluster, arxXSR, encodes putative regulatory proteins: a periplasmic substrate-binding protein specific for phosphate (ArxX), a two-component histidine kinase sensor (ArxS), and a response regulator (ArxR) (Zargar et al., 2012).
In addition, the methylarsenite-specific oxidase ArsH can oxidize methylarsenite to methylarsenate (J. Chen, Bhattacharjee, & Rosen, 2015; Qin et al., 2006).
As (De)Methylation Pathway
Three gene families, arsM, As3mt and arsI, are involved in the As methylation and demethylation pathways, with 7,862 sequences and 24 homologous orthology groups (Figure 3; Table S5).
More recent reports of methylated As show that As methylation is widespread in the environment (J. Chen, Bhattacharjee, et al., 2015; P. Wang, Sun, Jia, Meharg, & Zhu, 2014; C. Zhang et al., 2021).
Methylation is catalyzed by As(III) S-adenosylmethionine (SAM) methyltransferase, designated AS3MT in animals and ArsM in microorganisms.
The gene arsI, whose product catalyzes demethylation of organic As(III), was identified and characterized from the environmental isolate Bacillus sp. MD1 (Yoshinaga & Rosen, 2014) and from the cyanobacterium Nostoc sp. 7120 (Yan, Ye, Xue, & Zhu, 2015).
ArsI, a nonheme iron-dependent dioxygenase with C–As lyase activity, cleaves the C–As bond in MAs(III), trivalent roxarsone, and other trivalent aromatic arsenicals (Yoshinaga, Cai, & Rosen, 2011). Putative ArsI orthologs were found only in bacterial species, suggesting that alternate pathways of organoarsenical demethylation might exist in other organisms (Yoshinaga & Rosen, 2014; Zhu et al., 2017).
Taxonomic composition of As metabolic genes and pathways in AsgeneDB
To understand the taxonomic composition of As metabolism genes and pathways in AsgeneDB, we mapped sequences targeting As metabolism genes and pathways to reference genomes from NCBI RefSeq. The results indicate that AsgeneDB covers 46 phyla and 1,653 genera of bacteria, archaea and fungi (Table S1).
In the As transport pathway, AsgeneDB covered 33 phyla and 1,141 genera of bacteria, among which the dominant phyla were Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes (Table S6). Euryarchaeota was the dominant phylum among the 6 archaeal phyla. The predominant eukaryotes were Sordariomycetes, Eurotiomycetes and Saccharomycetes in Ascomycota and Ustilaginomycetes in Basidiomycota. In addition, Halobacteria of Euryarchaeota; the classes Betaproteobacteria, Deltaproteobacteria and Gammaproteobacteria of Proteobacteria; Clostridia in Firmicutes; and Deferribacteres in Deferribacteres drove the As(V) respiratory pathway. For the As(V) reduction pathway, AsgeneDB covered 34 bacterial phyla, mainly Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes. It covered 6 archaeal phyla, mainly Euryarchaeota, Candidatus Thermoplasmatota and Thaumarchaeota. Saccharomycetes and Eurotiomycetes of Ascomycota were the dominant eukaryotes.
The target sequences of the As(III) oxidation pathway cover 29 phyla of bacteria, 6 phyla of archaea and 1 phylum of eukaryotes. For bacteria, Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes were the dominant phyla, which is consistent with the results of previous studies (Xu et al., 2021; C. Zhang et al., 2021). Halobacteria of Euryarchaeota and Sordariomycetes of Ascomycota were the dominant classes of archaea and eukaryotes, respectively. The functional sequences of As methylation and demethylation include 20 phyla of bacteria, 4 phyla of archaea and 2 phyla of fungi.
The bacteria mainly belonged to Rhodopseudomonas in Proteobacteria, Symbiobacterium in Firmicutes, Dehalogenimonas in Chloroflexi and Streptomyces in Actinobacteria. The dominant archaea were the classes Methanomicrobia and Halobacteria of Euryarchaeota. Saccharomycetes in Ascomycota was the dominant fungal class, which also fits with previous research (Jia et al., 2013a; S.-Y. Zhang et al., 2017). These results suggest that AsgeneDB covers a high diversity of microorganisms involved in As metabolism, providing a useful platform for searching and annotating As metabolic genetic pathways and related key microorganisms in the environment.
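The per-pathway coverage figures reported in this section (numbers of phyla and genera per pathway and domain) amount to counting distinct taxa in a sequence-to-taxonomy mapping. A minimal sketch of that tally follows. It is an illustration only, not the authors' pipeline: the record layout and the example rows are hypothetical stand-ins for the real mapping against NCBI RefSeq.

```python
from collections import defaultdict

# Hypothetical records: (pathway, domain, phylum, genus) for each mapped sequence.
# These rows are illustrative placeholders, not data from AsgeneDB.
records = [
    ("As transport", "Bacteria", "Proteobacteria", "Escherichia"),
    ("As transport", "Bacteria", "Firmicutes", "Bacillus"),
    ("As transport", "Archaea", "Euryarchaeota", "Halobacterium"),
    ("As(V) reduction", "Bacteria", "Proteobacteria", "Escherichia"),
    ("As(V) reduction", "Bacteria", "Actinobacteria", "Streptomyces"),
]


def taxonomic_coverage(rows):
    """Return {(pathway, domain): (n_phyla, n_genera)} counting distinct taxa."""
    phyla = defaultdict(set)
    genera = defaultdict(set)
    for pathway, domain, phylum, genus in rows:
        phyla[(pathway, domain)].add(phylum)
        genera[(pathway, domain)].add(genus)
    return {key: (len(phyla[key]), len(genera[key])) for key in phyla}


coverage = taxonomic_coverage(records)
for (pathway, domain), (n_phyla, n_genera) in sorted(coverage.items()):
    print(f"{pathway}\t{domain}\t{n_phyla} phyla\t{n_genera} genera")
```

Using sets keeps the count robust to the many sequences that map to the same genus, which is the typical case in a database of this size.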
Application of AsgeneDB for functional and taxonomic profiling of metagenomes
We applied AsgeneDB and five other orthology databases (KEGG, eggNOG, COG, arCOG and KOG) for taxonomic and functional profiling of As metabolism in metagenomes from freshwater, hot spring, marine sediment, and soil (Figures 4 and 5). The number of As metabolic gene families detected by searching sample data against AsgeneDB ranged from 13 to 46 in the four habitats, which was significantly greater (HSD, p < 0.001) than with the other five databases (1-13 in KEGG, 1-4 in eggNOG, 4-8 in COG, one in arCOG, and one in KOG) (Figure 4a). Moreover, AsgeneDB yielded substantially higher metagenomic mapping rates than the other five databases (Figure 4b).
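The two quantities compared here, the number of gene families detected and the mapping rate, can both be derived from a table of annotation hits. The sketch below shows one way to compute them per sample and database. The tuple layout, the sample name, and all counts are hypothetical; the mapping rate is assumed to mean mapped reads divided by total reads in the sample, which the text does not define explicitly.

```python
from collections import defaultdict

# Hypothetical annotation hits: (sample, database, gene_family, mapped_reads).
hits = [
    ("soil_1", "AsgeneDB", "arsB", 120),
    ("soil_1", "AsgeneDB", "acr3", 80),
    ("soil_1", "AsgeneDB", "arsC", 200),
    ("soil_1", "KEGG", "arsC", 150),
    ("soil_1", "KEGG", "arsB", 50),
]
# Hypothetical total read count per sample.
total_reads = {"soil_1": 100_000}


def summarize(hits, total_reads):
    """Per (sample, database): distinct gene families and fraction of reads mapped."""
    families = defaultdict(set)
    mapped = defaultdict(int)
    for sample, database, family, reads in hits:
        families[(sample, database)].add(family)
        mapped[(sample, database)] += reads
    return {
        key: {
            "n_families": len(families[key]),
            "mapping_rate": mapped[key] / total_reads[key[0]],
        }
        for key in families
    }


summary = summarize(hits, total_reads)
for (sample, database), stats in sorted(summary.items()):
    print(sample, database, stats["n_families"], f"{stats['mapping_rate']:.4%}")
```

A post hoc test such as the HSD comparison cited in the text would then be run on the per-sample family counts grouped by database; that statistical step is not shown here.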
Arsenic metabolic functional genes and pathways varied widely among habitats (Figure 4c). Among the five metabolic pathways, the most abundant was As transport and the least abundant was As(V) respiration. Gene abundance also varied by ecosystem and by the geographical location of the ecosystem, indicating differences in the biogeographical distribution of microbial communities (C. Zhang et al., 2021; S.-Y. Zhang et al., 2017). Among the four habitats, the As metabolism microbiomes were most similar between marine sediment and soil. Freshwater samples had the lowest diversity in their As metabolism microbiomes.
A wide variety of organisms associated with particular pathways were identified within the samples. Organisms that drive As(III) oxidation, such as Candidatus Korarchaeota, Balneolaeota, Chlorobi, Spirochaetes, Ignavibacteriae, Chlamydiae, Thermodesulfobacteria and Thermotogae, were found in all habitats except freshwater. Candidatus Omnitrophica and Synergistetes drove the oxidation of As(III) in marine sediment and soil. Deferribacteres, which oxidize As(III), were found only in hot spring and marine sediment. Synergistetes, Chlorobi, and Candidatus Lokiarchaeota drove As methylation in sediment and soil, while Fusobacteria drove As methylation only in marine sediment. Candidatus Bipolaricaulota drove As methylation in all tested environments except freshwater. Calditrichaeota drove As transport and reduction in hot spring, marine sediment and soil, but only As transport in freshwater. Dictyoglomi had extensive As(V) reduction functions in hot spring, marine sediment and soil, but was not detected in freshwater. Microbes associated with As(V) respiration were the least diverse, with only Chrysiogenetes, Deferribacteres, Firmicutes and Proteobacteria among bacteria and Euryarchaeota among archaea detected (Figure 5). In contrast, microorganisms with As transport genes were the most diverse, consistent with the gene abundance of the various metabolic pathways in the environment (Figure 4c).
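Statements of the form "found in all habitats except freshwater" are set comparisons over a phylum-by-habitat presence table. The following sketch illustrates that query. The dictionary layout and function name are hypothetical, and the three entries simply transcribe examples from the paragraph above rather than reproducing Figure 5.

```python
# The four habitats sampled in this study.
HABITATS = frozenset({"freshwater", "hot spring", "marine sediment", "soil"})

# Hypothetical presence table: pathway -> phylum -> habitats where it was detected.
detections = {
    "As(III) oxidation": {
        "Chlorobi": {"hot spring", "marine sediment", "soil"},
        "Synergistetes": {"marine sediment", "soil"},
        "Deferribacteres": {"hot spring", "marine sediment"},
    },
}


def phyla_in_all_except(detections, pathway, excluded):
    """Phyla detected in every habitat other than `excluded`, and not in `excluded`."""
    expected = HABITATS - {excluded}
    return sorted(
        phylum
        for phylum, habitats in detections[pathway].items()
        if habitats == expected
    )


result = phyla_in_all_except(detections, "As(III) oxidation", "freshwater")
print(result)
```

Requiring exact equality with the expected habitat set excludes phyla that are missing from additional habitats, such as Deferribacteres in this example.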
Discussion
Combined with metagenomic methods, the identification of microbial arsenic metabolism pathways and the microbes that drive them can provide a comprehensive perspective for understanding the complexity of microbial arsenic metabolism in the environment. This study develops AsgeneDB, a manually curated orthology database of As metabolism genes, for fast and accurate annotation of As metabolic genes in shotgun metagenome sequence data. AsgeneDB has three major advantages over automatically generated orthology databases: precise definitions, comprehensive gene families, and rapid automated analysis of metagenomic data.
Firstly, it provides precise definitions of As metabolic gene families, which were manually inspected and retrieved using keywords combined with sequence similarity, unlike other databases that automatically generate orthology groups based on sequence similarity or shared functional domains (Galperin et al., 2021; Huerta-Cepas et al., 2019; Kanehisa et al., 2016). Precise definitions prevent the misattribution of genes to incorrect families. A typical example is arsB and ACR3, which belong to two different phylogenetic branches. Previous studies have demonstrated that ACR3 and arsB have complementary environmental abundances (Dunivin, Yeh, & Shade, 2019), but they are rarely separated in large databases (Achour et al., 2007; Cai et al., 2009).
Secondly, the automatically generated orthology databases cover only 2 to 20 gene families involved in microbial As metabolism (Figure S2) (Galperin et al., 2021; Huerta-Cepas et al., 2019; Kanehisa et al., 2016), whereas AsgeneDB covers 59 gene families with 414,773 representative sequences. AsgeneDB reflects the latest knowledge and progress in As metabolism research and covers gene families not included in existing databases, for example, arsJ, a gene family that confers organoarsenical resistance in microorganisms (J. Chen et al., 2016); arsP, a gene family that mediates efflux of trivalent organoarsenicals (MAs(III)) (J. Chen, Madegowda, et al., 2015); GstB, a newly discovered alternative pathway to arsenate resistance in bacteria (Chrysostomou et al., 2015); and those that encode trivalent As oxidases: aioR, arxR, arxA and arxB (Liu et al., 2012; Qin et al., 2007; Zargar et al., 2012). These gene families have not been clearly defined in other publicly available databases, but play important roles in microbial metabolism of environmental As (Zhu et al., 2017). AsgeneDB enables researchers to directly study these newly discovered gene families and metabolic pathways.
Thirdly, as the NCBI RefSeq database has been integrated into AsgeneDB (Yu et al., 2021) and AsgeneDB itself is relatively small, the Asgene package and database allow researchers to quickly determine "who has As metabolism" and "what can they do" in microbiome analyses. Unlike other orthology databases, AsgeneDB allows fast profiling of As metabolic microbial communities without a large computational cost or output file size, no matter which database searching tool is used. AsgeneDB also takes into account the "small database" issue, a known bioinformatics problem (Tu et al., 2019), and addresses it by including homologous gene families from multiple orthology databases, thereby reducing false positives introduced by homologs (Table S4).
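One common way in which homologous families included in a small database can reduce false positives is best-hit filtering: a read is counted for a target family only when its top-scoring hit is a target sequence rather than a homolog. The sketch below illustrates that idea under stated assumptions. It is not a description of how the Asgene package works internally; the hit tuples, subject identifiers, and the target flag are all hypothetical.

```python
# Hypothetical search hits: (read_id, subject_id, bit_score).
hits = [
    ("read_1", "arsC_seq1", 90.0),
    ("read_1", "homolog_seq1", 60.0),
    ("read_2", "arsC_seq1", 55.0),
    ("read_2", "homolog_seq1", 85.0),
]

# Hypothetical subject annotation: subject_id -> (gene_family, is_target).
# Homologs are kept in the database only to absorb reads that match them better.
subjects = {
    "arsC_seq1": ("arsC", True),
    "homolog_seq1": ("non-As homolog", False),
}


def assign_reads(hits, subjects):
    """Assign each read to a family only if its best hit is a target sequence."""
    best = {}
    for read_id, subject_id, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (subject_id, score)
    assigned = {}
    for read_id, (subject_id, _score) in best.items():
        family, is_target = subjects[subject_id]
        if is_target:
            assigned[read_id] = family
    return assigned


assigned = assign_reads(hits, subjects)
print(assigned)
```

Without the homolog entry, read_2 would have been reported as arsC on the strength of its weaker hit, which is the kind of false positive the "small database" issue refers to.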
In this study, we used AsgeneDB to analyze microbial As metabolism functional genes and functional species in four environments. Our results show that AsgeneDB has clear advantages over current comprehensive databases in detecting As functional gene families and their abundance in environmental metagenomic data. Moreover, our results also demonstrate that the As metabolism genes aioA, arrA, and arxA are phylogenetically conserved (Dunivin et al., 2019). The aioA gene is limited to Proteobacteria: Alphaproteobacteria, Gammaproteobacteria and Betaproteobacteria. arrA was detected in Proteobacteria, Firmicutes and Euryarchaeota, while arxA was only detected in Proteobacteria and Euryarchaeota (Table S6).
Furthermore, microorganisms extensively metabolize As in natural ecosystems. Functional genes of different As metabolic pathways could be identified in all environmental samples; As transport genes were the most abundant and As respiratory genes the least abundant (Figure 4c). Previous work has also shown that detoxification genes (As transport genes) are more abundant in microbial communities than As metabolism genes (As(V) respiration, methylation, and demethylation genes, etc.), which allows communities to adapt to a wide range of As stress environments (Dunivin et al., 2019). The genes arrA and arrB encode arsenate reductases that function in anaerobic environments (Saltikov & Newman, 2003), so they are more abundant in water and marine sediment.
In addition to the species previously shown to have As(III) oxidation function (C. Zhang et al., 2021), we find that Chlamydiae, Thermotogae, Ignavibacteriae, and Aquificae also have As(III) oxidation functions in specific ecosystems. Beyond the taxa reported in previous studies (Jia et al., 2013b; C. Zhang et al., 2021; S.-Y. Zhang et al., 2017), Verrucomicrobia, Spirochaetes, Ignavibacteriae and Candidatus Bipolaricaulota were found to have an As methylation function (Figure 5). Our results also show that there are significant differences in the abundance of functional genes and in the functional species composition of As metabolism in microbial communities of different ecosystems. Dictyoglomi, for example, has As(V) reduction properties in hot spring, marine sediment and soil that are not present in freshwater. Therefore, these results demonstrate the vast diversity and importance of microbial As metabolism functions in the environment that remain to be explored, and whose exploration will be greatly facilitated by AsgeneDB.
While genetic migration and limited genetic diversification can be achieved through horizontal gene transfer (HGT) or vertical transfer (Dunivin et al., 2019), many As metabolism genes, including ACR3, arsB, arsD, arsM and aioA, have regional dispersal limitations (Dunivin et al., 2019; Fahy et al., 2015). However, the large-scale distribution and diversity of As metabolism genes remain to be further explored.
AsgeneDB and the Asgene package are powerful tools for facilitating the analysis of shotgun metagenomic sequencing data, enabling rapid, comprehensive and accurate functional analysis of As-metabolizing microbial communities in a variety of environments. AsgeneDB and the Asgene package include comprehensive information on microbial As metabolism and will be updated periodically.
Availability of data and materials
The Asgene package is available on GitHub (https://github.com/XinweiSong/Asgene). AsgeneDB files can be downloaded from CyVerse (https://de.cyverse.org/data/ds/iplant/home/xinwei/AsgeneDB/AsgeneDB.zip).