As(V) Reduction Pathway
Gene families such as arsC, acr2 and GstB are included in this pathway, with 100,357 sequences and 84 homologous orthology groups (Figure 3; Table S5). Nearly every extant microbe has ArsB or Acr3 efflux permeases for As(III) detoxification (Zhu et al., 2017). When As(V) became the predominant soluble species, all cells had to do was reduce As(V) to As(III), the substrate of ArsB or Acr3, and they would become resistant to As(V) (Mukhopadhyay & Rosen, 2002). Thus, ArsC, Acr2, GstB and related reductases located in the cytoplasm can reduce As(V) in the cytoplasmic membrane, and the resulting As(III) is then excreted through the ArsB or Acr3 efflux pump (Bhattacharjee, Sheng, Ajees, Mukhopadhyay, & Rosen, 2010; Chrysostomou, Quandt, Marshall, Stone, & Georgiou, 2015). The transcriptional repressor ArsR controls these ars operons (J. Chen, Nadar, & Rosen, 2017; Qin et al., 2007).
As(III) Oxidation Pathway
There are 15 gene families responsible for As(III) oxidation, with a total of 92,183 sequences and 39 homologous orthology groups (Figure 3; Table S5). As(III)-oxidizing microorganisms exist widely in nature and include both heterotrophic and chemo/photosynthetic autotrophic microorganisms (Hamamura et al., 2009). During early life, As(III) oxidation by anaerobes would have produced As(V) in the absence of an oxygen-containing atmosphere, which opened a niche for As(V)-respiring microbes prior to the Great Oxidation Event (GOE) (Kulp, 2014). As(III) oxidation is catalyzed by the enzyme As(III) oxidase. This enzyme is composed of two subunits: a large subunit (α) containing molybdopterin and a [3Fe-4S] cluster (AioA), and a smaller subunit (β) incorporating a Rieske-type [2Fe-2S] cluster (AioB) (Hamamura et al., 2009). Both aioS/aroS/aoxS (sensor histidine kinase) and aioR/aroR/aoxR (transcriptional regulator) can regulate expression of aio genes by recognizing As(III) (Sardiwal, Santini, Osborne, & Djordjevic, 2010). The operon sometimes contains an aioX/arxX gene, which encodes an As(III)-binding protein involved in As(III)-based signaling and regulation of As(III) oxidation, or a moeA gene, which encodes the MoeA protein that synthesizes the molybdenum cofactor of AioAB oxidase (Sardiwal et al., 2010). A new type of As(III) oxidase (arxA) has been discovered with both As(V) reductase and As(III) oxidase activities in vitro (Zargar, Hoeft, Oremland, & Saltikov, 2010). In addition to arxA, the genes arxB, arxC, arxD and arxH encode functions for As(III) oxidation coupled to photosynthesis (Zargar et al., 2012). An adjacent and divergent gene cluster, arxXSR, encodes putative regulatory proteins: a periplasmic substrate-binding protein specific for phosphate (ArxX), a two-component histidine kinase sensor (ArxS), and a response regulator (ArxR) (Zargar et al., 2012). In addition, the methylarsenite-specific oxidase ArsH can oxidize methylarsenite to methylarsenate (J. Chen, Bhattacharjee, & Rosen, 2015; Qin et al., 2006).
As (De)Methylation Pathway
Three gene families, arsM, As3mt and arsI, are involved in the As methylation and demethylation pathways, with 7,862 sequences and 24 homologous orthology groups (Figure 3; Table S5). More recent reports of methylated As show that As methylation is widespread in the environment (J. Chen, Bhattacharjee, et al., 2015; P. Wang, Sun, Jia, Meharg, & Zhu, 2014; C. Zhang et al., 2021). Methylation is catalyzed by the enzyme As(III) S-adenosylmethionine (SAM) methyltransferase, designated AS3MT in animals and ArsM in microorganisms. The gene arsI, whose product catalyzes demethylation of organic As(III), was identified and characterized from the environmental isolate Bacillus sp. MD1 (Yoshinaga & Rosen, 2014) and from the cyanobacterium Nostoc sp. 7120 (Yan, Ye, Xue, & Zhu, 2015). ArsI, a nonheme iron-dependent dioxygenase with C–As lyase activity, cleaves the C–As bond in MAs(III), trivalent roxarsone, and other trivalent aromatic arsenicals (Yoshinaga, Cai, & Rosen, 2011). Putative ArsI orthologs were found only in bacterial species, suggesting that alternate pathways of organoarsenical demethylation might exist in other organisms (Yoshinaga & Rosen, 2014; Zhu et al., 2017).
Taxonomic composition of As metabolic genes and pathways in AsgeneDB
To understand the taxonomic composition of As metabolism genes and pathways in AsgeneDB, we mapped sequences targeting As metabolism genes and pathways to reference genomes from NCBI RefSeq. The results indicate that AsgeneDB covers 46 phyla and 1,653 genera of bacteria, archaea and fungi (Table S1). In the As transport pathway, AsgeneDB covered 33 phyla and 1,141 genera of bacteria, among which the dominant phyla were Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes (Table S6). Euryarchaeota was the dominant phylum among the 6 phyla of archaea. The predominant eukaryotes were Sordariomycetes, Eurotiomycetes and Saccharomycetes in Ascomycota and Ustilaginomycetes in Basidiomycota. In addition, Halobacteria of Euryarchaeota; the Betaproteobacteria, Deltaproteobacteria and Gammaproteobacteria classes of Proteobacteria; Clostridia in Firmicutes; and Deferribacteres in Deferribacteres drove the As(V) respiratory pathway. For the As(V) reduction pathway, AsgeneDB covered 34 bacterial phyla, mainly Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes. It also covered 6 archaeal phyla, mainly Euryarchaeota, Candidatus Thermoplasmatota and Thaumarchaeota. Saccharomycetes and Eurotiomycetes of Ascomycota were the dominant eukaryotes. The target sequences of the As(III) oxidation pathway cover 29 phyla of bacteria, 6 phyla of archaea and 1 phylum of eukaryotes. For bacteria, Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes represented the dominant phyla, which is consistent with the results of previous studies (Xu et al., 2021; C. Zhang et al., 2021). Halobacteria of Euryarchaeota and Sordariomycetes of Ascomycota were the dominant classes of archaea and eukaryotes, respectively. The functional sequences of As methylation and demethylation include 20 phyla of bacteria, 4 phyla of archaea and 2 phyla of fungi. The bacteria mainly belonged to Rhodopseudomonas in Proteobacteria, Symbiobacterium in Firmicutes, Dehalogenimonas in Chloroflexi and Streptomyces in Actinobacteria. The dominant archaea were the classes Methanomicrobia and Halobacteria of Euryarchaeota. Saccharomycetes in Ascomycota was the dominant fungal class, which also agrees with previous research (Jia et al., 2013a; S.-Y. Zhang et al., 2017). These results suggest that AsgeneDB covers a high diversity of microorganisms involved in As metabolism, providing a useful platform for searching and annotating As metabolic genetic pathways and related key microorganisms in the environment.
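The per-pathway tallies reported here amount to grouping annotated sequences by pathway and domain and counting the distinct taxa at each rank. A minimal Python sketch of that bookkeeping follows; the record layout and the example rows are hypothetical and are not taken from AsgeneDB or Table S6.

```python
from collections import defaultdict

# Hypothetical records: (pathway, domain, phylum, genus).
# Illustrative rows only; they do not reproduce AsgeneDB content.
records = [
    ("As transport", "Bacteria", "Proteobacteria", "Pseudomonas"),
    ("As transport", "Bacteria", "Proteobacteria", "Escherichia"),
    ("As transport", "Bacteria", "Firmicutes", "Bacillus"),
    ("As transport", "Archaea", "Euryarchaeota", "Halobacterium"),
    ("As(III) oxidation", "Bacteria", "Proteobacteria", "Rhizobium"),
]


def taxonomic_coverage(rows):
    """Return {(pathway, domain): (n_phyla, n_genera)} for the given rows."""
    phyla = defaultdict(set)
    genera = defaultdict(set)
    for pathway, domain, phylum, genus in rows:
        key = (pathway, domain)
        phyla[key].add(phylum)   # sets keep each taxon once
        genera[key].add(genus)
    return {key: (len(phyla[key]), len(genera[key])) for key in phyla}


coverage = taxonomic_coverage(records)
for (pathway, domain), (n_phyla, n_genera) in sorted(coverage.items()):
    print(f"{pathway}\t{domain}\t{n_phyla} phyla\t{n_genera} genera")
```

Using sets means that repeated sequences from the same genus do not inflate the counts, which is the behavior one would want when reporting coverage rather than abundance.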
Application of AsgeneDB for functional and taxonomic profiling of metagenomes
We applied AsgeneDB and five other orthology databases (KEGG, eggNOG, COG, arCOG and KOG) for taxonomic and functional profiling of As metabolism in metagenomes from freshwater, hot spring, marine sediment, and soil (Figures 4 and 5). The number of As metabolic gene families detected by searching sample data against AsgeneDB ranged from 13 to 46 in the four habitats, which was significantly greater (HSD, p < 0.001) than with the other five databases (1-13 in KEGG, 1-4 in eggNOG, 4-8 in COG, one in arCOG, and one in KOG) (Figure 4a). Moreover, AsgeneDB yielded substantially higher metagenomic mapping rates than the other five databases (Figure 4b).
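The two quantities compared across databases, the number of detected gene families and the mapping rate, can be derived from a table of best hits per read. The sketch below is illustrative only: the hit tuples, sample names and read totals are hypothetical, and the mapping rate is assumed here to be the fraction of reads with at least one hit, which the text does not define explicitly.

```python
# Hypothetical best hits: (sample, read_id, gene_family). Illustrative values only.
hits = [
    ("soil_1", "r1", "arsB"),
    ("soil_1", "r2", "arsC"),
    ("soil_1", "r3", "arsB"),
    ("freshwater_1", "r1", "acr3"),
]

# Hypothetical total number of reads sequenced per sample.
total_reads = {"soil_1": 10, "freshwater_1": 8}


def profile(hits, total_reads):
    """Summarize detected gene families and mapping rate for each sample."""
    families = {}
    mapped = {}
    for sample, read_id, family in hits:
        families.setdefault(sample, set()).add(family)
        mapped.setdefault(sample, set()).add(read_id)
    summary = {}
    for sample, n_total in total_reads.items():
        summary[sample] = {
            # number of distinct gene families with at least one hit
            "n_families": len(families.get(sample, ())),
            # assumed definition: reads with a hit / all reads in the sample
            "mapping_rate": len(mapped.get(sample, ())) / n_total,
        }
    return summary


summary = profile(hits, total_reads)
for sample, stats in summary.items():
    print(sample, stats["n_families"], f"{stats['mapping_rate']:.3f}")
```

Running the same summary on hits obtained against each database would give per-database values of the kind compared in Figure 4a and 4b.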
Arsenic metabolic functional genes and pathways varied widely among habitats (Figure 4c). Among the five metabolic pathways, the most abundant was As transport and the least abundant was As(V) respiration. Gene abundance also varied by ecosystem and by geographical location, indicating differences in the biogeographical distribution of microbial communities (C. Zhang et al., 2021; S.-Y. Zhang et al., 2017). Among the four habitats, the As metabolism microbiomes were most similar between marine sediment and soil. Freshwater samples had the lowest diversity in their As metabolism-driven microbiomes.
A wide variety of organisms associated with specific pathways were identified within the samples. Organisms that drive As(III) oxidation, such as Candidatus Korarchaeota, Balneolaeota, Chlorobi, Spirochaetes, Ignavibacteriae, Chlamydiae, Thermodesulfobacteria and Thermotogae, were found in all habitats except freshwater. Candidatus Omnitrophica and Synergistetes drove the oxidation of As(III) in marine sediment and soil. Deferribacteres, which oxidize As(III), were found only in hot spring and marine sediment. Synergistetes, Chlorobi, and Candidatus Lokiarchaeota drove As methylation in marine sediment and soil, while Fusobacteria drove As methylation only in marine sediment. Candidatus Bipolaricaulota drove As methylation in all tested environments except freshwater. Calditrichaeota drove As transport and reduction in hot spring, marine sediment and soil, but only drove As transport in freshwater. Dictyoglomi had extensive As(V) reduction functions in hot spring, marine sediment and soil, but was not detected in freshwater. Microbes associated with As(V) respiration were the least diverse, with only Chrysiogenetes, Deferribacteres, Firmicutes and Proteobacteria among bacteria and Euryarchaeota among archaea detected (Figure 5). In contrast, microorganisms with As transport genes were the most diverse, correlating with the gene abundance of the various metabolic pathways in the environment (Figure 4c).
Discussion
Combined with metagenomic methods, the identification of microbial arsenic metabolism pathways and the microbes that drive them can provide a comprehensive perspective on the complexity of microbial arsenic metabolism in the environment. This study develops AsgeneDB, a manually curated orthology database of As metabolism genes, for fast and accurate annotation of As metabolic genes in shotgun metagenome sequence data. AsgeneDB has three major advantages over automatically generated orthology databases: precise definitions, comprehensive gene families, and rapid automated analysis of metagenomic data.
First, AsgeneDB has precise definitions of As metabolic gene families, which were manually inspected and retrieved using keywords combined with sequence similarity, unlike other databases that automatically generate orthology groups based on sequence similarity or shared functional domains (Galperin et al., 2021; Huerta-Cepas et al., 2019; Kanehisa et al., 2016). Precise definitions prevent the misattribution of genes to incorrect families. A typical example is arsB and ACR3, which belong to two different phylogenetic branches. Previous studies have demonstrated that ACR3 and arsB have complementary environmental abundances (Dunivin, Yeh, & Shade, 2019), but they are rarely separated in large databases (Achour et al., 2007; Cai et al., 2009).
Second, the automatically generated orthology databases cover only 2 to 20 gene families involved in microbial As metabolism (Figure S2) (Galperin et al., 2021; Huerta-Cepas et al., 2019; Kanehisa et al., 2016), whereas AsgeneDB covers 59 gene families with 414,773 representative sequences. AsgeneDB reflects the latest knowledge and progress in As metabolism research and covers gene families not included in existing databases. Examples include arsJ, a gene family that confers organoarsenical resistance in microorganisms (J. Chen et al., 2016); arsP, a gene family that encodes efflux of trivalent organoarsenicals (MAs(III)) (J. Chen, Madegowda, et al., 2015); GstB, a newly discovered alternative pathway to arsenate resistance in bacteria (Chrysostomou et al., 2015); and the gene families involved in trivalent As oxidation: aioR, arxR, arxA and arxB (Liu et al., 2012; Qin et al., 2007; Zargar et al., 2012). These gene families have not been clearly defined in other publicly available databases, but play important roles in microbial metabolism of environmental As (Zhu et al., 2017). AsgeneDB enables researchers to directly study these newly discovered gene families and metabolic pathways.
Third, because the NCBI RefSeq database has been integrated into AsgeneDB (Yu et al., 2021) and AsgeneDB itself is relatively small, the Asgene package and database allow researchers to quickly determine “who has As metabolism” and “what can they do” in microbiome analyses. Unlike other orthology databases, AsgeneDB allows fast profiling of As metabolic microbial communities without a huge computational cost or output file size, no matter which database searching tool is used. AsgeneDB also takes into account the ‘small database’ issue (Tu et al., 2019) and addresses it by including homologous gene families from multiple orthology databases, thereby reducing false positives introduced by homologs (Table S4).
In this study, we used AsgeneDB to analyze microbial As metabolism functional genes and functional species in four environments. Our results show that AsgeneDB has clear advantages over current comprehensive databases in detecting As functional gene families and their abundance in environmental metagenomic data. Moreover, our results also demonstrate that the As metabolism genes aioA, arrA, and arxA are phylogenetically conserved (Dunivin et al., 2019). The aioA gene is limited to Proteobacteria: Alphaproteobacteria, Gammaproteobacteria and Betaproteobacteria. arrA was detected in Proteobacteria, Firmicutes and Eurycota, while arxA was only detected in Proteobacteria and Eurycota (Table S6). Furthermore, microorganisms extensively metabolize As in natural ecosystems. Functional genes of different As metabolic pathways could be identified in all environmental samples, with As transport genes the most abundant and As respiratory genes the least abundant (Figure 4c). Previous work has also shown that detoxification genes (As transport genes) are more abundant in microbial communities than As metabolism genes (As(V) respiration, methylation, and demethylation genes, etc.), in order to adapt to a wide range of As stress environments (Dunivin et al., 2019). The genes arrA and arrB encode arsenate reductases that function in anaerobic environments (Saltikov & Newman, 2003), so they are more abundant in water and marine sediment.
In addition to the species previously shown to have As(III) oxidation function (C. Zhang et al., 2021), we find that Chlamydiae, Thermotogae, Ignavibacteriae, and Aquificae also have As(III) oxidation functions in specific ecosystems. Beyond the taxa reported in previous studies (e.g., Jia et al., 2013b; C. Zhang et al., 2021; S.-Y. Zhang et al., 2017), Verrucomicrobia, Spirochaetes, Ignavibacteriae and Candidatus Bipolaricaulota were found to have an As methylation function (Figure 5). Our results also show that there are significant differences in the abundance of functional genes and in the functional species composition of As metabolism among microbial communities of different ecosystems. Dictyoglomi, for example, has As(V) reduction properties in hot spring, marine sediment and soil that are not present in freshwater. These results therefore demonstrate the vast diversity and importance of microbial As metabolism functions in the environment that remain to be explored, an effort that AsgeneDB will greatly facilitate.
While genetic migration and limited genetic diversification can be achieved through horizontal gene transfer (HGT) or vertical transfer (Dunivin et al., 2019), many As metabolism genes, including ACR3, arsB, arsD, arsM and aioA, have regional dispersal limitations (Dunivin et al., 2019; Fahy et al., 2015). However, the distribution and diversity of large-scale As metabolism genes remain to be further explored. AsgeneDB and Asgene Package are powerful tools for facilitating the analysis of shotgun metagenomic sequencing data, enabling rapid, comprehensive and accurate functional analysis of As metabolizing microbial communities in a variety of environments. AsgeneDB and Asgene Package include comprehensive information on microbial As metabolism and will be updated periodically.
Availability of data and materials
AsgenePackage is available on GitHub (https://github.com/XinweiSong/Asgene). AsgeneDB files can be downloaded from CyVerse (https://de.cyverse.org/data/ds/iplant/home/xinwei/AsgeneDB/AsgeneDB.zip).