Introduction
Arsenic (As) is classified as a group I carcinogen by the International Agency for Research on Cancer and is known as both “the king of poisons” and “the poison of kings” (Zheng, 2020). Arsenic has therefore been a prime focus of ecology and environmental sciences (S.-Y. Zhang et al., 2017; Zheng, 2020). Once As is released from mineral deposits by geological, agricultural, and industrial processes, its toxicity and mobility can be greatly altered by microbial metabolism (Achour, Bauda, & Billard, 2007; Oremland & Stolz, 2003). These metabolic processes, which include microbial oxidation, respiration, reduction, and methylation, play a major role in the global As cycle (Mukhopadhyay, Rosen, Phung, & Silver, 2002) and are mediated by a variety of genes. It has been reported that almost all microorganisms carry As resistance and metabolism genes (Zhu, Xue, Kappler, Rosen, & Meharg, 2017). For example, As redox genes encoding cytoplasmic arsenate [As(V)] reductase (arsC), periplasmic As(V) respiratory reductase (arrAB), and arsenite [As(III)] oxidase (aioAB/arxA) control the interconversion of As(V) and As(III) [7–9], while As(III) S-adenosylmethionine methyltransferase (arsM) and a nonheme iron-dependent dioxygenase with C-As lyase activity (arsI) catalyze As methylation and demethylation, respectively (Jia et al., 2013a; Yoshinaga & Rosen, 2014). Arsenic can also exploit pathways that serve other substrates: As(III) and As(V) act as analogues of glycerol and phosphate, respectively, and enter microbial cells through glycerol transporters (GlpF) and phosphate transporters (Pit/Pst) (Borgnia, Nielsen, Engel, & Agre, 1999; Wysocki et al., 2001). Because these processes greatly change the toxicity and bioavailability of As, studying microbial As metabolism genes is essential for understanding environmental As cycling and the potential for microbial remediation.
Although the mechanisms of microbial As metabolism are well documented and characterized, the distribution and diversity of As metabolic genes in microbial communities remain unclear because a large proportion of the microorganisms in environmental samples are uncultured. Previous studies of the distribution and diversity of individual genes have typically used targeted primer sets in analyses such as polymerase chain reaction (PCR), cloning, denaturing gradient gel electrophoresis (DGGE), microarray-based metagenomic techniques (e.g. GeoChip), and quantitative PCR (qPCR) (Achour et al., 2007; Cai, Liu, Rensing, & Wang, 2009; H.-T. Wang et al., 2019; C. Zhang et al., 2021). These methods are limited by low throughput, since they target only one or a few specific genes, and by nonspecific amplification introduced by the primers. In addition, because primers cannot be designed for unknown nucleic acid sequences, the inability to detect unknown microorganisms is the biggest obstacle for these techniques. In contrast, high-throughput sequencing techniques target all genes and do not rely on the specificity and coverage of primers. In the current metagenomic era, characterizing microbially mediated As metabolism at gene- and species-level resolution has become an important way to better understand it. Shotgun metagenomic sequencing can probe the functions of uncharacterized microbiomes and provide a detailed picture of As metabolism in complex communities, so that microbial metabolism can be harnessed to address environmental issues (Xiao et al., 2016; S.-Y. Zhang et al., 2017). However, metagenomic data analysis requires comprehensive and reliable orthology databases for accurate profiling of functional gene families, and the results of metagenomic analyses are substantially affected by the choice of orthology database (Nayfach & Pollard, 2016).
Orthology databases such as arCOG (Archaeal Clusters of Orthologous Genes) (Makarova, Wolf, & Koonin, 2015), COG (Clusters of Orthologous Groups) (Galperin et al., 2021), eggNOG (Huerta-Cepas et al., 2019), and KEGG (Kanehisa, Sato, Kawashima, Furumichi, & Tanabe, 2016) are widely used for functional annotation in both genomic and metagenomic studies. Each database has distinct features that reflect its design concept: arCOG targets archaeal annotation (Makarova, Wolf, & Koonin, 2015), COG and eggNOG annotate orthologous groups (Galperin et al., 2021; Huerta-Cepas et al., 2019), and KEGG links genes with pathways (Kanehisa et al., 2016). For As metabolism, these databases have several analytical limitations, including low coverage of As metabolic genes, difficulty in distinguishing homologous genes, and long database search times (Tu, Lin, Cheng, Deng, & He, 2019; Yu et al., 2021). A comprehensive and accurate database of As metabolism genes is therefore essential for efficient analysis of As metabolism functions in microbial communities.
In this study, to better characterize the microbial communities involved in As metabolism in the environment, we developed a manually curated As metabolism gene database (AsgeneDB), which covers five As metabolic pathways (transport, respiration, reduction, oxidation, and methylation/demethylation), 59 As metabolism gene families, and 414,773 representative sequences. AsgeneDB integrates multiple orthology databases and covers 46 phyla and 1,653 genera of bacteria, archaea, and fungi. It enables researchers to study newly discovered As metabolic pathways and gene families directly, with high specificity, comprehensiveness, representativeness, and accuracy. To facilitate comparison and statistical analysis of metagenomic data, we developed an R package, Asgene, that automatically reports gene family abundance and functional community composition at different taxonomic levels in different environments. AsgeneDB was compared with five other orthology databases by analyzing metagenomic sequencing data from four habitats (freshwater, hot spring, marine sediment, and soil). Our results show that AsgeneDB detected more As metabolism genes, and a higher abundance of these genes, in environmental microbial communities. In addition, the abundance of As functional genes and the species driving these functions differed significantly among habitats. Arsenic transport genes were more abundant in the microbial communities than As(V) respiration, methylation, and demethylation genes. These results demonstrate the vast diversity and importance of microbial As metabolism functions in the environment that remain to be explored. AsgeneDB and the Asgene package therefore offer convenient tools for comprehensive and accurate metagenomic analysis of As metabolism and should greatly promote research in this area.
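To make the notion of gene family abundance concrete, the following minimal Python sketch shows one way such a statistic could be computed from read-level database hits. It is an illustration only and not the implementation of the Asgene package: the function name, the input formats (best-hit pairs and a sequence-to-family lookup table), and the normalization to counts per million reads are all assumptions introduced here.

```python
from collections import Counter


def gene_family_abundance(hits, seq_to_family, total_reads):
    """Count reads per gene family, scaled to counts per million reads.

    hits: iterable of (read_id, sequence_id) pairs from a database search
          (hypothetical input format).
    seq_to_family: dict mapping a database sequence ID to a gene family name
          (hypothetical lookup table).
    total_reads: total number of reads in the sample, used for normalization.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    counts = Counter()
    seen_reads = set()
    for read_id, seq_id in hits:
        # Count each read at most once, using its first annotated hit.
        if read_id in seen_reads:
            continue
        family = seq_to_family.get(seq_id)
        if family is None:
            continue  # hit to a sequence without a family annotation
        seen_reads.add(read_id)
        counts[family] += 1
    # Normalize so that samples of different sequencing depth are comparable.
    return {fam: n * 1_000_000 / total_reads for fam, n in counts.items()}


# Toy example with made-up read and sequence identifiers.
example_hits = [("r1", "s1"), ("r2", "s1"), ("r2", "s2"), ("r3", "s3"), ("r4", "s9")]
example_map = {"s1": "arsC", "s2": "arsM", "s3": "arsM"}
abundance = gene_family_abundance(example_hits, example_map, total_reads=1000)
```

Normalizing by sequencing depth is one common choice for comparing habitats; other schemes, such as correcting for gene length, would change the numbers but not the overall structure of the calculation.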