Database sources
We used the UniProt database to retrieve seed sequences and construct the core database (The UniProt Consortium, 2017). The orthology databases used for database merging and homologous gene identification in this study were arCOG (Makarova et al., 2015), COG (Galperin et al., 2021), eggNOG (Huerta-Cepas et al., 2019) and KEGG (Kanehisa et al., 2016). The microbial NCBI RefSeq database (O’Leary et al., 2016) was used to enrich AsgeneDB and to taxonomically classify the microbial communities involved in As metabolism.
Metagenomic profiling of As metabolic genes
To simplify use, an R package (Asgene) is provided for metagenomic alignment (nucleic acid or protein sequences), subsequent gene family abundance statistics, and sample abundance standardization. The Asgene package is available on GitHub (https://github.com/XinweiSong/Asgene). Users only need to choose a database search tool according to their needs (e.g., USEARCH, BLAST or DIAMOND) and supply several parameters (e.g., working path, search parameters of the tool, and file type); the package then runs the analysis and writes the statistical results automatically. When gene abundance statistics are selected (Option: abundance), read counts are normalized to reads per kilobase per million reads (RPKM), which removes differences in sequencing depth and reference sequence length between samples. When functional species statistics are selected (Option: taxonomy), the package automatically reports, for each As metabolism gene, the species that carry it in the sample at different classification levels (Figure 1c). Our work can therefore be used to analyze metagenomic data, providing functional profiles at the gene family level and the composition of the functional microbial community at various classification levels in different environments.
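The RPKM normalization described above can be illustrated with a minimal sketch. This is not the Asgene package's code: the function names, the input layout, and the example gene families (arsC, aioA) are hypothetical and chosen only for illustration. The sketch assumes the conventional RPKM definition, in which the reference length is given in base pairs and the denominator is the total number of reads in the sample; the package may define these quantities differently.

```python
from collections import defaultdict


def rpkm(read_count, ref_length_bp, total_reads):
    """Reads per kilobase of reference per million reads in the sample.

    Assumes the conventional definition:
        RPKM = read_count * 1e9 / (ref_length_bp * total_reads)
    """
    if ref_length_bp <= 0 or total_reads <= 0:
        raise ValueError("reference length and total reads must be positive")
    return read_count * 1e9 / (ref_length_bp * total_reads)


def gene_family_rpkm(hits, total_reads):
    """Sum per-reference RPKM values into gene-family totals.

    `hits` is an iterable of (gene_family, ref_length_bp, read_count)
    tuples, one per reference sequence (hypothetical input layout).
    """
    totals = defaultdict(float)
    for family, ref_length_bp, read_count in hits:
        totals[family] += rpkm(read_count, ref_length_bp, total_reads)
    return dict(totals)


if __name__ == "__main__":
    # Illustrative numbers only; not data from the study.
    example_hits = [
        ("arsC", 500, 5),
        ("arsC", 1000, 10),
        ("aioA", 2000, 4),
    ]
    for family, value in sorted(gene_family_rpkm(example_hits, 1_000_000).items()):
        print(family, round(value, 3))
```

Dividing by both reference length and sample read total is what makes values comparable across samples of different sequencing depth and across references of different length, as stated in the text.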