Case study
We applied AsgeneDB and five orthology databases (KEGG, eggNOG, COG, arCOG and KOG) to analyze microbial As metabolism in four distinct habitats: freshwater, hot spring, marine sediment and soil. Forty metagenome sequencing data files were downloaded from the NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra) (Table S2). Raw reads were quality-controlled using Trimmomatic v2.39 (Bolger, Lohse, & Usadel, 2014) to trim adaptors and primers and to filter out short reads (< 50 bp) and low-quality reads (< 20 bases). The forward and reverse quality-controlled reads were merged with the program idba (Peng, Leung, Yiu, & Chin, 2012). The merged shotgun metagenome sequences were searched against the KEGG, eggNOG, COG, arCOG, KOG and AsgeneDB databases using DIAMOND (parameters: -k 1 -e 1e-10 -p 20 --query-cover 80 --id 50) (Buchfink, Xie, & Huson, 2015). Gene abundances were then standardized between samples, and statistics on gene abundance and As-metabolizing microbial communities were computed in RStudio. We tested for significant differences in the number and abundance (RPKM) of key As metabolic gene families detected in the environmental samples by KEGG, eggNOG, COG, arCOG, KOG and AsgeneDB using one-way analysis of variance (ANOVA) and Tukey’s Honest Significant Difference (HSD) test.
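The abundance standardization and the ANOVA step described above can be illustrated with a minimal sketch. It is written in Python for illustration only, whereas the analysis itself was run in R, and it is not the authors' script. The function names `rpkm` and `one_way_anova_f`, the example hit list and the gene lengths are all hypothetical. RPKM is assumed to follow its usual definition (reads assigned to a gene family, per kilobase of gene length, per million reads in the sample). Only the F statistic is computed here. The p-value and the Tukey HSD comparisons would come from a statistics package.

```python
from collections import Counter
from statistics import mean


def rpkm(hit_counts, gene_lengths_bp, total_reads):
    """Reads per kilobase of gene per million reads, for each gene family."""
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_reads)
        for gene, count in hit_counts.items()
    }


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over several groups of values."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical best hits (one per read, as with "-k 1") for a single sample.
best_hits = ["arsC", "arsC", "arsB", "aioA", "arsC"]
hit_counts = Counter(best_hits)

# Illustrative gene lengths in bp; real values would come from the database.
gene_lengths_bp = {"arsC": 420, "arsB": 1290, "aioA": 2480}

sample_rpkm = rpkm(hit_counts, gene_lengths_bp, total_reads=1_000_000)

# Hypothetical RPKM values of one gene family in three habitats.
f_statistic = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

Dividing by both gene length and sequencing depth is what makes abundances comparable between gene families of different sizes and between samples with different read counts.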