Core database construction 
AsgeneDB was built using an improved pipeline based on previous work (Tu et al., 2019; Yu et al., 2021). First, the core database was manually constructed from current knowledge and the literature on As metabolism (S.-C. Chen et al., 2020; H.-T. Wang et al., 2019; C. Zhang et al., 2021; Zhu et al., 2017); genes for As metabolism listed in KEGG were also consulted (Kanehisa et al., 2016). Target sequences were downloaded from the Swiss-Prot and TrEMBL databases (The UniProt Consortium, 2017) using keywords (including gene and protein names) that were created and refined for each gene family involved in As metabolic pathways. To ensure the accuracy of AsgeneDB, the seed sequences of each gene family were checked manually based on their annotations and their similarity to other sequences, particularly for sequences with no reference sequence in Swiss-Prot. For each gene family, a self-versus-self search was then performed with USEARCH (version 11.0, 30% global identity cutoff) to generate a pairwise distance matrix of the sequences, and a nearest-neighbor clustering procedure was applied to cluster the sequences into groups. Outlier sequences were checked again against their annotation information in Swiss-Prot and TrEMBL, and abnormal sequences were removed. The remaining sequences were retained as the core database for As metabolic gene families (Figure 1a).
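The clustering and outlier-flagging step can be illustrated with a minimal Python sketch. It is not the pipeline's actual code and rests on several assumptions that the text does not specify: "nearest neighbor clustering" is read as single-linkage grouping, distance is taken as 1 minus fractional identity, pairs with no hit at the 30% identity cutoff are simply absent from the matrix, and any sequence left in a cluster smaller than `min_size` is treated as an outlier for manual review. The function names, sequence identifiers, and distance values are hypothetical.

```python
from collections import defaultdict


def single_linkage_clusters(ids, distance, max_distance):
    """Group sequences linked (directly or transitively) by distance <= max_distance.

    `distance` maps (id_a, id_b) pairs to a distance; missing pairs are
    treated as "no hit" and never link two sequences.
    """
    parent = {s: s for s in ids}

    def find(x):
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = distance.get((a, b), distance.get((b, a)))
            if d is not None and d <= max_distance:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = defaultdict(list)
    for s in ids:
        groups[find(s)].append(s)
    # Largest clusters first; members keep their input order.
    return sorted(groups.values(), key=len, reverse=True)


def flag_outliers(clusters, min_size=2):
    """Return sequences in clusters smaller than min_size (assumed outlier rule)."""
    return [s for c in clusters if len(c) < min_size for s in c]


# Hypothetical gene family: distances are 1 - fractional identity.
ids = ["seqA", "seqB", "seqC", "seqD"]
distance = {
    ("seqA", "seqB"): 0.20,
    ("seqB", "seqC"): 0.50,
    ("seqA", "seqC"): 0.65,
    # seqD has no hit at >= 30% identity, so it has no entries.
}

# A 30% identity cutoff corresponds to a maximum distance of 0.70 here.
clusters = single_linkage_clusters(ids, distance, max_distance=0.70)
outliers = flag_outliers(clusters)
print(clusters)
print(outliers)
```

In this toy example `seqD` shares no hit with the other sequences, so it ends up alone and is flagged; in the workflow described above, such sequences would then be re-checked against their Swiss-Prot and TrEMBL annotations rather than discarded automatically.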