Comparing AsgeneDB against established orthology databases
To demonstrate the need for a manually curated As metabolism gene
database, we compared the coverage of As metabolism gene subfamilies
(Figure 2) in AsgeneDB with that of the main public orthology databases.
Of the 59 gene subfamilies recruited to AsgeneDB, fewer than a third were
found in any other single database, with the largest proportion found in
KEGG (16 gene subfamilies), followed by COG (13 gene subfamilies),
eggNOG (10 gene subfamilies), arCOG (6 gene subfamilies), and KOG (2
gene subfamilies).
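The coverage figures above can be checked with a short calculation. The following is a minimal sketch that uses only the counts reported in this paragraph; the variable and function names are illustrative and are not part of AsgeneDB.

```python
# Number of AsgeneDB gene subfamilies, as reported in the text.
TOTAL_SUBFAMILIES = 59

# Subfamilies also found in each public orthology database (counts from the text).
found = {"KEGG": 16, "COG": 13, "eggNOG": 10, "arCOG": 6, "KOG": 2}


def coverage(found_counts, total):
    """Return the fraction of subfamilies covered by each database."""
    return {db: n / total for db, n in found_counts.items()}


cov = coverage(found, TOTAL_SUBFAMILIES)

# Print databases from highest to lowest coverage.
for db, frac in sorted(cov.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{db:7s} {found[db]:2d}/{TOTAL_SUBFAMILIES} = {frac:.1%}")
```

These are per-database fractions only. Because the same subfamily may be present in more than one database, the counts cannot be summed to obtain a combined coverage.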
AsgeneDB further contains several key As metabolic gene families that
are missing in the four common orthology databases, including As(V)
respiratory reductase (arrA and arrB), organoarsenical efflux permease
(arsJ and arsP), As(V) reductase (GstB) and As(III) oxidase (aioR, arxR,
arxA and arxB).
In addition to covering fewer genes, the four publicly available
orthology databases often merge families that AsgeneDB defines
separately into a single homologous group. For example,
both arsB and acr3 are involved in arsenite efflux even
though they belong to two different phylogenetic
clades
(Achour et al., 2007; Cai et al., 2009; Rosen, 2002). However, in KEGG,
COG and eggNOG databases, arsB and ACR3 are mixed into one
orthology group (Table S3). Similarly, arsA, ASNA1 and GET3 are homologous genes (Hemmingsson, Zhang, Still, & Naredi,
2009; Kurdi-Haidar et al., 1996) that have no clear distinction in COG,
KEGG and KOG. AsgeneDB is therefore a superior database for determining
gene families related to As metabolism and has obvious advantages over
existing resources in terms of coverage, representativeness and
accuracy.