# Metagenomes

Abstract

Importance

Introduction
Historically, b-lactams have been the most widely used antibiotics in clinical settings around the world (Garau 2005, Boeckel 2014, ESAC 2009, ECDC 2012), but the success of antibiotic treatment is compromised by the development of resistance in many clinically important pathogens (Davies 2010). The major cause of resistance to b-lactam antibiotics is the production of bacterial enzymes called b-lactamases, which hydrolyze the b-lactam ring of the antibiotic (Davies 1994). Many studies have assessed the presence and diversity of b-lactamases in clinical settings (Paterson 2005, Rice 2001, Ramphal 2006, Shahid 2014, Sullivan 2015), but comparatively little effort has been devoted to assessing b-lactamases in non-clinical settings.

Studies assessing antibiotic resistance (AR) in non-clinical environments are usually based on functional metagenomics, PCR, and minimum inhibitory concentration (MIC) tests (Donato 2010, Bhullar 2012, Segawa 2012, Amos 2014, Forsberg 2014, Su 2014, Ma 2014), but in recent years metagenomic approaches (Yang 2013, Li 2015, Fondi 2016) have emerged as an attractive tool to assess antibiotic resistance genes (ARGs) in environmental samples. In this context, ARGs in natural environments deserve attention, especially given the evidence that these genes have a long evolutionary history in natural environments (Aminov 2009, Hall 2004, Garau 2005). Natural environments thus appear to be a reservoir of ARGs potentially available to pathogenic bacteria (Berglund 2015, Versluis 2015) that cannot be ignored.

Although more studies have focused on ARGs in the environment in recent years (Czekalski 2015, Bhullar 2012, Cristóbal-Azkarate 2014, Nesme 2014, Karkman 2016), the lack of studies addressing the high diversity of b-lactamase genes across different environments hinders the understanding of important processes with clinical implications, such as b-lactamase gene transfer between environments and the impact of anthropogenic forces on both b-lactamase content and diversity in natural environments.

In this study we assess the presence and diversity of b-lactamases in different environments through a broad metagenomic approach and network-oriented analysis, providing important findings on the pool of b-lactamase genes in different environments.

Results

To understand the scope of b-lactam resistance in the environment, we performed a global survey of b-lactamases present in 232 shotgun metagenomes. Metagenomic data sets were obtained from two public repositories. These data sets account for 770 gigabytes of information and more than 4.7 billion metagenomic reads (details can be found in Table S1), and come from different environments: agricultural soils, non-agricultural soils, glaciers, fresh water, oceans, human gut, cow feces, cow rumen and a wastewater treatment plant environment. The metagenomic reads were compared to the EX-B database, which includes more than 1500 b-lactamase genes (see Materials and Methods for details). Metagenomic reads containing b-lactamases were identified by BLASTX based on their similarity to the EX-B database at the cutoff thresholds described in Materials and Methods (percent identity equal to or higher than 50%, e-value equal to or lower than $$10^{-5}$$, and bit score equal to or higher than 30). Environmental b-lactamases can be divergent from clinically characterized b-lactamases (Forsberg 2014), which is why we chose a percent identity threshold of 50% or higher. Reads matching b-lactamases were used to construct distance matrices and b-lactamase gene networks, which were used for downstream analysis.
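The read-level cutoffs described above can be illustrated with a minimal sketch. It assumes the BLASTX results are available in the default 12-column tabular layout (`-outfmt 6`) and that only the highest-scoring hit per read is kept; neither detail is stated in the text, and the function names and example rows are hypothetical.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    read_id: str
    subject_id: str
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(line: str) -> Hit:
    """Parse one line of default BLAST tabular output (assumed layout)."""
    f = line.rstrip("\n").split("\t")
    # Columns: qseqid sseqid pident length mismatch gapopen
    #          qstart qend sstart send evalue bitscore
    return Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))


def passes_cutoffs(hit: Hit, min_identity: float = 50.0,
                   max_evalue: float = 1e-5,
                   min_bitscore: float = 30.0) -> bool:
    """Apply the three thresholds quoted in the text."""
    return (hit.pident >= min_identity
            and hit.evalue <= max_evalue
            and hit.bitscore >= min_bitscore)


def best_hit_per_read(hits: Iterable[Hit]) -> dict:
    """Keep the highest bit-score hit per read (an assumption, see lead-in)."""
    best = {}
    for hit in hits:
        if not passes_cutoffs(hit):
            continue
        current = best.get(hit.read_id)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.read_id] = hit
    return best


# Invented example rows, for illustration only.
example_lines = [
    "read1\tblaOXA-1\t62.5\t40\t15\t0\t1\t120\t10\t49\t1e-08\t55.1",
    "read1\tblaTEM-1\t80.0\t40\t8\t0\t1\t120\t10\t49\t1e-12\t70.2",
    "read2\tblaCTX-M-15\t45.0\t40\t22\t0\t1\t120\t10\t49\t1e-06\t40.0",
    "read3\tmecA\t90.0\t20\t2\t0\t1\t60\t10\t29\t0.01\t28.0",
]
example_best = best_hit_per_read(parse_outfmt6(l) for l in example_lines)
```

In this invented example only `read1` survives: `read2` fails the identity threshold, and `read3` fails both the e-value and bit-score thresholds.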

Abundance and diversity of b-lactamases in the environment
The metagenomic analysis shows that b-lactamases were present in every metagenome analyzed, ranging from 0.0003% (percent of reads assigned to b-lactamases out of the total number of analyzed reads) in an ocean metagenome (accession number SRS582462) to 0.0335% in a human gut metagenome (accession number SRS056259). The average abundance of b-lactamases per environment is 0.01% in soils (n=80 metagenomes, including agricultural and non-agricultural soils), 0.0068% in glaciers (n=11), 0.0046% in fresh water (n=11), 0.002% in oceans (n=22), 0.0121% in human gut (n=63), 0.0049% in cow gut (n=27 metagenomes, including feces and rumen samples) and 0.0092% in the wastewater treatment plant environment (n=19). Soil and glacier metagenomes show the highest diversity of b-lactamase genes (number of different b-lactamase genes), followed by wastewater metagenomes. It is important to note that some b-lactamase genes have multiple variants; for instance, the blaOXA gene has 257 different variants in the EX-B database, but all variant hits obtained in a given sample or environment are assigned to the blaOXA gene, which explains the difference between the number of b-lactamases in the EX-B database and the number of b-lactamase genes in the richness graph. The total b-lactamase relative abundance shows that soil, human gut and wastewater metagenomes are the most b-lactamase-enriched environments.
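The two quantities used in this section, the percent of reads assigned to b-lactamases and the gene richness after collapsing variants, can be sketched as follows. The variant naming convention (a trailing `-<number>` suffix, as in `blaOXA-48`) is an assumption made for illustration and may not match the identifiers in the EX-B database; the read counts are invented.

```python
import re
from typing import Iterable


def gene_family(subject_id: str) -> str:
    """Collapse a variant name to its gene, e.g. 'blaOXA-48' -> 'blaOXA'.

    Assumes variants carry a trailing '-<number>' suffix (hypothetical).
    """
    return re.sub(r"-\d+$", "", subject_id)


def percent_of_reads(n_bla_reads: int, n_total_reads: int) -> float:
    """Percent of analyzed reads that were assigned to b-lactamases."""
    if n_total_reads <= 0:
        raise ValueError("n_total_reads must be positive")
    return 100.0 * n_bla_reads / n_total_reads


def richness(subject_ids: Iterable[str]) -> int:
    """Number of distinct b-lactamase genes after collapsing variants."""
    return len({gene_family(s) for s in subject_ids})


# Invented counts: 50 b-lactamase reads out of 1,000,000 analyzed reads.
example_percent = percent_of_reads(50, 1_000_000)
# Two blaOXA variants count once, so four hits give a richness of three.
example_richness = richness(["blaOXA-1", "blaOXA-48", "blaTEM-1", "mecA"])
```

Collapsing variants in this way is what makes the richness count smaller than the number of entries in the reference database.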

The distribution of b-lactamases, grouped by molecular class, differs between environments (Fig. 2). Class A b-lactamases are dominant in non-agricultural soils, cow (including feces and rumen metagenomes) and human gut environments (more than 70% in each case); class B b-lactamases are dominant in agricultural soils, fresh water, oceans and the wastewater treatment plant environment (between 40% and 60% in each environment); class C b-lactamases are more represented in agricultural soils and fresh water (approx. 20% in each environment); and class D b-lactamases are more abundant in fresh water, the wastewater treatment plant and glacier environments (from approx. 15% to 30%). A more detailed picture of b-lactamases in the environment is presented in Table 1, where b-lactamase genes were analyzed according to their occurrence in a particular environment through an indicator species analysis (Dufrene 1997). According to these results, 16 b-lactamase genes show a high faithfulness of occurrence in non-agricultural soils, another 16 b-lactamase genes show a high faithfulness of occurrence in agricultural soils and 14 b-lactamase genes show a higher probability of being present in the wastewater treatment plant environment. Only four b-lactamases (blaEBR, CfxA, HGI and mecA) were found to be significantly associated with the human gut environment (p<0.005) rather than with the other analyzed environments. Interestingly, when the environments are grouped by level of anthropogenic impact (level 1 = wastewater treatment plant; level 2 = human gut; level 3 = agricultural soils, fresh water, oceans and cow gut; level 4 = non-agricultural soils and glaciers), the less impacted environments show more b-lactamase genes with a high faithfulness of occurrence (50 genes for anthropogenic impact levels 3 and 4) than the more anthropogenically impacted environments, where only 13 b-lactamase genes show faithfulness of occurrence.
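For readers unfamiliar with indicator species analysis, the indicator value of one gene can be sketched as the product of its specificity (share of the summed per-environment mean abundance found in one environment) and its fidelity, or faithfulness of occurrence (fraction of samples in that environment where the gene is present), scaled to 0-100. This is a minimal sketch of the standard formulation, not necessarily the exact implementation used here; the permutation test that yields the p-values is omitted, and the sample names and abundances are invented.

```python
from collections import defaultdict
from typing import Dict


def indicator_values(abundance: Dict[str, float],
                     environment: Dict[str, str]) -> Dict[str, float]:
    """Indicator value (0-100) of one gene in each environment.

    abundance:   sample -> abundance of the gene (missing means 0)
    environment: sample -> environment label
    """
    by_env = defaultdict(list)
    for sample, env in environment.items():
        by_env[env].append(abundance.get(sample, 0.0))

    means = {env: sum(v) / len(v) for env, v in by_env.items()}
    total = sum(means.values())

    result = {}
    for env, values in by_env.items():
        # Specificity: how concentrated the gene is in this environment.
        specificity = means[env] / total if total > 0 else 0.0
        # Fidelity: fraction of this environment's samples with the gene.
        fidelity = sum(1 for v in values if v > 0) / len(values)
        result[env] = 100.0 * specificity * fidelity
    return result


# Invented data: the gene occurs in one of two soil samples, never in gut.
example_env = {"s1": "soil", "s2": "soil", "g1": "gut", "g2": "gut"}
example_abundance = {"s1": 4.0}
example_indval = indicator_values(example_abundance, example_env)
```

In this invented case the gene is fully specific to soil but present in only half of the soil samples, so its indicator value for soil is 50 and for gut is 0.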

The b-lactamase hits obtained by BLAST were used to construct metagenome distance matrices and b-lactamase gene networks. When all the b-lactamase hits were used to construct the gene network (Fig. S1), distinct clusters were clearly observable; the same trend was obtained when genes poorly described at the gene level (i.e. identified only as b-lactamase, ESBL gene or class A/B/C/D) were removed from the analysis (Fig. 3). The network analysis indicates that each cluster harbors metagenomes almost exclusively related to a given environment; thus, five clusters representing soils (with agricultural and non-agricultural soils differentiated), human gut, animal gut and wastewater were identified. Cluster analysis performed on the b-lactamase gene networks indicated that samples from the same environment are more tightly connected than samples from different environments (Table S2). This is clearly visible in Fig. 3, where nodes from a given environment (soils, human gut, cow gut and wastewater treatment plant environment) share more connections with each other than with nodes from other environments.
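The idea of a metagenome network in which same-environment samples are more tightly connected can be illustrated with a small sketch. The Jaccard distance on gene presence, the distance threshold of 0.5 and the sample names are all assumptions made for illustration; the text does not specify the distance measure or the rule used to draw edges.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_edges(gene_sets: Dict[str, Set[str]],
                max_distance: float = 0.5) -> List[Tuple[str, str]]:
    """Connect two metagenomes when their distance is at most max_distance."""
    edges = []
    for u, v in combinations(sorted(gene_sets), 2):
        if jaccard_distance(gene_sets[u], gene_sets[v]) <= max_distance:
            edges.append((u, v))
    return edges


def within_environment_fraction(edges: List[Tuple[str, str]],
                                environment: Dict[str, str]) -> float:
    """Fraction of edges that join two samples from the same environment."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if environment[u] == environment[v])
    return same / len(edges)


# Invented gene content for four hypothetical metagenomes.
example_genes = {
    "soil1": {"A", "B", "C"},
    "soil2": {"A", "B", "D"},
    "gut1": {"X", "Y"},
    "gut2": {"X", "Y", "A"},
}
example_env = {"soil1": "soil", "soil2": "soil", "gut1": "gut", "gut2": "gut"}
example_edges = build_edges(example_genes)
example_fraction = within_environment_fraction(example_edges, example_env)
```

A within-environment fraction close to 1, as in this invented example, corresponds to the pattern described above, where nodes connect mostly to nodes from their own environment.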

Explain assortativity, clustering coefficient, and some other metric...

To test whether sample geography influences b-lactamase gene content, the nodes in Fig. 3 were displayed according to their geographic origin (Fig. 4). The results indicate that geography is not correlated with either b-lactamase gene content or diversity.

The BLAST analysis performed on metagenomic reads indicates the presence of b-lactamase genes in different environments,