Metagenomic assembly and genome reconstruction
We used two strategies for metagenomic assembly and genomic binning of
the eighteen metagenomic datasets from the Deception Island volcano.
First, reads were assembled using IDBA-UD (Peng et al., 2012)
(-mink 50, -maxk 92, -step 4, -min_contig 1000), and genomic binning
was then performed with MaxBin 2.0 (Wu et al., 2016). Contigs were
annotated using the Integrated Microbial Genomes & Microbiomes (IMG/M)
system (Markowitz et al., 2009).
Second, reads were co-assembled using MEGAHIT v. 1.0.2 (Li et al.,
2015), discarding contigs shorter than 1000 bp. Contigs were then binned
using anvi’o v. 5 following the workflow described by Eren et al.
(2015). Reads from each metagenome were mapped to the co-assembly using
bowtie2 with default parameters (Langmead & Salzberg, 2012). A contig
database was generated using ‘anvi-gen-contigs-database’. Prodigal
(Hyatt et al., 2010) was used to predict open reading frames (ORFs).
Single-copy bacterial and archaeal genes were identified using HMMER v.
3.1b2 (Finn et al., 2011). The program ‘anvi-run-ncbi-cogs’ was used to
assign functions to genes by searching them against the December 2014
release of the Clusters of Orthologous Groups (COGs) database
(Galperin et al., 2015) using blastp v2.10.0+ (Altschul et al., 1990).
Predicted protein sequences were functionally and taxonomically
annotated against KEGG with GhostKOALA (genus_prokaryotes) (Kanehisa et
al., 2016). Individual BAM files were profiled using the program
‘anvi-profile’ with a minimum contig length of 4 kbp. Genome binning was
performed using CONCOCT (Alneberg et al., 2013) through the ‘anvi-merge’
program with default parameters. We used ‘anvi-interactive’ to visualize
the merged data and identify genome bins. Bins were then manually
refined using ‘anvi-refine’, and completeness and contamination were
estimated using ‘anvi-summarize’.
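The length cutoffs described above (contigs shorter than 1000 bp
discarded after co-assembly, and a 4 kbp minimum contig length at the
profiling step) amount to a simple filter over a FASTA file. The
following is a minimal Python sketch of that kind of filter, not the
code used by MEGAHIT or anvi’o; the function names and the toy contigs
are hypothetical and serve only as an illustration.

```python
from typing import Dict, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(text: str, min_length: int) -> Dict[str, str]:
    """Keep contigs whose length is at least min_length bases."""
    return {n: s for n, s in parse_fasta(text) if len(s) >= min_length}


# Toy input (hypothetical): three contigs of 500, 1000 and 4500 bp.
toy = (
    ">c1\n" + "A" * 500 + "\n"
    ">c2\n" + "C" * 1000 + "\n"
    ">c3\n" + "G" * 4500 + "\n"
)
kept_after_assembly = filter_contigs(toy, 1000)   # 1000 bp cutoff
kept_for_profiling = filter_contigs(toy, 4000)    # 4 kbp cutoff
```

In this toy case only c2 and c3 pass the 1000 bp cutoff, and only c3
passes the 4 kbp cutoff.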
Bins generated by the assembly and co-assembly approaches were
quality-checked with CheckM v. 1.0.7 (Parks et al., 2015), which
estimates completeness and contamination from the representation of
lineage-specific marker gene sets. Bins were
taxonomically classified based on genome phylogeny using GTDB-Tk
(Chaumeil et al., 2020).
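To illustrate how marker gene sets translate into completeness and
contamination estimates, the sketch below averages, over marker sets,
the fraction of markers found at least once (completeness) and the
number of extra copies per marker (contamination). This is a simplified
Python illustration in the spirit of the marker-set approach, not
CheckM's implementation; the marker names, copy counts and function
name are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def estimate_quality(
    marker_sets: List[Set[str]], copies: Dict[str, int]
) -> Tuple[float, float]:
    """Return (completeness %, contamination %) for one bin.

    marker_sets: groups of marker genes expected once per genome.
    copies: number of times each marker was found in the bin.
    """
    if not marker_sets or any(len(ms) == 0 for ms in marker_sets):
        raise ValueError("marker_sets must contain non-empty sets")
    completeness = 0.0
    contamination = 0.0
    for ms in marker_sets:
        # Markers seen at least once count toward completeness.
        present = sum(1 for g in ms if copies.get(g, 0) >= 1)
        # Each copy beyond the first counts toward contamination.
        extra = sum(copies.get(g, 0) - 1 for g in ms if copies.get(g, 0) > 1)
        completeness += present / len(ms)
        contamination += extra / len(ms)
    n = len(marker_sets)
    return 100.0 * completeness / n, 100.0 * contamination / n


# Hypothetical bin: m2 is duplicated and m6 is missing.
toy_sets = [{"m1", "m2"}, {"m3", "m4", "m5", "m6"}]
toy_copies = {"m1": 1, "m2": 2, "m3": 1, "m4": 1, "m5": 1}
toy_completeness, toy_contamination = estimate_quality(toy_sets, toy_copies)
```

Averaging per marker set rather than per marker keeps a large set from
dominating the estimate, which is why the two sets above contribute
equally despite their different sizes.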
Taxonomic and functional annotation of metagenome-assembled genomes (MAGs)
Bins were classified as high-quality draft (>90% complete, <5%
contamination), medium-quality draft (>50% complete, <10%
contamination) or low-quality draft (<50% complete, <10%
contamination) metagenome-assembled genomes (MAGs), according to the
genome quality standards suggested by Bowers et al. (2017). We selected
11 MAGs of medium or high quality based on their taxonomy,
preferentially choosing groups related to extremophiles or associated
with sulfur and nitrogen metabolism.
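The quality tiers above can be expressed as a small decision rule. The
following Python sketch applies the thresholds exactly as stated in the
text; the function name, the "unclassified" fallback and the example
bins are hypothetical, added only for illustration.

```python
from typing import Dict, List, Tuple


def classify_mag(completeness: float, contamination: float) -> str:
    """Assign a quality tier from completeness and contamination (%)."""
    if completeness > 90 and contamination < 5:
        return "high-quality draft"
    if completeness > 50 and contamination < 10:
        return "medium-quality draft"
    if completeness < 50 and contamination < 10:
        return "low-quality draft"
    # Values not covered by the stated thresholds, e.g. contamination
    # of 10% or more, or completeness of exactly 50%.
    return "unclassified"


def select_candidates(bins: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return names of bins that are medium- or high-quality drafts."""
    wanted = {"high-quality draft", "medium-quality draft"}
    return sorted(
        name for name, (comp, cont) in bins.items()
        if classify_mag(comp, cont) in wanted
    )


# Hypothetical bins: name -> (completeness %, contamination %).
toy_bins = {
    "bin_01": (95.0, 2.0),
    "bin_02": (70.0, 8.0),
    "bin_03": (30.0, 3.0),
    "bin_04": (95.0, 12.0),
}
candidates = select_candidates(toy_bins)
```

The final choice of MAGs in the text also depended on taxonomy, which
this sketch does not model.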
All predicted ORFs in the MAGs were annotated using Prokka v. 1.14.5
(Seemann, 2014). In addition, proteins were compared to sequences in
the KEGG database through GhostKOALA (genus_prokaryotes) (Kanehisa et
al., 2016) and to the SEED subsystems through RASTtk (Brettin et al.,
2015). Phenotypes were predicted using the PICA framework (Feldbauer et
al., 2015) and PhenDB (https://phendb.csb.univie.ac.at/).