Metagenome species and strain-level analysis
Human health conditions are known to be linked to microbial communities,
and phenotypes are often associated with only a subset of strains within
causal microbial groups. Therefore, for the WGS metagenome data, we used
Metagenome Binning with Abundance and Tetranucleotide frequencies
version 2 (Metabat2)42 and the Metagenomic Intra-Species Diversity
Analysis System (MIDAS)43, both with default parameters, to identify
metagenome species and to perform strain-level metagenomic
classification. De novo assembly of all 12 samples was performed with
the Short Oligonucleotide Analysis Package (SOAP)44 using a k-mer size
of 65, followed by binning with Metabat2. Bins containing more than 150
genes were selected for further analysis, and differentially abundant
genes with p < 0.05 were retained for visualization.
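The two selection criteria in this paragraph (bins with more than 150 genes; differentially abundant genes with p < 0.05) amount to simple filters. The sketch below is illustrative only and assumes data structures the text does not describe: bins as a mapping from a bin ID to its predicted gene IDs, and differential-abundance results as (gene ID, flag, p-value) tuples. The statistical test used upstream is not stated, and the function names `select_bins` and `genes_for_visualization` are hypothetical.

```python
"""Hypothetical post-binning filters mirroring the thresholds in the text."""
from typing import Dict, List, Tuple

MIN_GENES_PER_BIN = 150  # a bin must contain MORE than 150 genes
P_THRESHOLD = 0.05       # a gene must have p < 0.05


def select_bins(bin_genes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only bins whose gene count exceeds MIN_GENES_PER_BIN."""
    return {b: genes for b, genes in bin_genes.items()
            if len(genes) > MIN_GENES_PER_BIN}


def genes_for_visualization(results: List[Tuple[str, bool, float]]) -> List[str]:
    """Return gene IDs flagged as differentially abundant with p < P_THRESHOLD.

    Each result is an assumed (gene_id, is_diff_abundant, p_value) tuple
    produced by an unspecified upstream differential-abundance test.
    """
    return [gene for gene, is_diff, p in results if is_diff and p < P_THRESHOLD]


if __name__ == "__main__":
    bins = {
        "bin.1": [f"g{i}" for i in range(200)],  # kept: 200 > 150
        "bin.2": [f"g{i}" for i in range(150)],  # dropped: 150 is not > 150
    }
    print(sorted(select_bins(bins)))  # ['bin.1']
    results = [("geneA", True, 0.01), ("geneB", True, 0.2), ("geneC", False, 0.001)]
    print(genes_for_visualization(results))  # ['geneA']
```

Note that both comparisons are strict, matching "more than 150 genes" and "p < 0.05"; a bin with exactly 150 genes or a gene with p = 0.05 is excluded.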
Species-level coverage across samples from sojourners visiting different
altitudes was obtained using the MIDAS database. For species with
sufficient coverage, reads were aligned to a pan-genome database of
genes to estimate gene coverage45, copy number, and presence or absence.
The core genome was defined directly from the data by identifying
high-coverage regions (>70% coverage of the pan-genome genes) across
multiple metagenomic samples, providing a strain-level overview of the
genetic diversity of the gut microbiota.
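A minimal sketch of how per-sample gene coverage could be turned into copy number, presence/absence, and a core gene set. Several assumptions are not stated in the text: copy number is taken as gene coverage divided by the median coverage of single-copy marker genes in the same sample; a gene counts as present when its copy number is at least a hypothetical `MIN_COPY` cut-off; and the 70% threshold is read as the fraction of samples in which a gene must be present. MIDAS's own implementation and defaults may differ, and all function and parameter names here are hypothetical.

```python
"""Hypothetical sketch: copy number, presence/absence and core genes
from per-sample pan-genome gene coverage."""
from statistics import median
from typing import Dict, List, Set

CORE_FRACTION = 0.70  # assumed reading: present in >70% of samples
MIN_COPY = 0.35       # assumed presence cut-off on estimated copy number


def copy_numbers(gene_cov: Dict[str, float], marker_genes: List[str]) -> Dict[str, float]:
    """Normalize gene coverage by the median coverage of single-copy markers."""
    ref = median([gene_cov[m] for m in marker_genes])
    if ref == 0:
        return {g: 0.0 for g in gene_cov}
    return {g: cov / ref for g, cov in gene_cov.items()}


def present_genes(cn: Dict[str, float]) -> Set[str]:
    """Genes whose estimated copy number reaches MIN_COPY."""
    return {g for g, value in cn.items() if value >= MIN_COPY}


def core_genes(samples: Dict[str, Dict[str, float]], marker_genes: List[str]) -> Set[str]:
    """Genes present in more than CORE_FRACTION of samples."""
    counts: Dict[str, int] = {}
    for gene_cov in samples.values():
        for g in present_genes(copy_numbers(gene_cov, marker_genes)):
            counts[g] = counts.get(g, 0) + 1
    n = len(samples)
    return {g for g, k in counts.items() if k / n > CORE_FRACTION}


if __name__ == "__main__":
    def sample(b: float, c: float) -> Dict[str, float]:
        # m1-m3 are assumed single-copy markers; A, B, C are pan-genome genes
        return {"m1": 10.0, "m2": 10.0, "m3": 10.0, "A": 10.0, "B": b, "C": c}

    samples = {"s1": sample(10, 10), "s2": sample(10, 10),
               "s3": sample(10, 0), "s4": sample(2, 0)}
    # B is present in 3/4 samples (0.75 > 0.70); C only in 2/4
    print(sorted(core_genes(samples, ["m1", "m2", "m3"])))
    # ['A', 'B', 'm1', 'm2', 'm3']
```

Reading the 70% threshold as prevalence across samples is only one interpretation; if it instead refers to breadth of coverage within each gene, only the final filter in `core_genes` would need to change.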