3 | Results
3.1 | 16S rRNA sequencing and
pre-processing
Excluding the controls, a total of 1,374,973 reads (mean of 122,640 per
sample) were sequenced. Quality filtering, denoising, merging and
removing chimeric sequences resulted in 36% of reads being discarded
(Table S1). Discarding sequences found in the blanks (Table S2) reduced
read count by 2.1% and resulted in the loss of 12 ASVs. Removing
non-bacterial sequences and those either assigned to chloroplast or
unidentified at kingdom level resulted in the loss of 8.5% of the
reads. Discarding ASVs with less than 10 reads further decreased the
number of ASVs and reads by 44.8% and 3.8%, respectively. Final reads
count per sample averaged 65,966 (standard deviation [sd] = 21,526),
with ASV richness reaching a plateau for all samples (Fig. S1).
3.2 | Metagenomics sequencing and
pre-processing
A total of 314,003,232 reads with a mean of 31,400,323 per sample were
obtained from the ½ Novaseq SP flow cell (Table S3). Trimming low
quality reads reduced their number by 19.3% and removing reads
associated to phix and human DNA by 0.6%, and to salmon DNA by another
0.6% (Table S3).
On average, 4,425.6 gene families per sample matched the ChocoPhLAn
database after nucleotide alignment with Bowtie2 (Table S4).
Translating reads with diamond and using the uniref50 database, an
average of 36% of reads could be aligned, with a mean of 1,084,334 gene
families identified per sample (Table S4).
3.3 | Metabarcoding versus metagenomics-based
functional
profiling
In both the 16S rRNA metabarcoding and metagenomics dataset, the two
main bacterial Phyla were Proteobacteria and Bacteroidetes (Fig. 2A).
The Proteobacteria families Desulfobacteraceae, Psychromonadaceae and
Vibrionaceae, and the Bacteroidetes family Flavobacteriaceae were the
most abundant taxa. The total number of bacterial families detected was
substantially higher for metabarcoding (224) compared to metagenomics
(15; Fig. 2B). Subsequently, the mean number of families detected per
sample was also considerably higher for metabarcoding (114) versus
metagenomics (4.3) (Fig. 2B). Conversely, the overall and mean number of
functions per sample, either ECs or KOs, were relatively similar between
HSP methods and metagenomics (Fig. 2C).
Differences in the detection of functions (ECs and KOs) by HSP methods
and metagenomics were assessed with confusion matrices (Fig. 3). The
results indicate that Paprica had the highest sensitivity
(highest true positives rate; 0.76), followed by
Picrust2 (0.66) and Tax4Fun2 (0.52).
Conversely, Paprica showed lowest specificity (lowest true
negative rate) of 0.6, while Picrust2 and
Tax4Fun2 showed similar results with 0.78 and 0.79,
respectively.
Overall, correlations of the abundance of HSP-derived functions with
metagenomics was low, with a mean value of 0.07, 0.2 and 0.16 for
Paprica, Picrust2 and Tax4Fun2,
respectively, and hardly differed from the null expectation datasets
(Fig. 4). At the sample level, Spearman correlations of the HSP datasets
with metagenomics was highest for Picrust2 (0.81),
followed by Tax4Fun2 (0.67) and Paprica (0.6), but no
significant difference with null expectation datasets was observed (Fig.
4 and Table S5).
The correspondence of the ASV (16S rRNA metabarcoding) community with
the functional community derived from metagenomics was relatively weak
using either centered-log ratio transformation (r = 0.49, p.value =
0.21) or presence/absence (r = 0.4, p.value = 0.44) data (Fig. 5). This
contrasts with the strong and significant correspondence of the
HSP-derived functional communities with metagenomics, with
Paprica and Tax4Fun2 showing highest correlation (r ≥
0.87; Fig. 5). Correspondence of these two HSP methods with metagenomics
noticeably improved when using presence/absence data and Jaccard
dissimilarity indices (r = 0.92 and 0.04, respectively; Fig. 5).
3.4 | Sensitivity of 16S rRNA metabarcoding and
metagenomics
The response of the taxonomic (ASVs) and functional communities toward
fish farm activities was tested for each dataset with a PERMANOVA, by
comparing the variance between samples collected at the pen (0 m) versus
those collected at reference sites (≥ 1200 m). Except for
Picrust2, the data transformed to the centered-log
ratio showed significant differences between near and far-field samples.
In general, highest sensitivity was achieved when using presence/absence
data and Jaccard dissimilarity indices, with Paprica and
Tax4Fun2 being the most responsive (R2 = 0.39
and 0.4, respectively; Table 1). While betadisper analysis of
homogeneity of groups dispersions showed higher variance in samples
collected at the pen, no significant difference was detected between
distance groups (Fig. S2 and S3, Table S6).
Correlation of the taxonomic (ASVs) and functional communities with
macrofaunal communities and physico-chemical data were explored with
Protest analyses. While the taxonomic profile did not significantly
correlate with the macrofauna, all functional communities were strongly
and significantly correlated (r ≥ 0.68), with the strongest associations
with Paprica, Tax4Fun2 and metagenomics data (Table
2). Conversely, only the EC-based datasets (Paprica and
Humann2 EC) were significantly associated with the
physico-chemical data (r ≥ 0.96, p.value ≤ 0.048), although correlations
were relatively strong with all molecular datasets (r ≥ 0.5; Table 2).
Pathways of particular interest involved in the nitrogen and sulfur
cycle were compared between pen and reference sites and between the
metagenomics (Humann2) and 16S rRNA HSP-based data
(Fig. 6). Only four out of the nine pathways investigated were found to
be affected by fish farm activities within the metagenomics and
Picrust2 datasets. These included nitrate-reduction
(increased prevalence; +), allantoin-degradation (reduced prevalence;
-), sulfur-oxidation (-) and glycosaminoglycan-degradation (+). In
comparison, Paprica identified four additional pathways
affected by fish farming including nitrogen-fixation (+),
sulfur-reduction (-), sulfite and sulfide-reduction (-), and
dimethylsulfide-degradation (+). All pathways found to be either
positively or negatively affected by fish farm activities in the
metagenomes were found to respond similarly in both Paprica and
Picrust2 datasets. Since we expected sulfur related
pathways of both metagenomics and 16S rRNA HSP-based data to be more
strongly affected by fish farm activities, we also assessed functional
groups determined by the Faprotax methodology (based on the
taxonomic identification of the ASVs). Using Faprotax,
functional groups more prominent near the pens included ureolysis, dark
hydrogen oxidation, sulfite respiration and nitrogen fixation, while
those more abundant at the reference sites included aerobic ammonia
oxidation and oxygenic photoautotrophy (Fig. S4).