3 | Results

3.1 | 16S rRNA sequencing and pre-processing

Excluding the controls, a total of 1,374,973 reads (mean of 122,640 per sample) were sequenced. Quality filtering, denoising, merging and removing chimeric sequences resulted in 36% of reads being discarded (Table S1). Discarding sequences found in the blanks (Table S2) reduced read count by 2.1% and resulted in the loss of 12 ASVs. Removing non-bacterial sequences and those either assigned to chloroplast or unidentified at kingdom level resulted in the loss of 8.5% of the reads. Discarding ASVs with less than 10 reads further decreased the number of ASVs and reads by 44.8% and 3.8%, respectively. Final reads count per sample averaged 65,966 (standard deviation [sd] = 21,526), with ASV richness reaching a plateau for all samples (Fig. S1).

3.2 | Metagenomics sequencing and pre-processing

A total of 314,003,232 reads with a mean of 31,400,323 per sample were obtained from the ½ Novaseq SP flow cell (Table S3). Trimming low quality reads reduced their number by 19.3% and removing reads associated to phix and human DNA by 0.6%, and to salmon DNA by another 0.6% (Table S3).
On average, 4,425.6 gene families per sample matched the ChocoPhLAn database after nucleotide alignment with Bowtie2 (Table S4). Translating reads with diamond and using the uniref50 database, an average of 36% of reads could be aligned, with a mean of 1,084,334 gene families identified per sample (Table S4).

3.3 | Metabarcoding versus metagenomics-based functional profiling

In both the 16S rRNA metabarcoding and metagenomics dataset, the two main bacterial Phyla were Proteobacteria and Bacteroidetes (Fig. 2A). The Proteobacteria families Desulfobacteraceae, Psychromonadaceae and Vibrionaceae, and the Bacteroidetes family Flavobacteriaceae were the most abundant taxa. The total number of bacterial families detected was substantially higher for metabarcoding (224) compared to metagenomics (15; Fig. 2B). Subsequently, the mean number of families detected per sample was also considerably higher for metabarcoding (114) versus metagenomics (4.3) (Fig. 2B). Conversely, the overall and mean number of functions per sample, either ECs or KOs, were relatively similar between HSP methods and metagenomics (Fig. 2C).
Differences in the detection of functions (ECs and KOs) by HSP methods and metagenomics were assessed with confusion matrices (Fig. 3). The results indicate that Paprica had the highest sensitivity (highest true positives rate; 0.76), followed by Picrust2 (0.66) and Tax4Fun2 (0.52). Conversely, Paprica showed lowest specificity (lowest true negative rate) of 0.6, while Picrust2 and Tax4Fun2 showed similar results with 0.78 and 0.79, respectively.
Overall, correlations of the abundance of HSP-derived functions with metagenomics was low, with a mean value of 0.07, 0.2 and 0.16 for Paprica, Picrust2 and Tax4Fun2, respectively, and hardly differed from the null expectation datasets (Fig. 4). At the sample level, Spearman correlations of the HSP datasets with metagenomics was highest for Picrust2 (0.81), followed by Tax4Fun2 (0.67) and Paprica (0.6), but no significant difference with null expectation datasets was observed (Fig. 4 and Table S5).
The correspondence of the ASV (16S rRNA metabarcoding) community with the functional community derived from metagenomics was relatively weak using either centered-log ratio transformation (r = 0.49, p.value = 0.21) or presence/absence (r = 0.4, p.value = 0.44) data (Fig. 5). This contrasts with the strong and significant correspondence of the HSP-derived functional communities with metagenomics, with Paprica and Tax4Fun2 showing highest correlation (r ≥ 0.87; Fig. 5). Correspondence of these two HSP methods with metagenomics noticeably improved when using presence/absence data and Jaccard dissimilarity indices (r = 0.92 and 0.04, respectively; Fig. 5).

3.4 | Sensitivity of 16S rRNA metabarcoding and metagenomics

The response of the taxonomic (ASVs) and functional communities toward fish farm activities was tested for each dataset with a PERMANOVA, by comparing the variance between samples collected at the pen (0 m) versus those collected at reference sites (≥ 1200 m). Except for Picrust2, the data transformed to the centered-log ratio showed significant differences between near and far-field samples. In general, highest sensitivity was achieved when using presence/absence data and Jaccard dissimilarity indices, with Paprica and Tax4Fun2 being the most responsive (R2 = 0.39 and 0.4, respectively; Table 1). While betadisper analysis of homogeneity of groups dispersions showed higher variance in samples collected at the pen, no significant difference was detected between distance groups (Fig. S2 and S3, Table S6).
Correlation of the taxonomic (ASVs) and functional communities with macrofaunal communities and physico-chemical data were explored with Protest analyses. While the taxonomic profile did not significantly correlate with the macrofauna, all functional communities were strongly and significantly correlated (r ≥ 0.68), with the strongest associations with Paprica, Tax4Fun2 and metagenomics data (Table 2). Conversely, only the EC-based datasets (Paprica and Humann2 EC) were significantly associated with the physico-chemical data (r ≥ 0.96, p.value ≤ 0.048), although correlations were relatively strong with all molecular datasets (r ≥ 0.5; Table 2).
Pathways of particular interest involved in the nitrogen and sulfur cycle were compared between pen and reference sites and between the metagenomics (Humann2) and 16S rRNA HSP-based data (Fig. 6). Only four out of the nine pathways investigated were found to be affected by fish farm activities within the metagenomics and Picrust2 datasets. These included nitrate-reduction (increased prevalence; +), allantoin-degradation (reduced prevalence; -), sulfur-oxidation (-) and glycosaminoglycan-degradation (+). In comparison, Paprica identified four additional pathways affected by fish farming including nitrogen-fixation (+), sulfur-reduction (-), sulfite and sulfide-reduction (-), and dimethylsulfide-degradation (+). All pathways found to be either positively or negatively affected by fish farm activities in the metagenomes were found to respond similarly in both Paprica and Picrust2 datasets. Since we expected sulfur related pathways of both metagenomics and 16S rRNA HSP-based data to be more strongly affected by fish farm activities, we also assessed functional groups determined by the Faprotax methodology (based on the taxonomic identification of the ASVs). Using Faprotax, functional groups more prominent near the pens included ureolysis, dark hydrogen oxidation, sulfite respiration and nitrogen fixation, while those more abundant at the reference sites included aerobic ammonia oxidation and oxygenic photoautotrophy (Fig. S4).