2.4 Sequence and data processing
Raw sequence data from seawater, sediment, skin, and intestinal microbiota samples were processed with QIIME2 version 2021.11. After inspecting the interactive quality plots to assess read quality, the DADA2 pipeline (Callahan et al., 2016) was applied to demultiplex and merge the paired-end reads. DADA2 also performed quality control, including read trimming and truncation, correction of sequencing errors, and detection and removal of chimaeras, with the following parameters: --p-trim-left-f 8 --p-trunc-len-f 225 --p-trunc-len-r 213. A naïve Bayes classifier was then trained following the RESCRIPt pipeline (Robeson II et al., 2020), using the non-redundant SSU reference dataset at 99% identity from the full SILVA 138 release (Quast et al., 2013), restricted to the 16S rRNA V3-V4 amplicon region targeted by the primer pair stated above. Amplicon sequence variants (ASVs) classified as mitochondria, chloroplasts, archaea, or eukaryotes, as well as unassigned ASVs, were subsequently excluded. Singletons and ASVs with fewer than 10 reads across all samples were removed to control for spurious artefacts of PCR amplification and potential sequencing errors. Samples were then centred log-ratio (clr) transformed to account for the compositional nature of microbiome datasets (Gloor et al., 2017) in downstream analyses. Raw sequences have been deposited at the National Center for Biotechnology Information (NCBI) under the BioProject accession number PRJNA731335.
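The abundance filter and clr transformation described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the dictionary layout of the ASV table, the function names (`filter_low_abundance`, `clr`, `clr_table`) and the pseudocount of 1 used to avoid ln(0) are all illustrative assumptions, since the text does not state how zero counts were handled. Within QIIME2 itself, a total-frequency filter of this kind is usually applied with `qiime feature-table filter-features` (e.g. `--p-min-frequency 10`).

```python
import math

# Hypothetical ASV table layout (an assumption, not from the source):
# {asv_id: {sample_id: read_count}}; absent keys mean zero reads.


def filter_low_abundance(table, min_total=10):
    """Keep ASVs whose total read count across all samples is >= min_total.
    Singletons (total == 1) are removed by the same threshold."""
    return {asv: counts for asv, counts in table.items()
            if sum(counts.values()) >= min_total}


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of one sample's (pseudocount-adjusted) counts:
    clr(x)_i = ln(x_i) - mean_j ln(x_j), i.e. each log-count relative to the
    log of the geometric mean. The pseudocount is an assumed zero-handling choice."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # ln of the geometric mean
    return [v - mean_log for v in logs]


def clr_table(table, pseudocount=1.0):
    """Apply clr within each sample; returns {sample_id: {asv_id: clr_value}}."""
    asvs = sorted(table)
    samples = sorted({s for counts in table.values() for s in counts})
    result = {}
    for s in samples:
        # An ASV missing from a sample is treated as zero reads there.
        vec = [table[a].get(s, 0) for a in asvs]
        result[s] = dict(zip(asvs, clr(vec, pseudocount)))
    return result


if __name__ == "__main__":
    example = {
        "ASV1": {"S1": 120, "S2": 80},
        "ASV2": {"S1": 1},            # singleton: dropped by the filter
        "ASV3": {"S1": 30, "S2": 5},
    }
    kept = filter_low_abundance(example)
    print(sorted(kept))
    print(clr_table(kept))
```

Because clr values are log-ratios to the per-sample geometric mean, they sum to zero within each sample, which removes the arbitrary sequencing-depth scale that raw counts carry.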