2.4 Sequence and data processing
Raw sequence data from seawater, sediments, skin, and intestinal
microbiota were processed with QIIME2 version 2021.11. After read quality
was inspected on the interactive quality plots, the DADA2
(Callahan et al., 2016) pipeline was applied to demultiplex and merge
paired-end reads. DADA2 also performed quality control, comprising trimming,
elimination of sequencing errors, and detection and removal of chimaeras,
with the following parameters: --p-trim-left-f 8 --p-trunc-len-f 225
--p-trunc-len-r 213. Then, a naïve Bayes classifier was trained following
the RESCRIPt pipeline (Robeson II et al., 2020) using the non-redundant
SSU reference dataset at 99 % identity of the full SILVA 138 release
(Quast et al., 2013) on the specific 16S rRNA V3-V4 amplicon region with
the pair of primers stated above. Amplicon sequence variants (ASVs)
classified as mitochondria, chloroplasts, archaea, eukaryotes, and
unassigned taxa were subsequently excluded. Singletons and ASVs with
fewer than 10 reads across all samples were removed to control for spurious
artefacts of the PCR amplification process and/or potential sequencing
errors. Sample counts were then centred log-ratio (clr) transformed to
account for the compositional nature of microbiome datasets (Gloor et al., 2017) for
further downstream analysis. Raw sequences have been deposited at the
National Center for Biotechnology Information (NCBI) under the project
accession number PRJNA731335.
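
The ASV filtering and clr transformation described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names, the toy table, the substring matching on taxonomy labels, and the pseudocount of 0.5 added before taking logarithms are all assumptions made for illustration, since the handling of zero counts is not specified here. For each sample, the clr value of an ASV is the logarithm of its count minus the mean of the log counts of all ASVs in that sample.

```python
import math

# Assumption: taxa are excluded by case-insensitive substring matching on the
# taxonomy label; the exact label strings depend on the reference database.
EXCLUDED_TERMS = ("mitochondria", "chloroplast", "archaea", "eukaryot", "unassigned")
MIN_TOTAL_READS = 10  # ASVs with fewer reads across all samples are dropped
PSEUDOCOUNT = 0.5     # assumption: zero handling is not stated in the text


def filter_asvs(counts, taxonomy):
    """Drop ASVs with an excluded taxonomy label or too few total reads.

    counts:   dict mapping ASV id -> list of per-sample read counts
    taxonomy: dict mapping ASV id -> taxonomy string
    """
    kept = {}
    for asv, row in counts.items():
        label = taxonomy.get(asv, "Unassigned").lower()
        if any(term in label for term in EXCLUDED_TERMS):
            continue
        if sum(row) < MIN_TOTAL_READS:  # also removes singletons
            continue
        kept[asv] = row
    return kept


def clr_transform(counts, pseudocount=PSEUDOCOUNT):
    """Centred log-ratio transform of each sample (column) of the table."""
    asvs = list(counts)
    n_samples = len(counts[asvs[0]]) if asvs else 0
    result = {asv: [] for asv in asvs}
    for j in range(n_samples):
        logs = [math.log(counts[asv][j] + pseudocount) for asv in asvs]
        mean_log = sum(logs) / len(logs)
        for asv, value in zip(asvs, logs):
            result[asv].append(value - mean_log)
    return result


# Hypothetical toy table: five ASVs (rows) by three samples (columns).
toy_counts = {
    "asv1": [120, 80, 0],
    "asv2": [30, 45, 60],
    "asv3": [1, 0, 0],        # singleton
    "asv4": [500, 400, 300],  # chloroplast
    "asv5": [4, 3, 2],        # 9 reads in total
}
toy_taxonomy = {
    "asv1": "d__Bacteria; p__Proteobacteria",
    "asv2": "d__Bacteria; p__Bacteroidota",
    "asv3": "d__Bacteria; p__Firmicutes",
    "asv4": "d__Bacteria; p__Cyanobacteria; o__Chloroplast",
    "asv5": "d__Bacteria; p__Actinobacteriota",
}

filtered = filter_asvs(toy_counts, toy_taxonomy)
clr_table = clr_transform(filtered)
for asv, values in clr_table.items():
    print(asv, [round(v, 3) for v in values])
```

In this toy table only asv1 and asv2 pass both filters. The clr values within each sample sum to zero by construction, which gives a quick sanity check on any implementation.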