Data decontamination
Before assigning taxonomic information to each ASV and collectively to
every sample, sequence data were filtered to reduce the influence of
contamination and errors. First, read count tables were filtered to
exclude ASVs without a taxonomic match in the custom BLAST database.
This removed non-metazoan sequences and those without confident
taxonomic matches (<96% sequence identity in the top hit).
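
The identity filter described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the dictionary layout, and the example values are all hypothetical, and only the 96% threshold comes from the text.

```python
# Illustrative only: data structures and names are assumptions, not the
# authors' code. The 96% identity threshold is the one stated in the text.
MIN_IDENTITY = 96.0


def filter_asvs(read_counts, top_hits, min_identity=MIN_IDENTITY):
    """Keep ASVs whose top BLAST hit meets the identity threshold.

    read_counts: dict mapping ASV id -> dict of sample id -> read count
    top_hits:    dict mapping ASV id -> percent identity of the top hit;
                 ASVs with no database match are absent from this dict
    """
    return {
        asv: counts
        for asv, counts in read_counts.items()
        if asv in top_hits and top_hits[asv] >= min_identity
    }


# Made-up example data.
read_counts = {
    "ASV1": {"S1": 120, "S2": 0},
    "ASV2": {"S1": 15, "S2": 40},
    "ASV3": {"S1": 3, "S2": 7},
}
top_hits = {"ASV1": 99.2, "ASV2": 91.5}  # ASV3 has no match at all

kept = filter_asvs(read_counts, top_hits)
```

In this toy input only ASV1 is retained: ASV2 falls below the identity threshold and ASV3 has no match in the database.
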
Since we sequenced samples deeply and used a two-step PCR protocol,
which can amplify errors and contaminants, species occupancy detection
modeling (SODM) is an appropriate alternative to using a minimum read
depth threshold for handling potential false detections (Lahoz-Monfort,
Guillera-Arroita, & Tingley, 2016). We used a Bayesian SODM in R to
estimate the probability of ASV occurrence for each sample, where each
of the 6-9 technical replicates for that sample was treated as an
independent draw from a binomial distribution that accounts for both
true positives and false positives. Any
ASV with an estimated probability of occurrence <80% was
removed from the dataset (as in Djurhuus et al., 2020).
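
To make the occupancy calculation concrete, the sketch below applies Bayes' rule to the number of replicates in which an ASV was detected. It is a simplification under stated assumptions: the occupancy probability (`psi`), true-positive rate (`p11`), and false-positive rate (`p10`) are supplied as fixed, hypothetical values, whereas the Bayesian SODM described above estimates such parameters from the data. The function names are also hypothetical; only the 80% cutoff comes from the text.

```python
# Illustrative only: parameter values are hypothetical and fixed here,
# whereas the Bayesian SODM in the text estimates them from the data.
def occurrence_probability(n_detections, n_replicates, psi, p11, p10):
    """Posterior probability that an ASV is truly present in a sample.

    psi: prior probability of occurrence
    p11: per-replicate detection probability when present (true positive)
    p10: per-replicate detection probability when absent (false positive)
    """
    n, k = n_detections, n_replicates
    # Binomial likelihoods; the binomial coefficient cancels in the ratio.
    present = psi * p11**n * (1 - p11) ** (k - n)
    absent = (1 - psi) * p10**n * (1 - p10) ** (k - n)
    return present / (present + absent)


def keep_asv(n_detections, n_replicates, psi, p11, p10, cutoff=0.80):
    """Apply the 80% occurrence cutoff stated in the text."""
    prob = occurrence_probability(n_detections, n_replicates, psi, p11, p10)
    return prob >= cutoff


# Hypothetical parameters for a sample with 6 technical replicates.
params = dict(psi=0.5, p11=0.8, p10=0.05)
detected_in_all = keep_asv(6, 6, **params)
detected_in_one = keep_asv(1, 6, **params)
```

With these assumed parameters, an ASV detected in all six replicates is retained, while one detected in a single replicate falls well below the cutoff and is removed.
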
Another way to identify potential contamination is by comparing the
sequence data among extraction and PCR replicates for the same sample.
Replicates that are more dissimilar than concordant can indicate
anomalies during laboratory activities. To compare the sequence
composition across replicates of each locus for every sample, the ASV
read count data that remained following SODM were analyzed for
dissimilarities based on Bray-Curtis distance and NMDS in the R package
VEGAN (Oksanen et al., 2019). Sample replicates with Bray-Curtis
dissimilarity >0.49 were removed (as in Djurhuus et al.,
2020; SI Table S2), and three markers (Teleo, Crust2, and 16Sfish) with
insufficient data at this stage were dropped from further analyses;
however, these markers might have provided useful data had
they received sufficient read depth.
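
The replicate comparison can be sketched in plain Python as follows. This is not the vegan workflow used above, and it rests on an assumption the text does not settle: here a replicate is flagged when its mean Bray-Curtis dissimilarity to the other replicates of the same sample exceeds 0.49. The function names and read counts are hypothetical, and any normalization of read counts before computing distances is left out.

```python
# Illustrative only: the flagging rule (mean dissimilarity to the other
# replicates) is an assumption; the text gives only the 0.49 threshold.
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def flag_outlier_replicates(replicates, threshold=0.49):
    """Return names of replicates that are too dissimilar from the rest.

    replicates: dict mapping replicate name -> list of ASV read counts,
                with ASVs in the same order for every replicate
    """
    flagged = []
    for name, counts in replicates.items():
        others = [c for other, c in replicates.items() if other != name]
        if not others:
            continue  # a lone replicate has nothing to be compared with
        mean_d = sum(bray_curtis(counts, c) for c in others) / len(others)
        if mean_d > threshold:
            flagged.append(name)
    return flagged


# Made-up read counts for three replicates of one sample and locus.
replicates = {
    "rep1": [50, 30, 20],
    "rep2": [48, 32, 20],
    "rep3": [0, 5, 95],
}
flagged = flag_outlier_replicates(replicates)
```

In this toy example rep1 and rep2 are nearly identical (dissimilarity 0.02), whereas rep3 has a dissimilarity of 0.75 to each of them and is the only replicate flagged for removal.
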