Data decontamination
Before taxonomic information was assigned to each ASV and, collectively, to every sample, sequence data were filtered to reduce the influence of contamination and errors. First, read count tables were filtered to exclude ASVs without a taxonomic match in the custom BLAST database, which removed non-metazoan sequences and those without a confident taxonomic match (<96% sequence identity to the top hit). Because samples were sequenced deeply and processed with a two-step PCR protocol, which can amplify errors and contaminants, species occupancy detection modeling (SODM) is an appropriate alternative to a minimum read-depth threshold for handling potential false detections (Lahoz-Monfort, Guillera-Arroita, & Tingley, 2016). We used a Bayesian SODM (Lahoz-Monfort et al., 2016) in R to estimate the probability of occurrence of each ASV in each sample, treating each of the sample's 6–9 technical replicates as an independent binomial draw that can yield either a true-positive or a false-positive detection. Any ASV with an estimated probability of occurrence below 80% was removed from the dataset (as in Djurhuus et al., 2020).
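The two filtering steps above can be sketched in Python. This is a minimal, non-authoritative sketch rather than the R implementation used in the study: it applies the 96% identity filter and then computes the probability that an ASV is truly present in a sample under the false-positive occupancy model of Lahoz-Monfort et al. (2016). As an assumption, the occupancy probability (`psi`), true-positive detection probability (`p11`) and false-positive detection probability (`p10`) are fixed point values instead of being estimated by a Bayesian fit; the function names and example data are hypothetical.

```python
# Hypothetical sketch: BLAST-identity filter followed by a false-positive
# occupancy model (after Lahoz-Monfort et al., 2016). Assumption: psi, p11
# and p10 are fixed values here, not posterior estimates from a Bayesian fit.

MIN_IDENTITY = 96.0     # % identity of the top BLAST hit needed to keep an ASV
MIN_OCCURRENCE = 0.80   # occurrence-probability threshold used in the text


def passes_blast_filter(top_hit_identity):
    """Keep ASVs with a database match at >= 96% identity (None = no match)."""
    return top_hit_identity is not None and top_hit_identity >= MIN_IDENTITY


def prob_occurrence(detections, replicates, psi, p11, p10):
    """P(ASV truly present | `detections` positives out of `replicates`).

    Each technical replicate is an independent Bernoulli trial: it detects
    the ASV with probability p11 if present (true positive) and p10 if
    absent (false positive). The binomial coefficient appears in both terms
    and cancels, so it is omitted.
    """
    misses = replicates - detections
    present = psi * p11**detections * (1 - p11)**misses
    absent = (1 - psi) * p10**detections * (1 - p10)**misses
    return present / (present + absent)


# Hypothetical example: one sample with 8 technical replicates.
asvs = {
    "ASV_1": {"identity": 99.2, "detections": 6},
    "ASV_2": {"identity": 97.5, "detections": 1},   # detected in only 1 of 8
    "ASV_3": {"identity": 91.0, "detections": 8},   # fails the identity filter
}

kept = [
    name
    for name, asv in asvs.items()
    if passes_blast_filter(asv["identity"])
    and prob_occurrence(asv["detections"], 8, psi=0.5, p11=0.6, p10=0.05)
    >= MIN_OCCURRENCE
]
print(kept)  # ['ASV_1']
```

In the full Bayesian model, `psi`, `p11` and `p10` are estimated from the replicate detection data, and the occurrence probability reflects their posterior uncertainty rather than single assumed values.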
Another way to identify potential contamination is to compare the sequence data among extraction and PCR replicates of the same sample: when replicates disagree in composition more than they agree, this can indicate anomalies during laboratory processing. To compare sequence composition across replicates for each locus and sample, the ASV read count data remaining after SODM were analyzed using Bray-Curtis dissimilarity and non-metric multidimensional scaling (NMDS) in the R package vegan (Oksanen et al., 2019). Sample replicates with a Bray-Curtis dissimilarity >0.49 were removed (as in Djurhuus et al., 2020; SI Table S2). Three markers (Teleo, Crust2, and 16Sfish) had insufficient data at this stage and were dropped from further analyses, although they might have provided useful data had they received sufficient read depth.
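The replicate-concordance check can be sketched in the same way. Because a 0.49 cutoff could be applied to pairwise or to summary dissimilarities, this sketch assumes that a replicate is flagged when its median Bray-Curtis dissimilarity to the other replicates of the same sample exceeds 0.49. Bray-Curtis dissimilarity between count vectors x and y is computed as sum(|x_i - y_i|) / sum(x_i + y_i). The NMDS ordination step is omitted, and the function names, replicate names and counts are hypothetical.

```python
from statistics import median


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two read-count vectors (same ASV order)."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0  # two empty replicates are treated as identical
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def flag_discordant_replicates(replicates, threshold=0.49):
    """Return names of replicates whose median Bray-Curtis dissimilarity to
    the other replicates of the same sample exceeds `threshold`.

    Assumption: the median (rather than a pairwise rule) keeps one outlier
    from pulling its otherwise concordant neighbours over the threshold.
    """
    flagged = []
    for name, counts in replicates.items():
        others = [bray_curtis(counts, other)
                  for other_name, other in replicates.items()
                  if other_name != name]
        if others and median(others) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical read counts for three ASVs in four replicates of one sample.
sample = {
    "rep_A": [100, 50, 0],
    "rep_B": [90, 60, 5],
    "rep_C": [0, 5, 200],   # composition very different from the others
    "rep_D": [110, 40, 2],
}
flagged = flag_discordant_replicates(sample)
print(flagged)  # ['rep_C']
```

Bray-Curtis dissimilarity on raw counts is sensitive to differences in read depth between replicates; converting counts to relative abundances before the comparison is a common alternative.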