Bioinformatics and Taxonomic Assignment
We processed DNA sequences using the Anacapa Toolkit (Curd et al., 2019) with default parameters and a Bayesian cutoff score of 60. We then assigned taxonomy to each amplicon sequence variant (ASV; a unique sequence generated through metabarcoding) using three different reference databases. First, we created a 12S reference database using CRUX (Curd et al., 2019), which compiled all publicly available 12S barcode sequences in the NCBI GenBank database that match the MiFish Universal Teleost primers, using standard CRUX parameters (Benson et al., 2018; Curd et al., 2019). This set of sequences is herein referred to as the “CRUX-12S database” and included any GenBank reference barcode that amplified in silico with the MiFish 12S primers (sequences downloaded in October 2019; https://github.com/zjgold/FishCARD; datadryad.org link provided upon acceptance). Second, to evaluate how increasing database coverage improves taxonomic assignments, we supplemented the CRUX-12S database with the 757 additional California Current fish 12S barcodes generated for this study, herein referred to as the “combined database”. Third, to test the value of a database curated for the region, we created a reference database composed only of 12S barcodes of fishes native to the California Current Large Marine Ecosystem. These sequences included those obtained from GenBank via CRUX and the 757 newly generated reference sequences. This regionally specific reference database is subsequently referred to as “FishCARD”.
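The relationship among the three reference databases can be illustrated with a minimal Python sketch. This is not the CRUX or Anacapa implementation; the function names, the "accession|Genus species" header format, and the toy records are all hypothetical and serve only to show that the combined database is the union of the CRUX-12S sequences and the newly generated barcodes, and that FishCARD is the subset of the combined database belonging to regional species.

```python
from typing import Dict, Set, Tuple

# Hypothetical record format: header "accession|Genus species" -> sequence.
Records = Dict[str, str]


def species_of(header: str) -> str:
    """Extract the species name from an assumed 'accession|Genus species' header."""
    return header.split("|", 1)[1]


def build_databases(
    crux_12s: Records, new_barcodes: Records, regional_species: Set[str]
) -> Tuple[Records, Records, Records]:
    """Return (CRUX-12S, combined, regional) record sets."""
    # Combined database: CRUX-12S records plus the newly generated barcodes.
    combined = {**crux_12s, **new_barcodes}
    # Regional database: combined records restricted to regional species.
    regional = {
        header: seq
        for header, seq in combined.items()
        if species_of(header) in regional_species
    }
    return crux_12s, combined, regional


# Toy data for illustration only; not real accessions or sequences.
crux = {"ACC1|Sebastes mystinus": "ACGT", "ACC2|Gadus morhua": "TTGA"}
new = {"NEW1|Paralabrax clathratus": "GGCC"}
region = {"Sebastes mystinus", "Paralabrax clathratus"}

crux_db, combined_db, fishcard_db = build_databases(crux, new, region)
print(len(crux_db), len(combined_db), len(fishcard_db))
```

In this sketch the regional database is always a subset of the combined database, which mirrors the description above: FishCARD draws on both the GenBank sequences obtained via CRUX and the newly generated reference sequences, but retains only regional taxa.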