Bioinformatics and Taxonomic Assignment
We processed DNA sequences using the Anacapa Toolkit (Curd et
al., 2019) with default parameters and a Bayesian cutoff score of
60. We then assigned taxonomy to each amplicon sequence
variant (ASV), i.e., each unique sequence generated through metabarcoding, using
three different reference databases. First, we created a 12S reference database using CRUX (Curd et al., 2019), which compiled
all publicly available 12S barcode sequences in the
NCBI GenBank database targeted by the MiFish Universal Teleost primers,
using standard CRUX parameters (Benson et al., 2018; Curd et
al., 2019). This set of sequences is hereafter referred to as the
“CRUX-12S database” and included every GenBank reference
barcode that amplified in silico with the MiFish 12S primers
(sequences downloaded in October 2019;
https://github.com/zjgold/FishCARD & datadryad.org link provided upon
acceptance). Second, to evaluate how increasing database coverage
improves taxonomic assignments, we supplemented the CRUX-12S database with the 757 additional California Current fish 12S barcodes generated for this study, herein referred to as the “combined
database”. Third, to test the value of a database curated for the
region, we created a reference database composed only of 12S barcodes of fishes native to the California Current Large Marine Ecosystem.
These sequences included those obtained from GenBank via CRUX and
the 757 newly generated reference sequences. This regionally specific
reference database is subsequently referred to as “FishCARD”.
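The relationship among the three reference databases, and one way a per-rank confidence cutoff such as the Bayesian cutoff score of 60 might be applied, can be sketched in Python. This is a minimal illustration under stated assumptions, not the Anacapa or CRUX code: the `Reference` record, `build_databases`, and `truncate_assignment` are hypothetical names, the list of native species is assumed to be supplied as input, and the cutoff is assumed to truncate an assignment at the first taxonomic rank whose confidence falls below 60.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """One 12S barcode record (hypothetical minimal schema)."""
    accession: str
    species: str
    sequence: str


def build_databases(crux_refs, new_refs, native_species):
    """Return the three reference sets described in the text.

    crux_refs: GenBank-derived records (stand-in for the CRUX-12S database)
    new_refs: barcodes newly generated for the study
    native_species: set of species names native to the region (assumed input)
    """
    crux_12s = list(crux_refs)
    # Combined database: GenBank-derived records plus the new barcodes.
    combined = crux_12s + list(new_refs)
    # Regional database: keep only records whose species is native.
    regional = [r for r in combined if r.species in native_species]
    return {"CRUX-12S": crux_12s, "combined": combined, "FishCARD": regional}


def truncate_assignment(ranks, cutoff=60):
    """Keep ranks from the top down until a confidence falls below cutoff.

    ranks: list of (rank_name, taxon, confidence) tuples ordered from the
    highest rank to the lowest, with confidence on a 0-100 scale.
    Assumption: an assignment stops at the first rank below the cutoff.
    """
    kept = []
    for rank, taxon, confidence in ranks:
        if confidence < cutoff:
            break
        kept.append((rank, taxon))
    return kept
```

Because FishCARD is filtered from the combined set, every FishCARD record also appears in the combined database, while GenBank records from non-native species are dropped.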