Locus complementarity and taxonomic resolution
Species-level identification in the DNA pools proved more challenging
than either genus or family identification, even though all taxa were
represented by one or more of the reference databases for the target
genes (Fig. S4). Species-level assignment is more desirable, but
also more challenging to obtain because barcoding genes often contain
insufficient variation to confidently distinguish congeneric species.
The challenge of capturing taxonomically informative variation is
particularly acute for minibarcodes, which amplify only a short section of
larger barcoding genes. Furthermore, reference database information
(e.g., GenBank, BOLD) is less complete at the species level than for
genera or families (i.e., some databases lack data for individual
species, but a much higher proportion of genera and families are
represented) and databases are less accurate for species- and
genus-level identification (Leray et al., 2019; Locatelli et al., 2020).
For these reasons, biodiversity studies may choose to assign data to
family, class, or order, rather than species in order to capture greater
taxonomic breadth (e.g., Djurhuus et al., 2020; Leray & Knowlton,
2015). Our results affirm that such an approach would accurately detect
100% of families present in our reference DNA pool.
Notably, two of the top-performing markers amplified adjacent,
non-overlapping regions of the COI gene. COI markers benefit from the
most complete reference database of the genes we tested (SI, Fig. S4),
which is consistent with prior studies of fish tissues (Devloo-Delva et
al., 2019). The strategy of including multiple markers for the same gene
has been applied often in plant barcoding as well as for 18S rRNA in
animals (e.g., Coghlan, Shafer, & Freeland, 2020; Machida & Knowlton,
2012). Fewer studies show the added benefit of multiple COI markers (but
see Corse et al., 2019; Shokralla, Hellberg, Handy, King, & Hajibabaei,
2015; Valdez-Moreno et al., 2019). Including two COI minibarcode markers
in our portfolio hedges against the limitations of amplifying degraded
samples while leveraging the robust COI reference data for diverse
marine and freshwater taxa.
Despite the popularity of 12S for metabarcoding marine and freshwater
fishes, and the commensurate abundance of reference data (Miya et al.,
2015; Miya, Gotoh, & Sado, 2020), our top 12S primer set
identified fewer reference taxa than the top COI and 16S markers.
However, the 12S locus contributed more species-level identifications
that no other gene recovered (Fig. 2), suggesting
that this region is useful for barcoding fish to the species
level. Coupled with results from Zhang et al. (2020), in which 12S
markers identified the largest number of fishes from waterbodies in
Beijing, our results reinforce the expectation that optimal markers may
differ across habitats and taxonomic groups, even within fishes.
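The idea of identifications unique to one locus can be made concrete with a short sketch. The marker names and species labels below are hypothetical placeholders, not data from this study, and the function is only one simple way to tabulate which taxa a given marker recovers that no other marker does.

```python
def unique_detections(detections):
    """Return, for each marker, the taxa detected by that marker and no other.

    `detections` maps a marker name to the set of taxa it identified.
    """
    unique = {}
    for marker, taxa in detections.items():
        # Pool everything detected by all the other markers.
        others = set().union(
            *(t for m, t in detections.items() if m != marker)
        )
        unique[marker] = set(taxa) - others
    return unique


# Hypothetical detection sets (placeholders, not results from the study).
detections = {
    "COI_a": {"sp1", "sp2", "sp3", "sp4"},
    "16S": {"sp1", "sp2", "sp3", "sp5"},
    "12S": {"sp1", "sp6", "sp7"},
}
unique = unique_detections(detections)
```

In this toy example the 12S marker detects the fewest taxa overall yet contributes the most taxa found by no other marker, which mirrors the pattern described above.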
Markers targeting specific taxonomic groups – sharks, plankton,
crustaceans, and cephalopods – provided no additional resolution for
reference taxa in the DNA pools (because our representatives from these
taxa were detected with our top-performing teleost fish primers).
Surprisingly, COI markers designed for sharks and plankton performed
nearly as well on teleost fish as the best universal fish COI markers.
However, the opposite was true for crustacean and cephalopod markers,
which had little utility outside their targeted taxonomic groups.
Admittedly, we had few representatives of these groups in the DNA pools
to test the potential increased resolution of taxon-specific markers, so
our results are not conclusive, but suggest that markers can show high
performance outside their immediate target group (Figs. 2, 3).
Both 18S markers yielded a higher proportion of false positives and
contamination in the extraction blanks and PCR negatives than other gene
regions, possibly due to a mismatch between the resolution of the 18S
barcoding region and the species composition of the DNA pools (e.g., 18S
may be better for identifying diverse groups to class or order and
consequently picks up more bacterial contamination; SI, Fig. S4). A
similar explanation – non-specific amplification – may account for the
limited number of target taxa amplified by the lone 28S locus.
Interestingly, a prior study noted that non-specific amplification in
COI markers impaired eDNA analyses for marine and freshwater fishes
(Collins et al., 2019); yet that study did not test either of our
top-performing COI markers, illustrating both the impressive number of
universal fish COI markers and that non-specific amplification resulting
in false detections can vary among markers within a single barcoding
gene and for different applications, i.e., tissue mixture metabarcoding
or eDNA.
Unfortunately, three markers that have amplified well in other studies
(e.g., Polanco et al., 2021; Pont et al., 2018) yielded so few sequencing
reads that we were unable to retain them in our analysis. The three
markers that dropped out were also those that, based on preliminary data
(agarose gel bands), we chose to amplify in multiplex PCR reactions
(paired with one additional marker, in each case). However, for each of
the three multiplexes, one marker performed well, and one did not. Thus,
our exploration of multiplex reactions revealed challenges that would
require taking amplified products through to sequencing in order to
confirm that both markers receive a comparable number of reads (De Barba
et al., 2014). Despite the validation steps necessary for effective
multiplexing, doing so with complementary markers that amplify different
barcoding genes could ultimately yield a more efficient laboratory
workflow.
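A read-balance check of the kind implied here could look like the following minimal sketch. The multiplex names, marker names, read counts, and the 0.1 ratio threshold are all assumptions chosen for illustration; they are not values used in this study, and an appropriate threshold would need to be set for each project.

```python
def multiplex_balance(read_counts, min_ratio=0.1):
    """Flag multiplexes whose markers received very unequal read counts.

    `read_counts` maps a multiplex name to a dict of {marker: reads}.
    Returns {multiplex: (ratio, balanced)}, where ratio is the lowest
    marker read count divided by the highest, and `balanced` is True
    when that ratio is at least `min_ratio` (an assumed threshold).
    """
    report = {}
    for name, counts in read_counts.items():
        low, high = min(counts.values()), max(counts.values())
        ratio = low / high if high > 0 else 0.0
        report[name] = (ratio, ratio >= min_ratio)
    return report


# Hypothetical read counts after sequencing (placeholders only).
read_counts = {
    "plex1": {"markerA": 50000, "markerB": 120},
    "plex2": {"markerC": 40000, "markerD": 30000},
}
report = multiplex_balance(read_counts)
```

Here "plex1" would be flagged as unbalanced because one marker dominates the reads, even though both products might have produced visible bands on an agarose gel.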
Taken together, our results underscore the advantages of using an
optimized portfolio of barcoding markers (similar to results described
by Shaw et al., 2016; Zhang, Zhao, & Yao, 2020), yet also reveal that
adding markers to a portfolio without testing for complementarity can
increase project costs and laboratory effort without improving detection
or identification. Further, additional markers can increase the number
of false positive observations – by nontarget amplification or
mismatches with reference data – and these issues can be more acute
when researchers seek high-resolution species identification from broad
biodiversity surveys. For studies aiming to quantify biodiversity based
on sequence variation patterns, researchers should also be aware of
potential nontarget amplification of nuclear mitochondrial pseudogenes
(numts), and can use available software (e.g., metaMATE; Andújar
et al., 2021) to remove these sources of error.
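One way to test candidate markers for complementarity before adding them to a portfolio is a greedy selection that stops once no remaining marker contributes a new detection. The sketch below is an illustration under stated assumptions, not the procedure used in this study: the marker names and detection sets are hypothetical, and the greedy heuristic ignores cost, false positives, and reference database quality.

```python
def greedy_portfolio(detections):
    """Build a marker portfolio by repeatedly adding the marker that
    detects the most taxa not yet covered.

    `detections` maps a marker name to the set of taxa it identified.
    Returns (portfolio, covered), where portfolio is a list of
    (marker, number_of_new_taxa) in the order chosen. Ties are broken
    alphabetically by marker name.
    """
    covered = set()
    portfolio = []
    remaining = dict(detections)
    while remaining:
        # max() returns the first maximal item, so sorting the names
        # first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda m: len(remaining[m] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # every remaining marker is redundant
        portfolio.append((best, len(gain)))
        covered |= gain
        del remaining[best]
    return portfolio, covered


# Hypothetical detection sets (placeholders, not results from the study).
candidate_markers = {
    "COI_a": {"sp1", "sp2", "sp3", "sp4"},
    "COI_b": {"sp3", "sp4", "sp5"},
    "16S": {"sp1", "sp2", "sp5"},
    "12S": {"sp1", "sp6"},
    "18S": {"sp1", "sp2"},
}
portfolio, covered = greedy_portfolio(candidate_markers)
```

In this toy example three markers cover all six taxa, and the two markers left out would add cost and effort without adding any detections.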