Detection success and multiple primer set complementarity
To allow fair comparison, we subsampled reads to improve evenness of
coverage across markers and removed from further analyses three markers
with insufficient data (<1,000 reads or <0.1% of the mean number of
reads per locus): Teleo, Crust2, and 16Sfish. One additional marker,
16Svar, had low yield (<10,000 reads, <1% of the mean) but included
sufficient data for decontamination, so it was retained for analysis.
Out of the 19 retained
markers, a combination of four identified all 60 families of marine and
freshwater taxa in the full reference DNA pool. These four markers
(FishminiA, nsCOIFo, MiFish, and CEP; for details, see Table 1) provided
sufficient taxonomic resolution to correctly recover the genus of 90.9%
of taxa and to identify 58.6% of input taxa to species (Fig. 1). All but
one of the species in our reference pool (Petrochromis kazumbe) had
reference data for at least one of the four markers, so
the frequent lack of species-level detection resulted from insufficient
sequence variation within the amplified target rather than database
representation. Adding two markers (aquaF2 and either aquaF3 or
shark474) recovered three more reference taxa to genus level (83 of 88
genera; 94.3%), and adding aquaF2 and Fish_COILBC identified two of the
remaining known taxa to species level; the remaining 13 markers
contributed no further identifications. Genus-level assignments were
more successful than species-level assignments because BLAST hits to
multiple species within the top 2% of hits were aggregated to genus
level.
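The aggregation rule described above can be illustrated with a minimal sketch. It is not the pipeline used in this study. It assumes that "top 2% of hits" means hits whose bit score lies within 2% of the best bit score for a read, and that species names are binomials whose first word is the genus. The `Hit` type, the `assign_taxon` function, and the placeholder taxon names are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Hit(NamedTuple):
    species: str     # binomial name, e.g. "Genus epithet" (assumed format)
    bitscore: float  # BLAST bit score for this hit


def assign_taxon(hits: Sequence[Hit], window: float = 0.02) -> Tuple[Optional[str], str]:
    """Return (taxon, rank) for one query sequence.

    Assumption: a hit is a "top" hit if its bit score is within `window`
    (2% by default) of the best bit score.
    """
    if not hits:
        return None, "unassigned"
    best = max(h.bitscore for h in hits)
    top = [h for h in hits if h.bitscore >= best * (1 - window)]
    species = {h.species for h in top}
    if len(species) == 1:
        # All top hits agree on one species.
        return next(iter(species)), "species"
    genera = {name.split()[0] for name in species}
    if len(genera) == 1:
        # Several species, one genus: aggregate to genus.
        return next(iter(genera)), "genus"
    return None, "above genus"


# Placeholder names, not taxa from the reference pool.
example = [Hit("Aus bus", 200.0), Hit("Aus cus", 198.0), Hit("Xus yus", 150.0)]
print(assign_taxon(example))  # ('Aus', 'genus')
```

Under this rule, a read whose best hits are split among congeners is still assigned at genus level, which is why genus-level recovery can exceed species-level recovery even when reference sequences exist for every species.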
Marker performance was broadly consistent across taxonomic levels
(species, genus, family), with COI markers generally performing better
than other barcoding genes. This success was at least partially
attributable to the more extensive coverage of our focal taxa in the
reference database for COI (SI, Fig. S5). The two top-performing markers
target adjacent but non-overlapping regions of the COI gene, and of
these, the single best marker identified 90% of reference taxa to
family, 78.4% to genus, and 41.7% to species (Fig. 2). In combination,
these two COI markers identified 95% of families, 85% of them to genus
level, and just under 50% to species level. The best 16S and 12S
markers recovered fewer taxa to species level (~25% each) but
contributed taxa not identified by any other marker, supporting the
value of a multi-marker portfolio (Fig. 2).
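One way to explore marker complementarity of this kind is to treat it as a set-cover problem: each marker detects a set of taxa, and the goal is a small panel that together covers all target taxa. The sketch below uses a greedy heuristic, which is an assumption for illustration and not necessarily how the marker combinations reported here were found. Greedy selection does not guarantee the smallest possible panel. The function name `greedy_marker_panel`, the marker labels, and the toy detection sets are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def greedy_marker_panel(
    detections: Dict[str, Set[str]],
    targets: Optional[Iterable[str]] = None,
) -> Tuple[List[str], Set[str]]:
    """Greedily pick markers until no marker adds a new target taxon.

    Returns (panel, uncovered), where `uncovered` holds any targets
    that no marker detected.
    """
    if targets is not None:
        remaining = set(targets)
    else:
        remaining = set().union(*detections.values())
    panel: List[str] = []
    while remaining and detections:
        # Sorting first makes ties resolve alphabetically, so results are reproducible.
        best = max(sorted(detections), key=lambda m: len(detections[m] & remaining))
        gained = detections[best] & remaining
        if not gained:
            break  # no marker covers any of the remaining targets
        panel.append(best)
        remaining -= gained
    return panel, remaining


# Toy data with placeholder labels, not results from this study.
toy = {
    "marker_A": {"fam1", "fam2", "fam3"},
    "marker_B": {"fam3", "fam4"},
    "marker_C": {"fam4"},
    "marker_D": {"fam1"},
}
panel, uncovered = greedy_marker_panel(toy)
print(panel, uncovered)  # ['marker_A', 'marker_B'] set()
```

With 19 markers, an exhaustive search over all small combinations is also feasible and would confirm whether a greedy panel is minimal.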
Among taxon-specific markers, the COI markers for elasmobranchs and
plankton identified nearly as many reference taxa (the majority of which
were teleosts) as the top-performing COI marker, with similar taxonomic
resolution, and were thus of more general use. However, the crustacean
and cephalopod markers had limited use outside their targeted groups
(Fig. 2). In contrast, the more general fish markers
successfully identified the few representative elasmobranch, crustacean,
and cephalopod samples included in our DNA pool, suggesting broader
taxonomic reach of those primers.