Introduction
Increasingly, ecological studies can leverage DNA metabarcoding to count
and identify the species present in an environment or a complex mixture
of tissues. Frequent application areas in aquatic ecosystems include
plankton surveys, food web analyses, and forensic identification of
harvested species. Accurate and effective analysis of the DNA content in
these sample types depends on a series of critical methodological
decisions, foremost of which is the choice of barcoding primers
(Aizpurua et al., 2018; Alberdi et al., 2019; Alberdi, Aizpurua,
Gilbert, & Bohmann, 2018; Zhang, Zhao, & Yao, 2020; Zinger et al.,
2019). Primer selection influences which taxa can be detected, and the
taxonomic resolution to which they can be identified. Barcoding primers
amplify sections of genes, which have been selected to provide a balance
between having enough divergence to distinguish species and being
conserved enough to allow amplification across major taxonomic
groups. So-called universal primers, which rely on highly conserved
nucleotide binding sites, are attractive because a single marker can
amplify a wide range of taxa. However, the greater the breadth of taxa
covered (e.g., all metazoa or all teleost fishes), the less likely that
species-level identification will be possible because of a lack of
sequence resolution when priming sites are conserved across divergent
taxonomic groups. Another issue when attempting to obtain species
identification is the failure of markers to amplify due to mismatches
between primers and template sequences. These mismatches can lead to
poor taxon recovery or cause less competitive taxa to drop out if
sequencing depth is insufficient (Aizpurua et al., 2018).
Primer-template mismatches are more common in diverse samples (Elbrecht
et al., 2019); thus, researchers can improve recovery of constituents by
combining universal primers that selectively amplify each focal
taxonomic group with additional primers that offer species resolution
(e.g., Thomsen et al., 2012) or using multiple primers that are
optimized for different taxonomic groups (Aizpurua et al., 2018; Berry
et al., 2017; Carroll et al., 2019; Evans et al., 2016; Jeunen et al.,
2019; Koziol et al., 2019; Silva et al., 2019). Yet, even when using
multiple primers, many studies do not obtain species-level assignments
because of the challenge of balancing taxonomic breadth and resolution
(Djurhuus et al., 2020; Leray & Knowlton, 2015; Locatelli, McIntyre,
Therkildsen, & Baetscher, 2020).
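The primer-template mismatches described above can be illustrated with a minimal sketch that counts mismatched positions between a degenerate primer and a candidate binding site. It is not part of the methods of this study. The function name and the example sequences are hypothetical, and the binding site is assumed to be supplied in the same orientation and at the same length as the primer.

```python
# Illustrative sketch only: names and sequences below are hypothetical.
# Standard IUPAC nucleotide codes, mapping each primer symbol to the
# template bases it can pair with in this simplified comparison.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer.

    Assumes `site` is the template binding region written in the same
    orientation as `primer` and that both have the same length.
    """
    if len(primer) != len(site):
        raise ValueError("primer and binding site must be the same length")
    return sum(
        1
        for p, s in zip(primer.upper(), site.upper())
        if s not in IUPAC[p]
    )


# Hypothetical degenerate primer and two hypothetical binding sites.
primer = "ACGTRY"
perfect_site = "ACGTAC"   # R allows A, Y allows C
divergent_site = "ACCTGA"  # C at position 3 and A at position 6 do not match

perfect = count_mismatches(primer, perfect_site)
divergent = count_mismatches(primer, divergent_site)
```

In practice, mismatch position also matters (mismatches near the 3' end of a primer are generally more disruptive), which this simple count does not capture.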
Since the DNA in tissue mixtures of interest is often degraded, primers
that target short DNA fragments, or minibarcodes, may amplify a more
complete range of taxa in such samples. Smaller barcodes are
more readily amplified than longer fragments and these shorter fragments
are more likely to persist in environmental samples (Shokralla et al.,
2015; Staats et al., 2016) or stomach contents (Devloo-Delva et al.,
2019). Studies that have compared full-length and minibarcodes for
mitochondrial Cytochrome c oxidase I (COI) found that minibarcodes
of 200-300 bp provide comparable resolution to the full-length 658 bp
barcode (Hajibabaei et al., 2006; Yeo, Srivathsan, & Meier, 2020).
Moreover, full-length barcodes failed to amplify degraded samples
(processed fish products), whereas minibarcodes recovered species-level
sequences (Marín et al., 2018; Yeo, Srivathsan, & Meier, 2020). Short
barcodes are also more economical to sequence than full-length barcoding
genes, as current low-cost, high-throughput sequencing platforms tend to
produce read lengths of ≤ 300 bp. This means that for barcodes shorter
than this length researchers can obtain greater read depth for a given
investment in sequencing, which can be important because greater
sequencing depth potentially detects more rare taxa (Singer, Fahner,
Barnes, McCarthy, & Hajibabaei, 2019; Smith & Peay, 2014).
While initial barcoding efforts for animals primarily leveraged
variation within the COI gene (e.g., Barcode of Life, Ratnasingham &
Hebert, 2007), several other mitochondrial genes have become attractive
alternatives (e.g., Deagle, Jarman, Coissac, Pompanon, & Taberlet,
2014; Machida & Knowlton, 2012; Miya et al., 2015). The popularity of
certain barcoding genes has made extensive high-quality reference data
available via the NCBI and BOLD databases to support taxonomic
assignments. The availability of suitable reference data for particular
taxonomic groups and the accuracy of those data vary among barcoding
genes (Leray, Knowlton, Ho, Nguyen, & Machida, 2019); hence, both are key
factors in choosing primers.
Aquatic habitats – both marine and freshwater – have become popular
targets for metabarcoding studies, likely because of the logistical
challenges and considerable expense associated with traditional sampling
and survey methodologies (e.g., Salter, Joensen, Kristiansen,
Steingrund, & Vestergaard, 2019). These studies have produced dozens
of primer sets for fishes and aquatic taxa, which offer researchers an
abundance of reference data for interpreting metabarcoding results; yet
choosing the optimal primer portfolio also requires assessment of
amplification biases and potential sample degradation. To this end, some
studies evaluate primers in silico and/or in the laboratory, but
comparisons have been largely ad hoc and of limited geographic and
taxonomic extent. Notably, the results of in silico assessments,
which frequently guide primer selection, sometimes differ from those of
in vivo tests (Alberdi et al., 2019; Zhang et al., 2020). The
most comprehensive comparison of eDNA and metabarcoding primers for
fishes to date (Zhang et al., 2020), for example, assessed primers based
exclusively on freshwater fishes from waterbodies in Beijing. Although
such an assessment is beneficial, the results may have limited
application to marine or endemic species outside this region, and
therefore more empirical testing and comparison of the performance and
complementarity of metabarcoding markers is needed.
Despite the proliferation of studies using multiple metabarcoding
markers, few studies have experimentally tested the additive benefit of
a portfolio of markers (each of which amplifies a single locus) for
obtaining high resolution (species- or genus-level) taxonomic
assignments (but see Corse et al., 2019). Instead, many studies that
rely on multiple primer sets use each one to identify different
taxonomic groups (Berry et al., 2017) or to balance the trade-off
between sequence identification at a high taxonomic rank and resolution
of taxa within a rank (e.g., Carroll et al., 2019; Djurhuus et al.,
2018). However, even within a single taxonomic group, different primer
pairs may amplify different subsets of species due to polymorphisms in
the primer regions, resulting in complementarity for detection of even
closely related taxa. Further complementarity can be gained from varying
levels of sequence divergence within the amplified targets, which may
result in different markers allowing species-level resolution for
different subsets of taxa. Identification to species level is often
important, such as when samples may include closely related species that
must be distinguished for biodiversity accounting, fishery and wildlife
management, and species conservation. Accordingly, careful design of
primer portfolios can boost both the detection rates and resolution of
metabarcoding studies, but little empirical testing has explored this
potential.
To assess primer complementarity arising from amplification bias,
reference data, and trade-offs between taxonomic resolution and breadth,
we empirically test 22 markers, some of which are universal fish
primers and others of which are taxon-specific, for their ability to recover
species-level identification from a diverse reference DNA pool of
>100 species of primarily marine and freshwater fishes, but
also including a few representatives of other marine organisms
(elasmobranchs, crustaceans, and cephalopods) to evaluate species
recovery beyond the target taxonomic group. We then explore the utility
of a portfolio approach using complementary markers that amplify
sections of COI, 16S, and 12S genes. Marker performance is assessed
based on the integrated effect of primer specificity and availability
and resolution of reference sequences for the particular taxa in our DNA
mixture, and – in this framework – markers are valuable when they
contribute species identifications for taxa that are not identified by
any other markers. We then test the optimal portfolio from our initial
analysis on a set of different tissue mixtures to assess 1) the tissue
input threshold to ensure detection; 2) how read depth scales with
tissue abundance; and 3) the effect of non-target material in the
mixture on recovery of target taxa (marker performance).
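The criterion that a marker is valuable when it contributes species identifications not made by any other marker can be sketched as a set difference. The example below is a minimal illustration under stated assumptions, not the analysis pipeline used in this study. The marker names, species lists, and function name are hypothetical placeholders.

```python
# Illustrative sketch only: marker names and detections are hypothetical.

def unique_contributions(detections):
    """Map each marker to the species that no other marker identified.

    `detections` maps a marker name to the set of taxa that marker
    identified to species level.
    """
    unique = {}
    for marker, taxa in detections.items():
        # Union of species identified by every other marker.
        others = set().union(
            *(t for m, t in detections.items() if m != marker)
        )
        unique[marker] = set(taxa) - others
    return unique


# Hypothetical species-level detections for three markers.
detections = {
    "COI_marker": {"Gadus morhua", "Salmo salar", "Clupea harengus"},
    "16S_marker": {"Gadus morhua", "Loligo vulgaris"},
    "12S_marker": {"Salmo salar", "Clupea harengus", "Thunnus albacares"},
}

contributions = unique_contributions(detections)

# Species recovered by the portfolio as a whole.
portfolio = set().union(*detections.values())
```

In this hypothetical case the COI marker adds no unique species even though it detects the most taxa, which is the kind of redundancy a portfolio assessment is meant to expose.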
Our study was designed to optimize tools for forensic assessment of
aquaculture feed composition. Accordingly, our DNA pools were
composed of aquatic taxa that might be found in fishmeal or other
complex tissue mixtures derived from marine and inland fisheries (Mo,
Man, & Wong, 2018; Tacon & Metian, 2008), and our tissue mixtures were
designed to emulate aquaculture feeds. However, these mock feeds are
very similar in nature to other types of tissue mixtures studied broadly
in ecology, including stomach contents, fecal samples, and plankton
tows. Hence, our overall findings and approach should be transferable to
many applications of metabarcoding analysis of heterogeneous tissues.