Keywords
Environmental genomics, Biomonitoring 2.0, DNA barcode, High-throughput DNA sequencing, Neotropical fish
DNA metabarcoding has been widely used to assess and monitor species. However, several challenges remain open for its mainstream application in ecological studies, particularly when a quantitative approach is required. In a From the Cover article in this issue of Molecular Ecology, Mariac et al. (2021) report species-level ichthyoplankton dynamics for 97 fish species from two Amazon river basins using a clever quantitative metabarcoding approach based on probe capture. They clearly show that most species spawned during the rainy season when the floods started but, interestingly, that species from the same genus reproduced in distinct periods (i.e., inverse phenology). Opportunistically, Mariac et al. (2021) also report that, during an intense hydrological anomaly, several species showed a sharp reduction in spawning activity, demonstrating a quick response to environmental cues. This result is notable because the speed at which fish species can react to environmental changes during the spawning period is largely unknown. Thus, this study brings remarkable insights into basic life history information that is imperative for proposing realistic strategies for sustainable fisheries management and conservation, which are fundamental for an under-studied and threatened realm such as the Amazon River basin.
The use of novel molecular tools to assess and monitor biodiversity has been revolutionized by the introduction of high-throughput sequencing (HTS) platforms, which allow the analysis not only of individual specimens but of whole communities (Leese et al. 2018). HTS coupled with DNA barcoding enables the identification of multiple species from environmental bulk samples, an approach commonly termed DNA metabarcoding. Such DNA-based next-generation biomonitoring, or “Biomonitoring 2.0” (Baird & Hajibabaei, 2012), has tremendous potential to advance the traditional protocols applied to assess and monitor the environment (Leese et al. 2018), ensuring rapid and cost-efficient biodiversity assessment. This is particularly important when dealing with a large number of organisms, as is common with ichthyoplankton samples.
Despite the large potential of DNA metabarcoding for analyzing bulk samples, technical issues that may hinder full mainstream Biomonitoring 2.0 practices are still under debate, particularly regarding quantitative analysis (Lamb et al. 2018; Deagle et al. 2018). The quantitative value of DNA metabarcoding has been questioned, mainly because of the difficulty of obtaining sound and unbiased data. In particular, it remains controversial whether the proportions of reads generated in DNA metabarcoding studies correspond to the real proportions of species in a community (Lamb et al. 2018). The quantitative performance of metabarcoding is constrained by bias introduced during the four main stages of its workflow: (1) sample collection and processing, (2) choice of target gene, (3) HTS DNA sequencing and (4) bioinformatics (Figure 1). Numerous approaches have been implemented to mitigate bias at each stage and thereby enable accurate qualitative and quantitative DNA metabarcoding analysis (Thomas et al. 2016; Deagle et al. 2018; Ratcliffe et al. 2021).
Mariac et al. (2021) address well the bias that may arise from ichthyoplankton sampling and from the unequal body size of fish larvae, which may lead to differences in cell and DNA quantity among species or specimens within a bulk sample. With ichthyoplankton, it is possible to estimate the number of larvae per second drifting through the river as the total larval flow (TLF), and the authors applied TLF as a correction factor for species abundance. The authors were also able to size-fractionate specimens and extract DNA from individuals of similar size, minimizing bias due to different initial amounts of DNA from each larva. Similarly, Ratcliffe et al. (2021) used a standardized amount of tissue for each species to improve quantification accuracy. Instead of probe capture, however, they used PCR with primers targeting conserved binding sites to amplify the mitochondrial 12S gene and avoid bias due to differential amplification rates. A further adjustment of read counts (i.e., relative read abundance, RRA, estimates) implemented in their bioinformatics pipeline generated more accurate quantitative DNA metabarcoding estimates. Thomas et al. (2016), also using a PCR-based DNA metabarcoding approach, dealt with read abundance biases by applying a relative correction factor (RCF) obtained from HTS sequencing of 50/50 mixtures of target and control species. By applying the RCF, they were able to correct the majority of species-specific biases in control and field samples and improve the relationship between RRA and the mass percentage of each taxon.
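The corrections described above can be illustrated with a minimal Python sketch. It is not the pipeline of any of the cited studies: the function names, the definition of the correction factor as control reads divided by target reads in a 50/50 mixture, the proportional split of TLF among species, and all read counts are assumptions made only for illustration.

```python
"""Illustrative read-count corrections for quantitative metabarcoding."""


def relative_read_abundance(read_counts):
    """Convert per-species read counts into proportions that sum to 1."""
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("sample contains no reads")
    return {species: n / total for species, n in read_counts.items()}


def correction_factors(mixture_reads):
    """Derive one factor per species from 50/50 target/control mixtures.

    ASSUMPTION: the factor is control_reads / target_reads, so a species
    that is over-represented in its mixture receives a factor below 1.
    """
    return {
        species: control / target
        for species, (target, control) in mixture_reads.items()
    }


def corrected_abundance(read_counts, factors):
    """Rescale raw counts by their factors, then renormalize to proportions."""
    adjusted = {
        species: n * factors.get(species, 1.0)  # no factor known: unchanged
        for species, n in read_counts.items()
    }
    return relative_read_abundance(adjusted)


def larval_flow_per_species(proportions, total_larval_flow):
    """ASSUMPTION: split the total larval flow (larvae/s) by proportion."""
    return {sp: p * total_larval_flow for sp, p in proportions.items()}


# Hypothetical data: (target reads, control reads) from 50/50 mixtures.
mixture_reads = {"species_a": (8000, 2000), "species_b": (2000, 8000)}
# Hypothetical read counts from a field sample.
field_reads = {"species_a": 800, "species_b": 200}

factors = correction_factors(mixture_reads)
raw = relative_read_abundance(field_reads)
corrected = corrected_abundance(field_reads, factors)
flow = larval_flow_per_species(corrected, total_larval_flow=50.0)
```

In this toy case, species_a amplifies four times better than the control and species_b four times worse, so the raw proportions are misleading and the correction inverts the apparent dominance of species_a in the field sample.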
Most DNA metabarcoding bias can be diagnosed and adjusted in advance using mock communities of known species composition. The analysis of such mock samples is paramount for validating the accuracy of DNA metabarcoding in detecting and quantifying species, and is therefore recommended before analyzing field-collected bulk samples of unknown species composition (Duke & Burton, 2020). Accordingly, before analyzing ichthyoplankton sampled from the environment, Mariac et al. (2018) used mock communities to demonstrate that metabarcoding by capture using a single COI probe was able to identify and quantify fish species in ichthyoplankton swarms from the Amazon realm.
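As a hedged illustration of how a mock community can expose quantitative bias, the sketch below compares the known input proportions of a hypothetical mock sample with the read proportions recovered from it. The species labels, the numbers and the two summary statistics (a per-species log2 ratio and a mean absolute error) are assumptions chosen for clarity, not the metrics used in the cited studies.

```python
import math


def read_proportions(read_counts):
    """Convert per-species read counts into proportions that sum to 1."""
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("sample contains no reads")
    return {species: n / total for species, n in read_counts.items()}


def mock_community_bias(expected, observed_reads):
    """Compare known mock proportions with recovered read proportions.

    Returns (log2_bias, mean_abs_error):
      log2_bias[species] > 0 means the species is over-represented in reads,
      < 0 means under-represented, and -inf means it was not detected.
    """
    observed = read_proportions(observed_reads)
    log2_bias = {}
    abs_errors = []
    for species, exp_prop in expected.items():
        obs_prop = observed.get(species, 0.0)  # missing species: undetected
        if obs_prop > 0:
            log2_bias[species] = math.log2(obs_prop / exp_prop)
        else:
            log2_bias[species] = float("-inf")
        abs_errors.append(abs(obs_prop - exp_prop))
    return log2_bias, sum(abs_errors) / len(abs_errors)


# Hypothetical mock community: known input proportions (e.g. by tissue mass).
expected = {"sp_a": 0.5, "sp_b": 0.3, "sp_c": 0.2}
# Hypothetical read counts recovered after sequencing the mock sample.
observed_reads = {"sp_a": 600, "sp_b": 300, "sp_c": 100}

bias, mae = mock_community_bias(expected, observed_reads)
```

Here sp_c is recovered at half its true proportion (a log2 ratio of -1), the kind of species-specific deviation that would call for a correction before field samples are interpreted quantitatively.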
Nonetheless, powerful ecological inferences from metabarcoding studies depend strongly on reference DNA databases that allow reads to be identified to the lowest possible taxonomic level. While the COI gene database is one of the most complete for the molecular assignment of fish species, it is still incomplete for several groups, particularly in high-biodiversity realms such as the Amazon. Moreover, the use of multiple markers could increase the sensitivity of species detection, which may otherwise be reduced by mismatches between probes or primers and their target sequences (Duke & Burton, 2020). With the decrease in DNA sequencing costs, a possible cost-effective solution would be “genome skimming” of voucher tissue samples and the assembly of whole mitogenomes to develop a broader reference database. Such a mitogenome database would allow the analysis of multiple markers and the development of primers for metabarcoding communities from specific river basins (Milan et al. 2020). However, sequencing several mitogenomes per species to capture intraspecific diversity, which is important for detecting highly conserved primer sites, might severely increase costs. Despite this, examples of Neotropical fish species with multiple mitogenomes are already available (Santos et al. 2021).
In perspective, DNA metabarcoding (PCR-free or not) may allow accurate estimation of ecological information, since there is considerable evidence that adjustments can overcome known constraints (Figure 1). Thus, with the continuous refinement of DNA metabarcoding methodology, I foresee its mainstream use to monitor and assess biodiversity, allowing rapid and relatively inexpensive processing of complex bulk and environmental samples for several taxa and highly diverse realms.