Importance of regional reference databases
Because increasing reference database completeness improved our ability to assign ASVs to species, it is logical to assume that databases with broader taxonomic coverage perform better. However, our results suggest an unexpected trade-off between greater diversity of barcodes and regionally or ecologically informed taxonomic assignment. For example, using only the FishCARD database, which is specific to California Current marine fishes, we identified important native taxa such as black croaker (Cheilotrema saturnum) and bat ray (Myliobatis californica) in eDNA samples. However, when the FishCARD and CRUX-12S databases were combined to yield the database with the largest total number of barcodes, black croaker was not identified and bat ray was inconsistently identified across multiple ASVs. The combined database failed to identify black croaker because of the high similarity of 12S barcode sequences within the family Sciaenidae, specifically within the clade that includes Cheilotrema, a genus native to California, as well as Equetus and Pareques, which are non-native, coral reef-associated genera (Supplemental Table 3). Similarity of barcode sequences also explains the loss of taxonomic resolution in Myliobatis.
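The loss of resolution described above can be illustrated with a toy lowest-common-ancestor (LCA) rule, a common approach in eDNA classifiers. This is a minimal sketch and not necessarily the assignment method used in this study. The function name `lca_assign` is hypothetical, the Equetus and Pareques species are chosen only as examples of each genus (they are not taken from Supplemental Table 3), and the assumption that all three barcodes match the ASV equally well is made purely for illustration.

```python
# Toy illustration: when equally good hits span several genera, an LCA rule
# falls back to the deepest rank that all hits share.
RANKS = ("family", "genus", "species")

# Lineages of reference barcodes ASSUMED (for illustration only) to match
# a single ASV equally well.
LINEAGES = {
    "Cheilotrema saturnum": ("Sciaenidae", "Cheilotrema", "Cheilotrema saturnum"),
    "Equetus lanceolatus": ("Sciaenidae", "Equetus", "Equetus lanceolatus"),
    "Pareques acuminatus": ("Sciaenidae", "Pareques", "Pareques acuminatus"),
}


def lca_assign(top_hits):
    """Return (rank, name) for the deepest rank shared by all top hits."""
    assigned = (None, None)
    for i, rank in enumerate(RANKS):
        names = {LINEAGES[hit][i] for hit in top_hits}
        if len(names) != 1:
            break  # hits disagree at this rank, so stop at the previous one
        assigned = (rank, names.pop())
    return assigned


# Regional database: only the native species is present among the top hits.
regional_hits = ["Cheilotrema saturnum"]
# Combined database: non-native relatives tie with the native species.
combined_hits = list(LINEAGES)

print(lca_assign(regional_hits))  # ('species', 'Cheilotrema saturnum')
print(lca_assign(combined_hits))  # ('family', 'Sciaenidae')
```

Under these assumptions, adding the two non-native barcodes moves the assignment from species to family even though the native reference sequence is unchanged.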
By excluding highly similar non-native barcodes, the curated FishCARD database provided more accurate species-level assignments, suggesting that a database composed of only local taxa is preferable for maximizing identification of local species. However, this improvement was not universal. For example, FishCARD failed to classify an ASV belonging to the family Delphinidae that was identified by both the CRUX and combined databases. This result arises because FishCARD is specific to California Current fishes and does not include marine mammals. This shortcoming could be easily overcome, however, by supplementing FishCARD with barcodes for other marine-associated vertebrate taxa of local management interest (Valsecchi et al., 2019).
These results highlight the trade-off between identifying local species from clades with little genetic variation and providing taxonomic coverage across a broad range of vertebrate species. As such, researchers need to identify their research priorities when deciding which reference databases to use, with a particular focus on defining the scope of the target taxa. Future work could alleviate this trade-off by building bioinformatic pipelines that prioritize assignments to a reference set of native species, perhaps by including information on species ranges and sample locations in the assignment algorithm. Alternatively, a regional database could be supplemented with additional barcodes to address specific questions, such as testing for the presence of specific invasive species or range shifts associated with climate change.
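One way such prioritization might work is sketched below, assuming a classifier that reports percent identity for each reference hit. This is a hypothetical illustration, not an existing pipeline: the names `Hit` and `prioritize_native`, the `tolerance` parameter and its default, and the identity values in the example are all assumptions. The set of native species stands in for the range and sample-location information mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    species: str
    identity: float  # percent identity between the ASV and the reference barcode


def prioritize_native(hits, native_species, tolerance=0.5):
    """Among near-best hits, keep only native species when any are present.

    Hits within `tolerance` percentage points of the best identity are treated
    as ties. If no tied hit is native, all tied hits are kept, so non-native
    detections are not silently discarded. Returns sorted candidate species.
    """
    if not hits:
        return []
    best = max(h.identity for h in hits)
    near_best = [h for h in hits if best - h.identity <= tolerance]
    native = [h for h in near_best if h.species in native_species]
    chosen = native or near_best
    return sorted({h.species for h in chosen})


# Hypothetical hits for one ASV against a combined database.
hits = [
    Hit("Cheilotrema saturnum", 100.0),
    Hit("Equetus lanceolatus", 100.0),
    Hit("Pareques acuminatus", 100.0),
]

# Species whose ranges overlap the sampling site (assumed for this example).
native_here = {"Cheilotrema saturnum", "Myliobatis californica"}

print(prioritize_native(hits, native_here))  # ['Cheilotrema saturnum']
```

A rule of this kind would resolve ties in favor of native taxa, but it would also mask a true detection of a closely related non-native species, so it suits surveys of resident communities better than invasive species monitoring.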