Using taxonomy to assemble barcodes
Because BLAST showed that the most abundant sequences from 20–30 % of specimens did not belong to the expected taxa, we developed a taxonomy-weighted barcode-assembly approach. For each specimen, we considered the most abundant FC and/or BR sequences with the expected taxonomic identifications (prioritising the lowest identifiable taxonomic rank in each case) to be correct, and we accepted merged FC and BR sequences as correct barcodes only if the contributing sequences had 100 % identity across the expected overlap length. This approach typically identified the correct FC and BR sequences among the 20 most abundant sequences per specimen, but in a small number of cases the correct sequences were found at abundance ranks between 21 and 101. The ability to examine multiple sequences for correct identity is a key advantage of this process over Sanger sequencing, in which only a single sequence per specimen can typically be examined. This simple yet effective filtering approach greatly enhanced barcode recovery and provided evidence against relying solely on sequence abundance to select barcode sequences. Directly leveraging a priori taxonomic data from validated specimens allowed accurate identification of non-target contaminant sequences, further underlining the value of taxonomically validated specimens for barcode generation. Similarly, taxonomic information was considered important for confirming the identity of insect pests detected in bulk trap catches by multi-locus metabarcoding [19].
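The selection and merging logic described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Candidate` structure, the rank list, the function names, the `max_rank` cut-off and the `overlap` parameter are all hypothetical, and the sketch assumes that each candidate already carries a BLAST-derived taxonomy and that the 3' end of the FC fragment overlaps the 5' end of the BR fragment.

```python
from dataclasses import dataclass
from typing import Optional

# Ranks ordered from lowest (most specific) to highest; an illustrative choice.
RANKS = ["species", "genus", "family", "order"]


@dataclass
class Candidate:
    seq: str          # nucleotide sequence of the FC or BR fragment
    abundance: int    # read count for this sequence in the specimen
    taxonomy: dict    # rank -> name from the BLAST hit (hypothetical layout)


def match_rank(cand: Candidate, expected: dict) -> Optional[int]:
    """Return the index in RANKS of the lowest rank at which the candidate
    agrees with the expected taxonomy, or None if it never agrees."""
    for i, rank in enumerate(RANKS):
        name = expected.get(rank)
        if name is not None and cand.taxonomy.get(rank) == name:
            return i
    return None


def pick_candidate(cands, expected, max_rank=101) -> Optional[Candidate]:
    """Among the max_rank most abundant candidates, pick the one matching the
    expected taxon at the lowest rank; break ties by higher abundance."""
    top = sorted(cands, key=lambda c: -c.abundance)[:max_rank]
    scored = []
    for i, c in enumerate(top):
        r = match_rank(c, expected)
        if r is not None:
            scored.append((r, -c.abundance, i, c))
    if not scored:
        return None
    return min(scored, key=lambda s: s[:3])[3]


def merge(fc: str, br: str, overlap: int) -> Optional[str]:
    """Merge FC and BR only if they are 100 % identical across the expected
    overlap (assumed here: end of FC against start of BR)."""
    if overlap <= 0 or len(fc) < overlap or len(br) < overlap:
        return None
    if fc[-overlap:] != br[:overlap]:
        return None
    return fc + br[overlap:]
```

In this sketch, a highly abundant contaminant is skipped because it never matches the expected taxonomy, and a specimen yields no barcode when the chosen FC and BR sequences disagree anywhere in the overlap.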