A recent paper examining the expected frequency of NUMTs in various marine animals indicated that the COI Leray fragment in particular may pose the highest risk of detecting NUMTs in metabarcoding studies (Schultz, Hebert, 2021), as more than 58% of the pseudogenes identified in that study were no longer than 300 bp. However, the methodology of that research is closer to PCR-free approaches or metagenomics (Singer et al., 2020) than to metabarcoding. It should be emphasized that the results obtained in our work are most likely valid only for metabarcoding, where PCR introduces a bias that misrepresents the haplotypic diversity of environmental samples (Tables 3, 4).
The calculations based on the data retrieved from GenBank do not allow us to formulate any recommendations for correcting studies that detect genetic diversity using environmental DNA metabarcoding (Figure 2). However, natural variation in fragment length can be expected only rarely, a fact that can be used when filtering fragments by length during the computation. At the same time, shortening the fragment generally does not reduce the reliability of the results. The number of population-genetic clusters calculated in this work is a rather conservative measure and is not tailored to a particular data set through the choice of an exact model. Nevertheless, it is clear that moving from the COI barcode to the metabarcode reduces the number of identifiable populations by one cluster. For the data sets in which this number did not decrease, no pattern can be detected other than the intuitive conclusion that length reduction did not affect them because of their strong divergence and the chance concentration of all informative sites within the metabarcode region.
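
The length-based filtering mentioned above could be sketched as follows. This is only an illustration and not the procedure used in this work: the expected amplicon length of 313 bp, the tolerance of one codon, the requirement that any length difference preserve the reading frame, and the names `passes_length_filter` and `filter_by_length` are all assumptions made for the example.

```python
from typing import Dict

# Assumed values, not taken from this study: adjust to the marker in use.
EXPECTED_LEN = 313      # assumed amplicon length in bp after primer removal
MAX_CODON_INDELS = 1    # assumed tolerance: at most one codon gained or lost


def passes_length_filter(
    seq: str,
    expected_len: int = EXPECTED_LEN,
    max_codon_indels: int = MAX_CODON_INDELS,
) -> bool:
    """Return True if the sequence length is plausible for a coding fragment.

    A sequence is kept when its length differs from the expected length by a
    multiple of 3 (reading frame preserved) and by no more than the allowed
    number of codons.
    """
    diff = len(seq) - expected_len
    return diff % 3 == 0 and abs(diff) <= 3 * max_codon_indels


def filter_by_length(
    seqs: Dict[str, str],
    expected_len: int = EXPECTED_LEN,
    max_codon_indels: int = MAX_CODON_INDELS,
) -> Dict[str, str]:
    """Keep only the named sequences that pass the length filter."""
    return {
        name: seq
        for name, seq in seqs.items()
        if passes_length_filter(seq, expected_len, max_codon_indels)
    }
```

Because natural length variation is expected to be rare, a narrow tolerance of this kind would discard few genuine haplotypes, while frame-shifted fragments would be removed. A length filter alone cannot identify NUMTs that have the expected length.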