Introduction
Since DNA barcoding was formally proposed on a large scale, cox1 sequences
have accumulated rapidly from all around the world
(Porter & Hajibabaei, 2018). Early studies mostly had a narrow
systematic focus and targeted local or regional species assemblages.
As coverage became increasingly global through the iBOL project
(International Barcode of Life), researchers became aware of the
problems that arise with the use of cox1 (i.e., mitochondrial
DNA) as a taxonomic marker (Funk & Omland, 2003; Ballard & Whitlock,
2004; Dasmahapatra & Mallet, 2006; Dupuis et al., 2012; Nicholls et
al., 2012; Smith et al., 2012; Dowton et al., 2014; Ross, 2014; Eberle
et al., 2019), and also of the effects of geographic scale on the
accuracy and performance of barcoding (Lohse, 2009; Bergsten et al.,
2012; Gaytán et al., 2020). Geographic sampling has therefore been a
central topic of debate
(Lim et al., 2012; Reid & Carstens, 2012; Talavera et al., 2013; Ahrens
et al., 2016), in particular with respect to DNA barcoding, one of the
major tools of DNA taxonomy.
To understand in more detail the nature of the genetic markers used
for taxonomy, and also to investigate further the empirical behaviour of
species delimitation approaches currently in use, it would be desirable
to test commonly used methods on a dataset without geographic bias that
still provides a sufficient number of closely related taxa. Here, we
focus on cox1, since it is and will continue to be a
widely used marker for taxonomy in barcoding and metabarcoding studies.
So far, the most comprehensive barcoding efforts have been made in
“northern”, largely temperate countries (e.g., Pentinsaari et al.,
2014, 2019; Gwiazdowski et al., 2015; Hendrich et al., 2015; Rougerie et
al., 2015; Hebert et al., 2016; Rulik et al., 2017; Bouchard et al.,
2017; Steinke et al., 2017; see also
https://www.bolgermany.de/wp/startseite/news-publikationen/publikationen/page/2/).
Studies in tropical or subtropical areas are comparatively few or
limited to a narrow focal group (e.g., Elias et al., 2007; Janzen
et al., 2009; Janzen & Hallwachs, 2011; Astrin et al., 2012; Ahrens et
al., 2016; Cancian de Araujo et al., 2019), and only a few authors have
assembled data at the global level (e.g., Zhou et al., 2016).
Interestingly, in regional (i.e., national-level) libraries, molecular
operational taxonomic units (MOTUs, i.e., BINs; Ratnasingham & Hebert,
2013) showed perfect matches to known morphospecies in nearly 90% of
the studied species (e.g., Pentinsaari et al., 2014; Hendrich et al.,
2015). Occasionally, mismatches with described species occurred due to
splitting into clusters of different geographic origin (e.g., Morinière
et al., 2017) or sharing of identical or closely related haplotypes
among different morphospecies (e.g., Hawlitschek et al., 2017). However,
matches generally decreased when geographic sampling of species was
wider, e.g., on a continental scale (Bergsten et al., 2012; Mutanen
et al., 2016; Schmid-Egger et al., 2018), with 12-30% of the species
recovered as paraphyletic. Identification success may decrease with
increasing spatial scale of sampling, dropping by up to 50% at
continental scales (Bergsten et al., 2012). Sampling on a continental
scale thus considerably increases the complexity of barcoding studies.
Most of the studies from “northern” latitudes, however, are thought to
contain species with only low infraspecific haplotype diversity (due to
extinctions and recolonization events during and after the Pleistocene;
e.g., Hewitt, 1996, 1999; Schmitt, 2007; Ahrens et al., 2013), and their
assemblages often contain only a small number of closely related species.
Thus, these data are not suitable test cases for assessing species
delimitation performance when the component of actual geographic genetic
variation is excluded. On the other hand, several studies on tropical
groups or locations also include specimens from one or more other sites
(e.g., Elias et al., 2007; Janzen et al., 2009; Janzen & Hallwachs,
2011; Thormann et al., 2016), or they interpreted a large amount of
mismatch between MOTUs and morphospecies as evidence for cryptic
diversity (e.g., Janzen et al., 2009; Janzen & Hallwachs, 2011).
Here we present a data set that was sampled from one local assemblage in
a Southeast Asian biodiversity hotspot (Laos: Phou Pan mountain). We
investigate the performance of various species delimitation approaches
on a megadiverse assemblage of herbivorous chafer beetles (Coleoptera:
Scarabaeidae: Sericini). Our objective is to infer whether species
delimitation suffers from exaggerated infraspecific variation in the
same way that led to inconsistencies between entities from DNA-based and
morphology-based species inference in previous studies, despite the lack
of geographic genetic variation. We are interested in the degree of deep
coalescence in this local species assemblage and in how species
delimitation approaches handle these data. With geographic genetic
variation excluded, we would expect fewer problems due to deep
coalescence and thus higher rates of taxonomic congruence between
morphospecies and MOTUs. Furthermore, we employ clustering algorithms
similar to those used in metabarcoding approaches to explore the
reliability of this critical step in current metabarcoding analysis
pipelines (e.g., Coissac et al.,
2012; Deiner et al., 2017; Macher et al., 2018; Ruppert et al., 2019).