Introduction

Since DNA barcoding was formally proposed on a large scale, cox1 sequences have rapidly accumulated from all around the world (Porter & Hajibabaei, 2018). Early studies mostly had a narrow systematic focus and targeted local or regional species assemblages. With the increasing global coverage achieved by the iBOL project (International Barcode of Life), researchers became aware of the problems that arise with the use of cox1 (i.e., mitochondrial DNA) as a taxonomic marker (Funk & Omland, 2003; Ballard & Whitlock, 2004; Dasmahapatra & Mallet, 2006; Dupuis et al., 2012; Nicholls et al., 2012; Smith et al., 2012; Dowton et al., 2014; Ross, 2014; Eberle et al., 2019), but also of the effects of geographic scale on the accuracy and performance of barcoding (Lohse, 2009; Bergsten et al., 2012; Gaytán et al., 2020). Geographic sampling has therefore been a central topic of debate (Lim et al., 2012; Reid & Carstens, 2012; Talavera et al., 2013; Ahrens et al., 2016), in particular with respect to DNA barcoding, one of the major tools of DNA taxonomy.
To understand in more detail the nature of the genetic markers used for taxonomy, and to investigate further the empirical behaviour of the species delimitation approaches currently in use, it would be desirable to test commonly used methods on a dataset that is free of geographic bias but still contains a sufficient number of closely related taxa. Here, we focus on cox1, since it is, and will continue to be, a widely used taxonomic marker in barcoding and metabarcoding studies.
So far, the most comprehensive barcoding efforts have been made in "northern", largely temperate countries (e.g., Pentinsaari et al., 2014, 2019; Gwiazdowski et al., 2015; Hendrich et al., 2015; Rougerie et al., 2015; Hebert et al., 2016; Rulik et al., 2017; Bouchard et al., 2017; Steinke et al., 2017; see also: https://www.bolgermany.de/wp/startseite/news-publikationen/publikationen/page/2/). The number of studies in tropical or subtropical areas is comparatively low, or such studies are limited to a narrow focal group (e.g., Elias et al., 2007; Janzen et al., 2009; Janzen & Hallwachs, 2011; Astrin et al., 2012; Ahrens et al., 2016; Cancian de Araujo et al., 2019), and only a few authors have assembled data at the global level (e.g., Zhou et al., 2016).
Interestingly, in regional (i.e., national) level libraries, molecular operational taxonomic units (MOTUs, i.e., BINs; Ratnasingham & Hebert, 2013) showed perfect matches to known morphospecies in nearly 90% of the studied species (e.g., Pentinsaari et al., 2014; Hendrich et al., 2015). Occasionally, mismatches with described species occurred because a species was split into clusters of different geographic origin (e.g., Morinière et al., 2017) or because identical or closely related haplotypes were shared among different morphospecies (e.g., Hawlitschek et al., 2017). However, matches generally decreased when the geographic sampling of species was wider, e.g., on a continental scale (Bergsten et al., 2012; Mutanen et al., 2016; Schmid-Egger et al., 2018), with 12-30% of the species recovered as paraphyletic. Identification success may decrease with increasing spatial scale of sampling, with a drop of up to 50% at continental scales (Bergsten et al., 2012). Sampling on a continental scale thus considerably increases the complexity of barcoding studies. Most of the species in the "northern" latitude studies, however, are thought to have only low infraspecific haplotype diversity (due to extinction and recolonization events during and after the Pleistocene; e.g., Hewitt, 1996, 1999; Schmitt, 2007; Ahrens et al., 2013), and the assemblages often contain only a small number of closely related species. Thus, these data do not represent suitable test cases of species delimitation performance when the component of actual geographic genetic variation is excluded. On the other hand, several studies on tropical groups or locations also included specimens from one or more other sites (e.g., Elias et al., 2007; Thormann et al., 2016; Janzen et al., 2009; Janzen & Hallwachs, 2011), or a large amount of mismatch between MOTUs and morphospecies was interpreted as evidence for cryptic diversity (e.g., Janzen et al., 2009; Janzen & Hallwachs, 2011).
Here we present a data set that was sampled from a single local assemblage in a Southeast Asian biodiversity hotspot (Laos: Phou Pan mountain). We investigate the performance of various species delimitation approaches on a megadiverse assemblage of herbivorous chafer beetles (Coleoptera: Scarabaeidae: Sericini). Our objective is to infer whether, despite the lack of geographic genetic variation, species delimitation suffers from exaggerated infraspecific variation in the same way that led to inconsistencies between DNA-based and morphology-based species entities in previous studies. We are interested in the degree of deep coalescence in this local species assemblage and in how species delimitation approaches handle these data. With geographic genetic variation excluded, we would expect fewer problems due to deep coalescence and thus higher rates of taxonomic congruence between morphospecies and MOTUs. Furthermore, we employ clustering algorithms similar to those used in metabarcoding approaches to explore the reliability of this critical step in current metabarcoding analysis pipelines (e.g., Coissac et al., 2012; Deiner et al., 2017; Macher et al., 2018; Ruppert et al., 2019).
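To make the notion of taxonomic congruence between morphospecies and MOTUs concrete, the following minimal Python sketch clusters aligned sequences by a pairwise distance threshold and counts how many morphospecies correspond exactly to one cluster. It is an illustration only and not the delimitation or clustering software used in this study: the function names, the single-linkage rule, the uncorrected p-distance, the threshold values, and the toy sequences are all assumptions made for the example.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_motus(seqs, threshold=0.03):
    """Single-linkage clustering (hypothetical default threshold of 3%).

    Any two sequences whose p-distance is <= threshold end up in the same MOTU.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def perfect_match_fraction(motus, morphospecies):
    """Fraction of morphospecies whose specimen set equals exactly one MOTU."""
    by_species = {}
    for specimen, species in morphospecies.items():
        by_species.setdefault(species, set()).add(specimen)
    motu_sets = {frozenset(m) for m in motus}
    hits = sum(frozenset(s) in motu_sets for s in by_species.values())
    return hits / len(by_species)


# Toy data (invented for illustration): four 20-bp "barcodes", three morphospecies.
toy_seqs = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGA",
    "s3": "TTTTACGTACGTACGTTTTT",
    "s4": "TTTTACGTACGTACGTTTTA",
}
toy_morphospecies = {"s1": "sp_A", "s2": "sp_A", "s3": "sp_B", "s4": "sp_C"}

# A wide threshold is used only because the toy sequences are very short.
toy_motus = cluster_motus(toy_seqs, threshold=0.1)
toy_match = perfect_match_fraction(toy_motus, toy_morphospecies)
```

In this toy case, the specimens of sp_B and sp_C differ at a single site and are therefore lumped into one MOTU, so only sp_A is matched perfectly. This mimics the situation described above in which closely related haplotypes are shared among different morphospecies.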