Study group, sampling and identification

The study group is the megadiverse tribe Sericini that contains worldwide nearly 4000 described species in about 200 genera . They are one of the oldest lineages of phytophagous Scarabaeidae and diversified with the rise of the angiosperms 108 Ma . Sericini are nearly worldwide distributed, except in Australia, most oceanic islands, archipelagos, and circumpolar regions . The polyphagous herbivore adults are fully winged while larvae feed on roots and underground stems of living plants . Some species are considered as crop pests . Their highly similar external morphology makes the species difficult to distinguish, but highly complex male genitalia are well-differentiated between species and show only little intraspecific variation .
Sampling was conducted during four weeks in April, 2014 by Carolus Holzschuh and local collectors in the Phou Pan mountain area (Laos, Hua Phan province) (Fig. 1) (ca. 20°12’N, 104°01’E), at an elevation between 1300 to 2000 meters. Specimens were collected using light traps, by hand, or netting during daytime. The Phou Pan mountain is situated in the Indo-Burmese biodiversity hotspot area (Myers et al., 2000) which is characterized by extremely high endemism. The habitat with its dense rainforests offers a large variety of plant species for herbivore insects to feed on. For this study we used only males (1086 specimens), since they were assignable to distinct morphospecies, while females are often not distinguishable among closely related syntopically occurring species. Samples were pinned after DNA extraction, dry mounted, labelled, and preserved at the ZFMK (Zoologisches Forschungsmuseum Alexander König, Bonn, Germany) (see Supplement Table 1).
Specimens were sorted to morphospecies by the complex shape of their copulation organ, i.e., aedeagus, which has been proven to be the best suited trait system to robustly infer the species entities for this group . For this purpose, male genitalia of all specimens were dissected. Habitus and genitalia of each species were photographed with a stereomicroscope (ZEISS Stereo Discovery.V20) connected to a ZEISS Axiocam. Presumably undescribed species that were not yet referable to an available species name, were numbered consecutively (sp1, sp2, etc.).

DNA sequencing

We sequenced the cox1 gene (5′-end) of multiple specimens (3-5) per morphospecies (in total 190). Lab work followed the standard protocols of the German Barcode of Life project . DNA was extracted from mesothoracic leg and attached muscles using the Qiagen DNeasy Blood and Tissue Kit (Hilden, Germany) or the Qiagen BioSprint96 magnetic bead extractor (Hilden, Germany).
The PCR reaction was carried out in total reaction mixes of 20 μl, including 2 μl of undiluted DNA template, 0.8 μl of each primer (10 pmol/μl), and standard amounts of the reagents provided with the “Multiplex PCR” kit from Qiagen (Hilden, Germany) using primers LCO1490-JJ [5’-CHACWAAYCATAAAGATATYGG-3’] and HCO2198-JJ [5’-AWACTTCVGGRTGVCC AAARAATCA-3’] . Thermal cycling was performed on Applied Biosystems 2720 thermal cyclers (Life Technologies, Carlsbad, CA, USA), using a PCR program with two cycle sets, combining a “touchdown” and a “step-up” routine as follows: hot start Taq activation: 15 min at 95 °C; first cycle set (15 repeats): 35 s denaturation at 94°C, 90 s annealing at 55°C (−1°C per cycle) and 90 s elongation at 72°C; second cycle set (25 repeats): 35 s denaturation at 94°C, 90 s annealing at 40°C, and 90 s elongation at 72°C; final elongation 10 min at 72 °C. Unpurified PCR products were subsequently sent for bidirectional Sanger sequencing to BGI Tech Solutions (Hongkong, China).
Raw DNA sequences were assembled (forward and reverse sequence) and edited in Geneious R7 (version 7.1.3, Biomatters Ltd.) to correct base-calling errors and to assign ambiguities (when forward and reverse sequence were not congruent for certain nucleotides). Sequences were aligned with Muscle (Edgar, 2004) as implemented into Geneious using the default settings. Primers were trimmed subsequently. All data are deposited in BOLD (project: SCOIB) and GenBank (accession numbers MW128167-MW128351) respectively (see Supplement Table 1).

Phylogenetic analysis and species delimitation

Putative morphospecies were compared with results obtained from the DNA-based species delimitation methods. We applied Poisson tree process (PTP) , statistical parsimony network analysis (TCS) , Automatic Barcode Gap Discovery (ABGD) , distance based clustering, and Barcode of Life database (BOLD) - Barcode Index Numbers (BINs). These methods were applied on all sequenced beetles to result in clusters that are considered molecular taxonomic units (MOTUs) , i.e., DNA-based species-assignments by the respective method.
A phylogenetic tree was calculated with maximum likelihood from the final multiple alignment of all DNA sequences using the IQ-TREE web server (IQ-TREE version 1.6.12; http://iqtree.cibiv.univie.ac.at/) ; the best substitution model (GTR+F+I+G4) was chosen with ModelFinder (Kalyaanamoorthy et al., 2017) according to Bayesian Information criterion (BIC). Branch support was calculated by generating 1000 samples for ultrafast bootstrapping (Hoang et al., 2018). The resulting tree was midpoint rooted in FigTree v1.4.3 (available from http://tree.bio.ed.ac.uk/software/figtree/). This tree was the basis for the PTP analysis. Additionally, split networks were generated using SplitsTree4 v. 4.16.1 to visualize incompatible and ambiguous signals in the cox1 dataset. In these networks the parallel edges, rather than the single branches, illustrate splits concluded from the data.
We used both versions of the Poisson tree process model (PTP) on the PTP web server (https://species.h-its.org/; accessed on August 5th 2020): bPTP, which adds Bayesian support (pp) values to branches that delimit species in the input tree, and the refined multi-rate mPTP. PTP uses the shift in the number of substitutions at internal nodes to identify branching rate transition points which indicate speciation events. We used default settings for the bPTP analysis (100000 MCMC generations, thinning: 100, burn-in: 0.1, seed 123).
Statistical network analysis as performed with TCS v. 1.21 separates the sequence data into clusters of closely related haplotypes connected by changes that are non-homoplastic with a 95% probability (Templeton et al., 1992); if applied to mtDNA the extent of the networks has been found to be largely congruent with morphospecies
Automatic Barcode Gap Discovery (ABGD) was conducted using the ABGD webserver (https://bioinfo.mnhn.fr/abi/public/abgd/abgdweb.html; accessed on August 17th 2020) with default parameters (i.e., using Jukes-Cantor model (JC69) distances, a relative gap width of 1 and 50 steps, Pmin=0.001, Pmax=0.1, Nb bins for distance distribution= 20). ABGD partitions individuals for a range of prior intraspecific distances, instead of using one predefined distance threshold . A robust result across a range of prior intraspecific distances was chosen as the best partition scheme. This outcome was also closest to the number of morphospecies and simultaneously matched the presumptive barcode gap in the histogram of distances.
Distance based clustering was done with the tclust-function in the R-package spider (v. 1.5.0; Brown et al., 2012). A threshold of 3% was applied to the pairwise distance matrix of all specimens that was corrected with the Kimura model (K80). The logic of this approach underlies most metabarcoding protocols (Macher et al., 2018; Piper et al., 2019; Liu et al., 2020), relying on the presence of a barcoding gap (Elbrecht et al., 2017), which was chosen as a gap at 3 % pairwise distance by the majority of studies (however, see Beentjes et al., 2019 for an 2 % example). Finally, we compared outcome from species delimitations to Barcode Index Number (BIN) assignments (Ratnasingham & Hebert, 2013) in the BOLD data base (Project - SCOIB: Sericini COI Barcoding).
To check the performance and accuracy of the DNA-based delimitation methods compared to the a priori morphospecies hypotheses based on the genital morphology, the match ratio was calculated: Match ratio = 2*Nmatch/(NMOTU+Nmorph).Nmatch is the number of species with exact matches, when the morphospecies and DNA-based species delimitation results to include the same specimens. NMOTU is the number of classified groups by the different delimitation methods and finally Nmorph is the number of morphospecies. All resulting MOTUs were mapped onto the phylogenetic tree beside terminal’s labels (Fig. 2).