Emy Guilbault

and 2 more

In recent years, the growing availability of data from citizen science campaigns has raised questions about the quality of these data. Species distribution models can be severely affected by non-random spatial distributions of records. Multiple methods exist to correct for spatial bias, and most of them assume that sampling is uneven in space and determined by the observers’ choices of where to search. One common correction method is to include a covariate in the model as a proxy for sampling bias and to correct for this bias by setting the covariate to a common value upon prediction. However, this approach assumes that every observer behaves in the same manner, which in practice may not be the case. Here, we differentiate two common observer behaviours: exploring and following. Under this paradigm, explorers seek to observe species in new places, far from other observations and from common routes of transit. By contrast, followers search near locations where species have already been observed and remain closer to common routes of transit. In this paper, we investigate whether current approaches to correcting for observer bias hold under varying observer behaviours, or whether a data-driven approach based on modelled observer behaviour may lead to better predictions. To do so, we developed a new software platform, obsimulator, to simulate point patterns driven by observer behaviour. We established two correction methods based on a bias incorporation approach, one using k-nearest neighbours (knn) and one using density calculation. Broadly, we found that the method of including a bias covariate and setting it to a common value for prediction yields the best results. We also found that the knn-based correction outperformed the density-based correction. Additionally, we provide guidance for setting model parameters based on the ratio of explorers to followers in the observers’ cohort.
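The bias-covariate correction described above can be illustrated with a minimal sketch. It is not the obsimulator implementation. The function names, the toy coordinates, the log-linear form of the model and the coefficient values are all assumptions made for illustration, as is the choice of the minimum as the "common value". The sketch computes a knn-based and a density-based proxy for sampling effort, then compares predictions made with the site-specific bias covariate against predictions made with the covariate fixed to a single value.

```python
import math

def knn_bias(site, records, k=3):
    """Mean distance from `site` to its k nearest records.
    Smaller values suggest a more heavily sampled location."""
    nearest = sorted(math.dist(site, r) for r in records)[:k]
    return sum(nearest) / len(nearest)

def density_bias(site, records, bandwidth=1.0):
    """Unnormalised Gaussian kernel density of records at `site`.
    Larger values suggest a more heavily sampled location."""
    return sum(
        math.exp(-math.dist(site, r) ** 2 / (2 * bandwidth ** 2))
        for r in records
    )

def log_intensity(env, bias, coefs):
    """Toy log-linear model: intercept + environment effect + bias effect."""
    b0, b_env, b_bias = coefs
    return b0 + b_env * env + b_bias * bias

# Toy data (hypothetical): three clustered records and one isolated record.
records = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
sites = [(0.5, 0.5), (4.0, 4.0)]   # prediction locations
env = [0.2, 0.2]                   # identical environment at both sites
coefs = (-1.0, 2.0, -0.5)          # hypothetical fitted coefficients

# Bias covariate at each prediction site (knn variant).
bias = [knn_bias(s, records, k=3) for s in sites]

# Uncorrected prediction: differs between sites only because of sampling effort.
fitted = [log_intensity(e, b, coefs) for e, b in zip(env, bias)]

# Corrected prediction: the bias covariate is set to one common value
# (here the minimum, an assumption) at every site.
common = min(bias)
corrected = [log_intensity(e, common, coefs) for e in env]

# Density variant of the covariate, for comparison.
density = [density_bias(s, records) for s in sites]
```

With the environment identical at both sites, the uncorrected predictions differ only through the bias covariate, whereas the corrected predictions coincide. In this sense, fixing the covariate removes the sampling-effort signal from the predicted surface.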

Tomas Roslin

and 96 more

To associate specimens identified by molecular characters with other biological knowledge, we need reference sequences annotated by Linnaean taxonomy. In this paper, we 1) report the creation of a comprehensive reference library of DNA barcodes for the arthropods of an entire country (Finland), 2) publish this library, and 3) deliver a new identification tool based on this resource. The reference library contains mtDNA COI barcodes for 11,275 (43%) of 26,437 arthropod species known from Finland, including 10,811 (45%) of 23,956 insect species. To quantify the improvement in identification accuracy enabled by the current reference library, we ran 1,000 Finnish insect and spider species through the Barcode of Life Data system (BOLD) identification engine. Of these, 91% were correctly assigned to a unique species when compared to the new reference library alone, 85% were correctly identified when compared to BOLD with the new material included, and 75% with the new material excluded. To capitalize on this resource, we used the new reference material to train a probabilistic taxonomic assignment tool, FinPROTAX, which achieved high success rates. For the full-length barcode region, the accuracy of taxonomic assignments at the level of classes, orders, families, subfamilies, tribes, genera, and species reached 99.9%, 99.9%, 99.8%, 99.7%, 99.4%, 96.8%, and 88.5%, respectively. The FinBOL arthropod reference library and FinPROTAX are available through the Finnish Biodiversity Information Facility (www.laji.fi). Overall, the FinBOL investment represents a massive capacity transfer from the taxonomic community of Finland to all sectors of society.