Panu Somervuo - Authorea

In recent years, the growth in data available from citizen science campaigns has raised questions about the quality of these data. Species distribution models can be severely affected by non-random spatial distributions of records. Multiple methods exist to correct for spatial bias, and most of them assume that sampling is uneven in space and determined by the observers' choices of where to search. One common correction method is to include a covariate in the model as a proxy for sampling bias, and then to correct for this bias by setting the covariate to a common value at prediction time. However, this approach assumes that every observer behaves in the same manner, which in practice may not be the case. Here, we differentiate two common observer behaviours: exploring and following. Under this paradigm, explorers seek to observe species in new places, far from other observations and from common routes of transit. By contrast, followers search near locations where species have already been observed and remain closer to common routes of transit. In this paper, we investigate whether current approaches to correcting for observer bias hold under varying observer behaviours, or whether a data-driven approach based on modelled observer behaviour leads to better predictions. To do so, we developed a new software platform, obsimulator, to simulate point patterns driven by observer behaviour. We established two correction methods based on a bias incorporation approach, one using k-nearest neighbours (kNN) and one using a density calculation. Broadly, we found that including a bias covariate and setting it to a common value for prediction yielded the best results. We also found that the kNN-based correction outperformed the density-based correction. Additionally, we provide guidance for setting model parameters based on the ratio of explorers to followers in the observer cohort.
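
As an illustration of the bias covariate idea described above, the following minimal Python sketch computes a kNN-based and a density-based proxy for sampling intensity, and then predicts with the bias covariate fixed at a common value. It is not the obsimulator implementation: the function names (`knn_bias`, `density_bias`, `predict`), the logistic form, the coefficients, and the toy records are all assumptions made for this example.

```python
import math

def knn_bias(site, records, k=3):
    # Mean distance from `site` to its k nearest records;
    # small values mean the neighbourhood is heavily sampled.
    dists = sorted(math.dist(site, r) for r in records if r != site)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

def density_bias(site, records, bandwidth=1.0):
    # Average Gaussian kernel weight of all records at `site`;
    # large values mean the neighbourhood is heavily sampled.
    weights = (math.exp(-math.dist(site, r) ** 2 / (2 * bandwidth ** 2))
               for r in records)
    return sum(weights) / len(records)

def predict(env, bias, intercept=-1.0, coef_env=2.0, coef_bias=-0.5):
    # Logistic response with one environmental and one bias covariate.
    # The coefficients are made-up placeholders, not fitted values.
    eta = intercept + coef_env * env + coef_bias * bias
    return 1.0 / (1.0 + math.exp(-eta))

# Toy records: three clustered observations and one isolated observation.
records = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
bias = [knn_bias(r, records, k=2) for r in records]
common_bias = sum(bias) / len(bias)  # assumed common value: the mean

# Two sites with the same environment but different sampling intensity.
uncorrected = [predict(0.5, bias[0]), predict(0.5, bias[3])]
corrected = [predict(0.5, common_bias), predict(0.5, common_bias)]
```

With the bias covariate left at its observed values, the two sites receive different predictions even though their environment is identical; fixing it to `common_bias` removes that difference. Using the mean as the common value is only one possible choice.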