blasbenito edited materials_and_methods.tex  over 9 years ago

Commit id: c0d9fda7654cb3c3c496c8747b8b10ae2b1919f7


To compensate for potential taphonomic or geographic bias, and following ecological niche theory (CITATION), we assumed that the species' responses to the environmental factors were Gaussian. GLMs link model structure to hypotheses by allowing users to define the shape of the response curves and the relevant interactions between variables. To include this assumption in the modelling process, we configured the GLMs to fit second-degree polynomials, using formulas of the form \textit{response ~ poly(variable1, 2) + poly(variable2, 2) + ...}. We did not consider interactions. One drawback of this approach arises when the number of presences is low. As a general rule, at least five presence points per predictor are required in a GLM fit to avoid overparameterization (CITATION), but this number rises to ten when second-degree polynomials are used to fit the model. In our case, with six predictors and up to 24 presence points, fitting a single model with all predictors would have led to a heavily overparameterized model. To overcome this problem, we used the \textit{dredge} function of the R package \textit{MuMIn} (CITATION) to generate all the GLM equations combining the six predictors in groups of one, two, and three, resulting in 41 different equations (EQUATIONS IN APPENDIX!). We calibrated one model for each combination of equation and background radius (5 radii), producing a total of 205 different models.

\textbf{Model selection and ensemble model forecasting}

We faced three different issues when evaluating our models. First, the lack of absences made it impossible to evaluate commission error. Second, the small number of presences prevented the use of data splitting to evaluate omission errors. Third, quasibinomial GLMs in R do not provide AIC values, making it difficult to rank the candidate models according to both model fit and complexity.
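The count of 41 candidate equations (and 205 models) mentioned above can be sanity-checked by enumerating the combinations of six predictors taken in groups of one, two, and three. A minimal Python sketch follows; the actual model generation was done in R with \textit{MuMIn::dredge}, and the predictor names used here are hypothetical placeholders:

```python
from itertools import combinations

# Hypothetical predictor names; the real predictors are described elsewhere in the paper.
predictors = ["bio1", "bio5", "bio6", "bio12", "slope", "aspect"]

# Build one formula per combination of 1, 2, or 3 predictors,
# wrapping each predictor in a second-degree polynomial term.
formulas = []
for k in (1, 2, 3):
    for combo in combinations(predictors, k):
        terms = " + ".join(f"poly({p}, 2)" for p in combo)
        formulas.append(f"response ~ {terms}")

# C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 equations
print(len(formulas))       # 41
# One model per equation and background radius (5 radii):
print(len(formulas) * 5)   # 205
```

This confirms that the 41 equations, crossed with the 5 background radii, yield the 205 calibrated models reported above.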
To address these issues while providing a robust model evaluation framework, we used a leave-one-out approach to compute AUC values based on 190 pseudoabsences (separated by at least 200 km from each other and not overlapping the presence records) that were not used to calibrate the models, as an extrinsic measure of omission error (CITE PHILLIPS), and adjusted explained deviance as an intrinsic evaluation measure to assess model goodness of fit and complexity (taken as the number of predictors). The leave-one-out procedure was computed as follows for each model, once per available presence record:

\begin{enumerate}