% blasbenito edited materials_and_methods.tex over 9 years ago (commit c0d9fda7654cb3c3c496c8747b8b10ae2b1919f7)
To compensate for potential taphonomic or geographical bias, and following ecological niche theory (CITATION), we assumed that the species' responses to the environmental factors were Gaussian. GLMs link model structure to hypotheses by allowing users to define the shape of the response curves and the relevant interactions between variables. To include this assumption in the modelling process, we configured the GLMs to consider second-degree polynomials, using formulas of the form \textit{response $\sim$ poly(variable1, 2) + poly(variable2, 2) + ...}. We did not consider interactions.
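The Gaussian-response assumption can be illustrated with a short sketch: in a logistic GLM, a second-degree polynomial term with a negative quadratic coefficient produces a unimodal, Gaussian-shaped suitability curve along the environmental gradient. The coefficients below are hypothetical values chosen for illustration, not fitted estimates from this study.

```python
import math

def gaussian_response(x, b0, b1, b2):
    """Logistic response with a second-degree polynomial predictor:
    logit(p) = b0 + b1*x + b2*x^2.  With b2 < 0 the curve is
    unimodal, i.e. Gaussian-shaped along the gradient."""
    eta = b0 + b1 * x + b2 * x ** 2
    return 1.0 / (1.0 + math.exp(-eta))

# Illustrative coefficients (hypothetical, not from the paper)
xs = [i * 0.05 for i in range(201)]                      # gradient from 0 to 10
ps = [gaussian_response(x, -4.0, 2.0, -0.2) for x in xs]
optimum = xs[ps.index(max(ps))]                          # peak at -b1/(2*b2) = 5.0
```

Analytically the curve peaks at $-b_1 / (2 b_2)$, which is the species' optimum along the gradient; the tails fall symmetrically towards zero suitability.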
One drawback of this approach arises when the number of presences is low. As a general rule, at least five presence points per predictor are required in a GLM fit to avoid overparameterization (CITATION), but this number rises to ten when second-degree polynomials are used to fit the model. In our case, with six predictors and up to 24 points, fitting a single model would have led to an overparameterized model. To overcome this problem, we used the \textit{dredge} function of the R package \textit{MuMIn} (CITATION) to generate all the GLM equations combining the six predictors in groups of one, two, and three, resulting in 41 different equations (EQUATIONS IN APPENDIX!). We calibrated one model for each combination of equation and background radius (5), producing a total of 205 different models.
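The combinatorics behind the 41 equations can be sketched as follows; this is a Python sketch mirroring what \textit{dredge} produces in R, with hypothetical predictor names standing in for the study's six predictors.

```python
from itertools import combinations

# Hypothetical predictor names; the study's six predictors stand in here
predictors = ["var1", "var2", "var3", "var4", "var5", "var6"]

formulas = []
for k in (1, 2, 3):                        # groups of one, two and three
    for combo in combinations(predictors, k):
        terms = " + ".join(f"poly({v}, 2)" for v in combo)
        formulas.append(f"presence ~ {terms}")

n_equations = len(formulas)    # C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41
n_models = n_equations * 5     # five background radii -> 205 models
```

Capping the groups at three predictors keeps every candidate model within the ten-points-per-polynomial-predictor rule for samples of around 24 presences.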
\textbf{Model selection and ensemble model forecasting}
We faced three different issues when evaluating our models. First, the lack of absences made it impossible to evaluate the commission error. Second, the low number of presences prevented the use of data splitting to evaluate omission errors. Third, quasibinomial GLMs in R do not provide AIC values, making it difficult to rank the candidate models according to both model fit and complexity. To deal with these issues while providing a robust model evaluation framework, we used a leave-one-out approach to compute AUC values based on 190 pseudoabsences (separated 200 km from each other, and not overlapping the presence records) not used to calibrate the models, as an extrinsic measure to evaluate omission errors (CITE PHILLIPS), and adjusted explained deviance as an intrinsic evaluation measure to assess model goodness of fit and complexity (taken as the number of predictors).
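The extrinsic AUC measure can be sketched as a rank statistic: the probability that a presence record receives a higher predicted suitability than a pseudoabsence. The scores and list lengths below are illustrative placeholders, not values from this study.

```python
def auc(presence_scores, absence_scores):
    """Rank-based AUC: probability that a randomly chosen presence
    scores higher than a randomly chosen pseudoabsence (ties count
    as one half)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            wins += 1.0 if p > a else (0.5 if p == a else 0.0)
    return wins / (len(presence_scores) * len(absence_scores))

# Hypothetical suitability scores from a calibrated model
presences = [0.9, 0.7, 0.8]            # predictions at presence records
pseudoabsences = [0.2, 0.4, 0.6, 0.1]  # predictions at held-out pseudoabsences
score = auc(presences, pseudoabsences)
```

An AUC of 1 means every presence outranks every pseudoabsence (no omission at any threshold), while 0.5 is no better than random discrimination.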
For each model, the leave-one-out procedure was repeated once per available presence record, as follows:
\begin{enumerate}