blasbenito edited materials_and_methods.tex  over 9 years ago

Commit id: 8f2590def989b52bf12de7c17aed27460faab0c7


To compensate for potential taphonomical or geographical bias, and following ecological niche theory (CITATION), we assumed that the species' responses to the environmental factors were Gaussian. GLMs link model structure to hypotheses by allowing users to define the shape of the response curves and the important interactions between variables. To build this assumption into the modelling process, we configured the GLMs to use second-degree polynomials, with formulas of the form \textit{response \textasciitilde{} poly(variable1, 2) + poly(variable2, 2) + ...}. We did not consider interactions. One drawback of this approach arises when the number of presences is low. As a general rule, around five presence points per predictor are required in a GLM fit to avoid overparameterization (CITATION), but this number rises to around ten when second-degree polynomials are used to fit the model. In our case, fitting a single GLM with six predictors to 24 presences would have led to an overparameterized model. To overcome this problem, we used the \textit{dredge} function of the R package \textit{MuMIn} (CITATION) to generate all the GLM equations combining the six predictors in groups of one, two and three, resulting in 41 different equations (EQUATIONS IN APPENDIX!) that were used to calibrate the models for each of the five background radii, producing a total of 205 different models.

\textbf{Model selection and ensemble model forecasting}
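The count of candidate equations given in the calibration step above can be checked with a minimal sketch. It uses base R (\textit{combn} and \textit{reformulate}) in place of \textit{dredge}, so it illustrates the enumeration only and is not the script used in the analysis. The predictor names and the response name are placeholders, since the actual variable names are not listed here.

```r
# Placeholder predictor names (assumption: the real names are not given here).
predictors <- paste0("variable", 1:6)

# All subsets of one, two and three predictors.
subsets <- unlist(
  lapply(1:3, function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# One second-degree polynomial term per predictor, no interactions.
build_formula <- function(vars) {
  reformulate(sprintf("poly(%s, 2)", vars), response = "response")
}
formulas <- lapply(subsets, build_formula)

n_radii <- 5                               # background radii stated in the text
n_equations <- length(formulas)            # 6 + 15 + 20 = 41
n_models <- n_equations * n_radii          # 41 * 5 = 205
```

Each element of \textit{formulas} could then be passed to \textit{glm} once per background radius. Limiting every equation to three predictors (six polynomial coefficients) keeps the fits closer to the presence-per-parameter rule of thumb than a single six-predictor model would.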

We applied Random Forest \cite{Breiman20015} to analyze the influence of the environmental factors on Neanderthal habitat suitability at the continental scale, and to assess the drivers of uncertainty (standard deviation) within the ensemble. The ability of Random Forest to deal with non-linearity makes it well suited to analyzing our ensemble, since non-linearity may arise when averaging the results of multiple GLMs. Random Forest is also regarded as a robust method to assess variable importance \cite{Cutler20072783}. We used both measures of variable importance available in the randomForest R function (Liaw and Wiener 2012): mean decrease in accuracy and total decrease in node impurities (node impurity: heterogeneity of target categories within a node).

To analyze the influence of environmental factors at the local scale, we first defined \textit{local scale} as the average home range of Neanderthals. According to \cite{Daujeard201232}, and based on the transportation of raw lithic materials, the regional mobility range of Neanderthals during the Middle Palaeolithic was around 50 kilometers. Other measures of mobility, given by Roebroeks et al. (1998) and Feblot-Augustins (1993), are around 100 and 300 km, but we considered them too large to be local. We divided the habitat suitability model and the predictors into 50 km cells and, at each cell, fitted one linear model per predictor (lm function of the R software), using habitat suitability as the response variable. We assigned to each of the 50 km cells the adjusted R squared, as a measure of the local importance of the predictor; the coefficient, to measure the direction of the relationship; and the p-value, to assess the statistical significance of the predictor's local importance. We mapped both the adjusted R squared and the coefficient values, hiding cells with a non-significant relationship (p-value $\geq$ 0.05) and cells containing fewer than 30 cells of 5 km.
We also mapped variable importance and habitat suitability together by using the whitening method explained above, using color to code habitat suitability and whitening to code the local importance of the variables. Finally, we composed a categorical map showing the variable with the highest importance at the local scale at each cell to enhance the visual analysis. The materials and R scripts used to perform our analysis are available HERE.
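The per-cell regression described above can be sketched as follows. This is an illustration on simulated data, not the script used in the analysis: the data frame, its column names and the simulated relationship are all assumptions, and the thresholds are the ones stated in the text (significance at 0.05 and a minimum of 30 fine-resolution cells).

```r
set.seed(1)

# Simulated fine-resolution cells (assumption): each row carries the id of
# the 50 km cell it falls in, one predictor and the habitat suitability.
pixels <- data.frame(
  cell = rep(1:4, each = 40),
  predictor = rnorm(160)
)
pixels$suitability <- 0.5 * pixels$predictor + rnorm(160, sd = 0.5)

# Fit one linear model within a 50 km cell and keep the mapped statistics.
fit_cell <- function(d) {
  s <- summary(lm(suitability ~ predictor, data = d))
  data.frame(
    cell = d$cell[1],
    n = nrow(d),
    adj_r2 = s$adj.r.squared,                              # local importance
    slope = s$coefficients["predictor", "Estimate"],       # direction
    p_value = s$coefficients["predictor", "Pr(>|t|)"]      # significance
  )
}
local_stats <- do.call(rbind, lapply(split(pixels, pixels$cell), fit_cell))

# Cells are shown only if significant and supported by enough observations.
local_stats$shown <- local_stats$p_value < 0.05 & local_stats$n >= 30
```

Repeating the call once per predictor would give one table of adjusted R squared, slope and p-value per variable, from which the variable with the highest local importance at each cell can be picked.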