blasbenito edited materials_and_methods.tex  about 9 years ago

Commit id: 6593990ecf56b6cebb53eb5e6bbbcc14cbe22388

deletions | additions      

       

\textbf{Study area}  The geographical framework of this study comprises Europe and the Irano-Turanian region (20 (20ºN  to 70 degrees North, 70ºN,  and -10 10ºW  to 70 degrees East). 70ºE).  We excluded the North of Africa from our analyses, since there are not known Neanderthal remains (CITATION?). remains.  \textbf{Presence data}  We gathered extracted from the literature  up to 65 localities 51 locations  attributed to Neanderthals presence during the last interglacial from period. 20 presences belong to MIS 5e, 4 to MIS 5e-d, 7 to MIS 5d, while  the literature (see Appendix). From this dataset we selected localities dated with higher certainty dates of the remaining ones were broadly attributed  to MIS 5e (Eemian period), resulting in 33 sites. 5 by the authors.  A few of these sites (particularly in northern Germany and Spain) were situated close We scored the presence records from 1  to surrounding sites, forming clusters 3 according to their reliability: 1, reliable data; 2, uncertain Neanderhtal affiliation or chronology; 3 lack  of sites sharing similar ecological conditions. chronology.  To reduce calibrate  the risk of species distribution model we selected... (TRINE WILL REVIEW THE CRITERIA AGAIN BEFORE TO WRITE THIS PART)  To reduce  pseudorreplication in the presence dataset,  we filtered the data to ensure a minimum distance of 50 100  km between nearby locations (CITATION REQUIRED). Thisprocess  reduced the sample to 24 sites (see Fig. 1). (LOW NUMBER HERE!).  \textbf{Environmental variables}  Climate influences species distribution at global to regional scales, while the effect of topography is restricted to scales ranging from regional to local \cite{Pearson_2003}.  To represent variables influencing Neanderthals distribution at the continental scale we have selected a palaeoclimatic simulation performed with the NCAR Community Climate System Model \cite{Otto2006}. This coupled ocean-atmosphere-land-sea-ice general circulation model is was  also coupled with an ice-sheet simulation to simulate the start of the Eemian (~130 ka BP). The results of this palaeoclimatic model fit very well with the available climatic proxies for the given period \cite{Otto2006}. This simulation is available at www.worldclim.com as a set of maps downscaled to 1km resolution representing 19 bioclimatic variables. To obtain GIS layers representing topography (factor represent factors  influencing species distributions Neanderthals distribution  at the  local scales) scale,  we processed the SRTM digital elevation model provided by www.worldclim.com (SRTM  model aggregated to 1km resolution, \cite{Farr_2007} provided by www.worldclim.com. From this digital elevation model we derived \cite{Farr_2007}) to derive the  topographical slope, topographic wetness index, and a measure of topographic diversity (see appendix for further details). All the variables were aggregated to 5 km resolution using GRASS GIS (GRASS development team 2012). To represent more accurately Eemian conditions, we defined the sea level seven meters above the current The  sea level was set at 7 m.a.s.l  (CITATION REQUIRED). REQUIRED)  We assessed colinearity among the 22 environmental factors (19 bioclim variables and 3 topographic variables) using a two steps step  approach. Firstly, we computed the correlation matrix among variables, and defined 0.5 pearson correlation index as threshold to retain or reject variables. Secondly we applied a variance inflation factor analysis (R package HH, CITATION). CITATION ).  Our final set of predictors was composed by six environmental variables describing three general drivers of species distributions: temperature, water availability, availability  and topographical setting. Temperature was represented by it extremes: maximum temperature of warmest month (bio5) and minimum temperature of coldest month (bio6). Water availability was represented by annual precipitation (bio12) and precipitation of warmest quarter (bio18). The descriptors of the topographical setting were slope and topographic diversity. We hipothesize that the northern limit of Neanderthals range was limited at the subcontinental scale by cold winter temperatures, while a combination of high temperatures and low water availability during the summer in the mainland restricted the southern distribution to coastal areas. At the local scale, the distribution could have been influenced by moderate slopes combined with high topographic diversity, that favored at the same time mobility and a high availability of hunting resources.  \textbf{Species distribution modeling}  After considering very carefully the nature of our data (lack of absences, low number of presences, unknown bias, and overdispersion), we have chosen generalized linear models (GLMs) calibrated with the quasibinomial family and weighted background as species distribution modeling method.GLMs are simple methods, and therefore provide a higher tractability, allow a better understanding of the drivers of species distributions and are easily generalized to new datasets \cite{Merow1267}  The application of the quasibinomial family to calibrate the GLM allows to deal with overdispersed data. The use of a weighted background compensates for the lack of absences by providing a comprehensive sampling on the available ecological conditions across the study area. But the geographical distribution of the background points affect modeling outcomes (CITATION VAN DER WAAL), and there are several studies (CITATIONS) recommending to restrict the background to areas accesible to the species by dispersal. In our case, there was no data to develop such criteria, and therefore we explored the sensitivity of the model to different sets of background points generated inside buffers around the presence records at increasing radius (100, 200, 400, 800 and 1600 km).We sampled 20 percent of the cells within each buffer, resulting in 6282 cells at 100 km, 18105 at 200 km, 40815 at 400 km, 67225 at 800 km, 124044 at 1600 km.  To compensate for potential taphonomical or geographical bias, and following the ecological niche theory (CITATION) we assumed that the species responses to the environmental factors were gaussian.GLMs link model structure to hypothesis by allowing the users to define the shape of the response curves and the important interactions between variables.  To include this assumption into the modelling process, we configured the GLMs to consider second degree polynomials using formulas with the form \textit{response ~ poly(variable1, 2) + poly(variable2, 2) + ...}.We do not considered interactions.  One drawback of this approach arises when the number of presences is low. As a general rule, around at least  five presence points per predictor are required in a GLM fit to avoid overparameterization (CITATION), but this number raises toaround  ten when using two degree polynomials to fit the model. In our case,to fit a single GLM  with six predictors and 24 presences up to (NUMBER OF POINTS) points, to fit a single model  would have lead to an overly  overparameterized model. To overcome this problem, we used the \textit{dredge} function of the R package \textit{MuMIn} (CITATION) to generate all the GLM equations combining the six predictors in groups of one, two and three, resulting in 41 different equations (EQUATIONS IN APPENDIX!) tha were used to calibrate the models APPENDIX!). We calibrated one model  for each combination of equation and  background radius (5), producing a total of to obtain  205 different models. \textbf{Model selection and ensemble model forecasting}  We faced three different issues to evaluate our models. First, the lack of absences made it impossible to evaluate the commissión error. Second, the low amount of presences prevented the use of data splitting to evaluate omission errors. Third, quasibinomial GLMs in R do not provide AIC values, making difficult to rank the candidate models according to both model fit and complexity. To deal with these issues while providing a robust model evaluation framework, we used a leave-one-out approachimplemented in the R function LeaveOneOut (see Appendix)  to computetwo different measures:  \begin{itemize}  \item  AUC(ROC curve, see Fielding and Bell...)  valuesas an extrinsic measure to evaluate relative omission errors (CITE PHILLIPS). The AUC values were  based on 168 1000  pseudoabsences not used to calibrate the models. Pseudoabsences were separated 200 km from each other to reduce pseudorreplication, and separated from the presence records the same distance to avoid an artificial reduction of the AUC values (see appendix for further details). AUC values were computed by the LeaveOneOut function models  asthe proportion of pseudoabsences with  an habitat suitability value lower than the habitat suitability of the test presence.  \item Adjusted extrinsic measure to evaluate omission errors (CITE PHILLIPS), and adjusted  explained deviance as an intrinsic evaluation measure to assess model goodness of fit and complexity. This measure was implemented in complexity (taken as  the LeaveOneOut function with this expression: 1-((cases - 1)/(cases - predictors))*(1 - ((null deviance - deviance) / null deviance)).  \end{itemize} number of predictors).  The LeaveOneOut function worked leave-one-out approach was computed  as follows to calibrate and evaluate 4920 models: for each model, once per presence record available:  \begin{enumerate}  \itemOne model (out of 41) is selected.  \item A background radius (out of 5) is selected.  \item  A test presence record(out of 24)  is selected. \item All presences and background data within 2.5 degrees radius around the test presence are removed.  \item A GLM model (quasibinomial family) is calibrated using all presence records except the input data not removed in the following step, without  the test one. presence.  \item Adjusted explained deviation is computed from for  the GLM results. is computed as:  1-((cases - 1)/(cases - predictors))*(1 - ((null deviance - deviance) / null deviance)).  \item AUC An approximation to the Area Under the Curve (CITATION Fielding and Bell)  is computed using as the proportion of pseudoabsences with an habitat suitability value lower than  the habitat suitabilityvalues  of the test presence and the pseudoabsences as inputs. presence.  \end{enumerate}  We averaged Finally,  all the adjusted explained deviance and AUC values were averaged  for each combination of model and background radius, and model. We  selected all models with AUC values higher than 0.65 and adjusted explained deviance values above 0.15. 0.1.  Wecalibrated such models with all the presence records, and  computed the average and the standard deviation of the group of best models (CITATION ENSEMBLE). The average, scaled from 0 to 1, represented habitat suitability values, while the standard deviation represented the level of agreement reached by the best models. We plotted both measures into a single map, using the \textit{whitening} method proposed by \cite{Hengl200475} to enhance visualization and model interpretation. In the final model, areas with high habitat suitability values and low standard deviation (good agreement among models) were considered robust indicators of suitable habitat for Neanderthals, while low habitat suitability values plus low standard deviation were considered good  indicators of non-suitable habitat. To discuss habitat suitability patterns versus uncertainty at Neanderthal sites, we plotted the values of habitat suitability versus the standard deviation of habitat suitability for all the presence records, grouping by country of origin to enhance interpretation. a harsh habitat, and probably, absence.  \textbf{Importance of environmental factors}  Our main goal was is  to extact as much ecological information as possible from both the data and the model, model  to reach a better understanding of Neanderthals ecology. To do so we analyzed the influence of the environmental factors over the habitat suitability from local to a  continental scales. and a local scale.  To We applied Random Forest \cite{Breiman20015} to  analyze the influence of the  environmental factors at the local scale, we firstly defined \textit{local scale} as the average home range of Neanderthals. According to \cite{Daujeard201232}, and based on the transportation of raw lithic materials, the regional mobility range of over  Neanderthalsduring the Middle Palaeolithic was around 50 kilometers. Other measures of mobility given by Roebreks et al. 1998 and (Feblot-Augustins 1993) are around 100 and 300 km, but we considered them to be too large to be considered local. We coded an R function named MapLocalImportance (see Appendix) to aggregate the  habitat suitability model and at  the predictors into 50 km cells, and to fit one linear model per predictor and 50 km cell by using habitat suitability as response variable. continental scale.  The function assigned the adjusted R squared to each of the 50 km cells as a measure ability  of local importance for the given predictor, the coefficient Random Forest  to measure deal with non-linearity makes it perfect to analyse our ensemble, since non-linearity may arise when averaging  the direction results  of the relationship, and the p-value multiple GLMs. Also, Random Forest is regarded as a robust method  to assess the statistical significance of the coefficient. variable importance \cite{Cutler20072783}.  We extracted the used both measures of variable  importance values at every presence record from available in  the results of this randomForest R  function (Liaw  and ranked the predictors according their importance (R squared of local regression) using a boxplot. We also plotted the presence records (grouped by country to enhance interpretation) against habitat suitability, the regression coefficients Wiener 2012): mean decrease in accuracy  and values of the most important factor at the local scale. This plot allowed us to explain differences total decrease  in habitat suitability among presence records belonging to different regions. node impurities (node impurity: heterogeneity of target categories within a node).  We applied Random Forest \cite{Breiman20015} to To  analyze the influence ofthe  environmental factorsover Neanderthals habitat suitability  at the continental local  scale, and to assess we firstly defined \textit{local scale} as  the drivers average home range  of uncertainty (standard deviation) within Neanderthals. According to \cite{Daujeard201232}, and based on  the ensemble. The ability transportation  of Random Forest to deal with non-linearity makes it perfect to analyse our ensemble, since non-linearity may arise when averaging raw lithic materials,  the results regional mobility range  of multiple GLMs. Also, Random Forest is regarded as a robust method to assess variable importance \cite{Cutler20072783}. We used both Neanderthals during the Middle Palaeolithic was around 50 kilometers. Other  measures of variable importance available in the randomForest R function (Liaw mobility given by Roebreks et al. 1998  and Wiener 2012): mean decrease in accuracy (Feblot-Augustins 1993) are around 100  and total decrease in node impurities (node impurity: heterogeneity of target categories within a node). 300 km, but we considered them too extense (REWORD!)  The materials and R scripts used to perform our analysis are available HERE (github repository, or appendix, I don't know yet).  The variability observed in the types of occupation sites in this region of the Rhone Valley, their seasonal recurrence and their stability (whatever their duration) within the long sequences of the Middle Palaeolithic and the regional mobility range (50e60 km according to the raw materials), allow association of the mode of mobility with a logistical system of territorial organisation e in the sense of functional and planning aspects