2.1 Data selection
Species list were initially obtained from floristic data networks from Cerrado areas (http://cerrado.rbge.org.uk; Ratter et al., 2011; Castro et al., 2010; Vieira et al., 2019). In addition, twenty-one areas were sampled in the northern region of Cerrado (in the states of Piauí, Maranhão, Pará and Amapá), summary data available at https://doi.org/10.5061/dryad.9cnp5hqd4). To answer our question about possible differences in the effect of climate change on different species groups, from the 235 species recorded in our previous list, we selected five species of wide national distribution in the Biome (SpG); and five species with marginal distribution of the northern Cerrado, typical of the peripheral areas investigated and absent or rare in the central region of Brazil (SpM): Bowdichia virgilioides (754 occurrences), Byrsonima crassifolia (523), Curatella americana (956), Himatanthus articulatus (441), Parkia platycephala (259), Plathymenia reticulata (604), Qualea grandiflora (858), Q. parviflora 727), Salvertia convallariodora (561) and Vatairea macrocarpa (413) (Tab. S1 in Appendix S1). Our choice was based on the largest data set, highest constancy values in the national savanna areas and highest importance value [IV] indices in the inventoried northern savannas. Finally, in addition to inventory and field survey data, occurrence records of these species were obtained from the Global Biodiversity Information Facility open access platform (http://gbif.org) using the rgbif package (Chamberlain et al., 2019) in R (R Development Core Team, 2019; S1 File in Appendix S3).
Thus, the total known distribution area of each species in South America was used for the model construction process (see below) and then projected to the focal area. We opted for the smallest spatial aggregation, restricting the presence records to a distance of 1 km. Non-georeferenced fields, those with duplicate records within the minimum distance (1 km²), and those outside the expected distribution for the species were excluded. Although typical of Cerrado, they are plants randomly recorded in ecotonal areas under the influence of neighboring domains, and for this reason the filters were based on the checklist of the present study, in the revised floristic lists of Castro et al. (2010), Ratter et al. (2011), Vieira et al. (2019) and the Species List of the Brazilian Flora (BFG, 2018; http://floradobrasil.jbrj.gov.br). The final database included 6,106 presence records (ranging from 256 to 956; Tab. S1 in Appendix S1) for the ten native species of Brazil, which were significantly represented in Cerrado sites (Fig. S1 in Appendix S2).
WorldClim data on current climate and projections for 2050 (http://worldclim.org; Fick & Hijmans, 2017), at a resolution of 30 arcseg (about 1 km²) within the spatial boundaries of South America (12°N, 56°S, 91°O, 34°O) were used for the purpose of creating ecological niche models (ENMs). This extension was used to capture the spectrum of climate variation for the entire known distribution of species. Among the 20 variables (altitude and 19 bioclimatic variables), the least collinear variables were selected using theremoveCollinearity tool (multicollinearity.cutoff = 0.75, nb.points = 10000) from the virtualspecies package (Leroy, Meynard, Bellard, & Courchamp, 2015): altitude, annual mean temperature, max temperature of warmest month, isothermality, mean diurnal range, precipitation of coldest quarter, annual precipitation, precipitation of warmest quarter, precipitation seasonality, and precipitation of driest quarter (Fig. S2 in Appendix S2). Future climate projections were derived from two global atmosphere-ocean circulation model (AOGCM) - the Community Climate System Model (CCSM4) and the Hadley Center Global Environmental Model (HadGEM2-CC) - in two greenhouse gas concentration situations (RCPs 4.5 and 8.5) foreseen by the Fifth Assessment (CMIP5) of the Intergovernmental Panel on Climate Change (IPCC, 2014), representing an optimistic achievable scenario and a “business-as-usual” scenario.
2.2 Modeling runs and analyses
To estimate the potential distribution of species, ENMs were generated in the biomod2 package (Thuiller, Georges, Engler, & Breiner, 2019) by running four different algorithms: Generalized Linear Model (GLM), Artificial Neural Network (ANN), Random Forest (RF), and Maximum Entropy (MAXENT). Species datasets (presence + pseudo-absence records) were randomly divided into 75% for calibration (training points) and 25% for evaluation (test points) and this procedure was repeated 10 times for each set for each species. To assess the predictive power of the models, the True Skill Statistics (TSS; threshold-dependent) values (Allouche et al., 2006) and the area under the receiver operating characteristic (ROC) curve (Phillips, Anderson, & Schapire, 2006) were measured. Sensitivity (proportion of correctly predicted presences/true positives) and specificity (proportion of correctly predicted absences/true negatives) levels were also reported. The consensus maps were generated by the committee averaging method to represent the concordances between the different runs. The lower performing models (TSS < 0.5) were eliminated from the consensus building process (ensemble). Thus, five consensus maps were generated per species for each scenario (one current, two RCP 4.5 and two RCP 8.5) overlapping the projected presences of species from present day to 2050 to estimate where the environmental characteristics will resemble those of current sites of occurrence, and where these species will lose, gain or maintain their distribution. Continuous species suitability values were transformed into binary data (presence/absence) using a cut-off threshold that maximized TSS (Cutoff, Tab. S1 in Appendix S1) and the projected area of occurrence was calculated by multiplying the cell area by the cell count. The modeling results were visualized through the QGIS Geographic Information System (QGIS, 2019) and analyzed using the R Statistical environment and the raster library (File S1 in Appendix S3).
We also measured the potential dynamics of change in the range extension (area gain/loss) of the species based on the differences between the projected area for each current and future climate scenario. This analysis was made for each species separately, to recognize ecological functional differences between plant populations. The size of the areas with greater climate stability (referred to as refuges in the present study) was calculated from the number of pixels considered as presence when overlapping the models of all scenarios – individually, by status (generalist [SpG] and marginal [SpM]), and altogether. Based on these refuge maps, the future effectiveness of present CUs from a climate change perspective was measured, accounting for the size of refuge areas for SpG and SpM and both which overlap with the CUs. The data of the Brazilian CUs were obtained from the website of the Ministry of Environment (http://mapas.mma.gov.br).