Subset 01

The subset of \(n = 340\) soil observations was produced by \citet{SamuelRosaEtAl2011a}, \citet{MiguelEtAl2012}, \citet{Moura-BuenoEtAl2012}, and \citet{Samuel-RosaEtAl2013}. The researchers faced several difficulties arising from a small budget and a shortage of workforce. They also had restricted access to several areas because of geographic barriers and because some landowners prohibited access. These difficulties forced the researchers to reduce the originally planned number of sample points (\(n = 500\)) during the course of the project.

All observation locations were selected purposively or by convenience. Tacit knowledge was the main tool used to choose the observation locations. This process was carried out in the office using Google Earth imagery from 2008 and 2009. The main goal of the researchers was to obtain a sample that they judged to be representative of the different landforms, land uses, and soil taxa present in the study area. They also wanted the sample points to be spread throughout the entire study area.

At each observation location, the researchers defined an area of within which they opened three soil pits for sampling. Soil samples were collected to a depth of , or less when the soil was shallower than . The depth was measured with a ruler. The resulting sampling depth across the entire subset varies from to , with a mean of . This variation in the vertical sampling support was not a problem for the researchers because their goal was to sample the topsoil. The topsoil was defined as the topmost soil layer, with a depth equal to or less than , which is the soil layer most susceptible to degradation induced by poor agricultural practices and land use changes.

Soil samples from the three pits opened at each sampling area were combined into a composite sample, which was used for laboratory analyses. Subsurface soil features were observed with an auger at each pit, and the average (continuous variables) or most common (categorical variables) value was recorded. Note that soil sampling was done using an areal support, namely an area of . However, the shape and exact area of the sampling units are unknown, and georeferencing took place at point support. Thus, using this subset requires assuming that it was obtained using a point sampling support. The possible negative consequences of this assumption have not been explored so far.
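The aggregation rule described above (average for continuous variables, most common value for categorical variables) can be illustrated with a minimal sketch. The function name, the field names, and the example values are hypothetical and are not taken from the original data set. How ties among categorical values were resolved is not stated in the source; the sketch simply returns the first most common value encountered.

```python
from statistics import mean, mode


def aggregate_pits(pits):
    """Combine per-pit records into a single record for one observation location.

    Numeric fields are averaged; all other fields take the most common value.
    Tie-breaking for categorical fields is an assumption (first mode encountered).
    """
    combined = {}
    for key in pits[0]:
        values = [pit[key] for pit in pits]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        combined[key] = mean(values) if is_numeric else mode(values)
    return combined


# Hypothetical auger observations at the three pits of one sampling area.
pits = [
    {"soil_depth": 40, "drainage": "well"},
    {"soil_depth": 55, "drainage": "well"},
    {"soil_depth": 46, "drainage": "moderate"},
]
record = aggregate_pits(pits)
print(record)
```

A single record per observation location matches the point-support assumption discussed above, since the within-area variation among the three pits is discarded by the aggregation.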

Georeferencing was done in the field using a GNSS signal receiver with a horizontal precision of more than positioned approximately at the centre of the sampling area. The horizontal positional error was larger than when the GNSS signal was affected by vegetation biomass, terrain features, and satellite configuration. In these cases, the observation locations were georeferenced in the office using spatial resolution Google Earth imagery from 2008 and 2009. The horizontal positional error of Google Earth imagery is .

Each observation point was identified with a number assigned in increasing order, following the order in which the observations were made (–). The \(n = 340\) soil observations were obtained after field campaigns. This number of observations yielded a density of about observations per .