ABSTRACT:
<Sampling as a method for all fields of science and relevance....> A sampling process rarely yields data made up of completely uncorrelated observations. In demographic data, some level of correlation often exists between data points, which can overstate the amount of independent information in a sample dataset. <Why this may be important, especially with respect to gridded population methods.> Here, we present the application of effective sample size (ESS), which quantifies the redundancy of information present in an autocorrelated dataset. In this context, we investigate different sampling strategies for positioning the number of samples indicated by the ESS when training nonparametric, ensemble-based models such as Random Forest. <Results show/indicate XXXX.> While the ESS has been implemented in regression models, this is the first attempt to investigate its usefulness in a robust nonparametric machine learning model. The results will inform analyses in settings where data availability is restricted and analyses that deal with extremely large datasets. (NOTE: nothing here about spatial autocorrelation considerations but make that a focus in paper...)

INTRODUCTION

Gridded population data provide baseline demographic information that is commonly used in a variety of research fields, such as hazard and risk mitigation, epidemiological studies, and human-environment dynamics (CITE). The underlying model construction of gridded population datasets ranges from "lightly" modeled to more complex statistical approaches (CITE). The latter are informed by increasingly available satellite imagery and ancillary information, which creates an environment that allows for increasingly complex modeling processes (CITE). <Statement about how this complexity contributes to the issue of autocorrelation in the data AND increases the computational processing time.> The vast amount of information available in satellite and other geospatial datasets requires an effective modeling framework with comparatively low computational cost. Here, we present an approach that applies effective sample size in the context of an ensemble-based machine learner to advance the understanding of the statistical sampling decisions needed for processing large-scale or large-quantity geospatial data for gridded population modeling.
<Paragraph here on what is effective sample size, why useful as an approach, end with relevance/novelty applied to ensemble machine learning techniques - sets up next paragraph on the ensemble-based approaches.>
Ensemble-based systems are one of the many branches of machine learning that have received growing attention and popularity in the last decade, mainly due to their many desirable properties (e.g. XXXX) and broad spectrum of applications \cite{polikar2006ensemble}. With their flexibility and potential for large-scale, high-resolution modeling, machine learners have gained popularity in many areas of the social sciences. Specifically, with respect to gridded population models, machine learning is being harnessed to take advantage of nonparametric frameworks that are flexible in data input and yield increasingly precise estimates (CITE - Stevens et al., 2015, Azar et al., RSE 2013).