# Introduction

In recent decades, advances in remote sensing image acquisition systems have moved in lockstep with the demand for applications that make use of such data. Land-use/cover recognition (Pisani 2014, Bagan 2010, Capucim 2015, Banerjee 2015), target recognition (Dong 2015, Martorella 2011, Du 2011), image classification (Li 2015, Voisin 2014) and band selection in hyper-spectral images (Yang 2011, Yuan 2015) are among the most pursued applications, to name a few. The large amount of high-resolution content made available by satellites also highlights the bottleneck of labeling data. Manual annotation depends on the skill of the annotator and is prone to errors. Such shortcomings have further fostered research on semi-supervised and unsupervised techniques, which may work well in some remote sensing-oriented applications.

Considered a hallmark of pattern recognition research, the so-called $$k$$-means algorithm (MacQueen 1967) has been consistently enhanced over the last decades. Since it does not require labeled data and has a simple formulation, $$k$$-means remains one of the most widely used clustering techniques to date. Roughly speaking, given a set of feature vectors (samples) extracted from a dataset, $$k$$-means tries to minimize the distance from each sample to its closest center (mean). After a number of iterations, this process clusters the data so that each sample is more “connected” to the centroid of its own cluster than to any other centroid. Its main drawbacks are the number of clusters, which is required as an input, and the tendency of the naïve algorithm to get trapped in local optima, i.e., centroids that do not represent the clusters well.
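The procedure described above can be illustrated with a minimal sketch of the naïve algorithm. It is not the implementation used in this work: the function names (`kmeans`, `distortion`), the squared Euclidean distance, and the random choice of initial centroids among the samples are all assumptions made for illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(samples, k, iterations=100, seed=0):
    """Naive k-means sketch; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct samples drawn at random.
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(iterations):
        # Assignment step: each sample goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in samples
        ]
        # Update step: each centroid becomes the mean of its assigned samples.
        for c in range(k):
            members = [s for s, l in zip(samples, new_labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # no assignment changed: converged
            break
        labels = new_labels
    return centroids, labels

def distortion(samples, centroids):
    """Sum of squared distances from each sample to its closest centroid."""
    return sum(min(squared_distance(s, c) for c in centroids) for s in samples)
```

Since the result depends on the initial centroids, running the sketch with different values of `seed` on less well-separated data may yield different values of `distortion`, which is the local-optimum behavior mentioned above.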

This scenario makes $$k$$-means a natural candidate for optimization techniques, mainly those based on nature-inspired and evolutionary mechanisms. In fact, not only $$k$$-means but a number of other techniques have used the framework of meta-heuristic-based optimization to cope with problems that can be modeled as finding the decision variables that maximize/minimize a given fitness function. Chen et al. (Chen 2009), for instance, employed Genetic Algorithms (GAs) and neural networks to classify both land-use and landslide zones in eastern Taiwan, with the former used to compute the set of weights that combine some landslide incidence factors. Nakamura et al. (Nakamura 2014) dealt with the task of band selection in hyper-spectral imagery through nature-inspired techniques. The idea is to model the problem of finding the most important bands as a feature selection task. Both problems are essentially the same when each pixel is represented by its brightness values.

Very recently, Goel et al. (Goel 2015) tackled the problem of remote sensing image classification using nature-inspired techniques, namely Cuckoo Search and Artificial Bee Colony. Senthilnatha et al. (Senthilnatha 2014) used GAs, Particle Swarm Optimization and Firefly Algorithm for the automatic registration of multi-temporal remote sensing data. In short, the idea is to perform image registration while minimizing some criterion function (Mutual Information in that case). Artificial Immune Systems have also been used to classify remote sensing data (Kheddam 2014), with a multi-band image covering the northeastern part of Algiers used for validation purposes.

Coming back to the $$k$$-means technique, Chandran and Nazeer (Chandran 2011) proposed to solve the problem of minimizing the distance from each dataset sample to its nearest centroid using Harmony Search, a meta-heuristic optimization technique based on the way musicians create songs in order to obtain the best harmony. Forsati et al. (Forsati 2008) employed a similar approach in the context of web page clustering, while Lin et al. (Lin 2012) proposed a hybrid approach combining $$k$$-means clustering and Particle Swarm Optimization. Later on, Kuo et al. (Kuo 2013) integrated $$k$$-means and Artificial Immune Systems for dataset clustering, and Saida et al. (Saida 2014) employed Cuckoo Search to optimize $$k$$-means for document classification. Finally, a comprehensive study on the application of nature-inspired techniques to boost $$k$$-means was presented by Fong et al. (Fong 2014).
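To make the connection between $$k$$-means and meta-heuristics concrete, the sketch below shows one possible way to pose centroid placement as a fitness-minimization problem. It is illustrative only: the flat encoding of the centroids, the names `decode`, `fitness` and `random_search`, and the use of the Euclidean distance are assumptions, and the uniform random search is merely a placeholder for an actual meta-heuristic such as Harmony Search, not an implementation of one.

```python
import math
import random

def decode(candidate, k, n):
    """Split a flat vector of k*n decision variables into k centroids of dimension n."""
    return [candidate[c * n:(c + 1) * n] for c in range(k)]

def fitness(candidate, samples, k):
    """Sum of distances from each sample to its nearest decoded centroid (to be minimized)."""
    n = len(samples[0])
    centroids = decode(candidate, k, n)
    return sum(min(math.dist(s, c) for c in centroids) for s in samples)

def random_search(samples, k, evaluations=200, seed=0):
    """Placeholder optimizer: keeps the best of uniformly drawn candidates."""
    rng = random.Random(seed)
    # Per-dimension bounds of the data, repeated once per centroid.
    lows = [min(col) for col in zip(*samples)] * k
    highs = [max(col) for col in zip(*samples)] * k
    best, best_fit = None, float("inf")
    for _ in range(evaluations):
        candidate = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        fit = fitness(candidate, samples, k)
        if fit < best_fit:
            best, best_fit = candidate, fit
    return best, best_fit
```

Under this encoding, any optimizer that proposes candidate vectors and compares their `fitness` values can be used in place of `random_search`.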

Although all the aforementioned works aimed at enhancing $$k$$-means using meta-heuristic techniques, little attention has been paid to hyper-heuristic techniques for that purpose, and only very few works have dealt with $$k$$-means optimization in the context of land-use/cover classification. The term “hyper-heuristics” was coined to describe algorithms designed to solve general problems by combining known meta-heuristics in such a way that each technique may compensate for the weaknesses of the others (Ross 2005). In this context, Papa et al. (Papa 2015) were among the first to focus on the application of hyper-heuristics to optimize $$k$$-means, validating the proposed approach on both satellite- and radar-based land-cover classification. That work employed Genetic Programming to combine five variations of the Harmony Search algorithm, with promising results. In this paper, we extend the work by Papa et al. (Papa 2015) with a deeper experimental analysis, in which Particle Swarm Optimization, Bat Algorithm and Firefly Algorithm are also considered, together with Harmony Search and its variants, for combination through Genetic Programming. The results obtained in this paper outperformed those of the previous work by Papa et al. (Papa 2015), thus emphasizing the benefits of the hyper-heuristic-based framework. To the best of our knowledge, this is the first time such approaches are combined with each other to optimize $$k$$-means.

The remainder of this paper is organized as follows. Section \ref{s.theoretical} presents the theoretical background regarding the meta-heuristic optimization techniques addressed in this work. Sections \ref{s.proposed} and \ref{s.material} present the proposed approach and the experimental setup, respectively. Section \ref{s.experiments} discusses the experiments, and Section \ref{s.conclusions} states conclusions and future work.

# Theoretical Background

\label{s.theoretical}

In this section, we briefly present the theoretical background regarding the meta-heuristic techniques employed in this paper, as well as some basics of optimization problems.

Let $${\cal S}=\{\textbf{x}_1,\textbf{x}_2,\ldots,\textbf{x}_m\}$$ be a search space, where each possible solution $$\textbf{x}_i\in\Re^n$$ is composed of $$n$$ decision variables, and $$x_{i,j}$$ stands for the $$j^{th}$$ decision variable of agent $$i$$. Additionally, let $$f:{\cal S}\rightarrow\Re$$ be a function to be minimized/maximized. Roughly speaking, the main idea of any optimization problem is to solve the following equation:

\[\label{e.minimization} \textbf{x}^\ast = \displaystyle \min_{\textbf{x}\in{\cal