A Hyper-Heuristic Approach for Unsupervised Land-Cover Classification

Introduction

In recent decades, advances in remote sensing image acquisition systems have moved in lockstep with the demand for applications that make use of such data. Land-use/cover recognition (Pisani 2014, Bagan 2010, Capucim 2015, Banerjee 2015), target recognition (Dong 2015, Martorella 2011, Du 2011), image classification (Li 2015, Voisin 2014) and band selection in hyper-spectral images (Yang 2011, Yuan 2015) are among the most pursued applications, just to name a few. The large amount of high-resolution content provided by satellites also highlights the bottleneck that arises when labeling data: such a process depends on skilled annotators and is prone to errors when the annotation is performed manually. These shortcomings have further fostered research on semi-supervised and unsupervised techniques, which may work well in several remote sensing-oriented applications.

Considered a hallmark of the pattern recognition research field, the so-called \(k\)-means algorithm (MacQueen 1967) has been consistently enhanced over the last decades. Since it does not make use of labeled data and has a simple formulation, \(k\)-means is still one of the most widely used classification techniques to date. Roughly speaking, given a set of feature vectors (samples) extracted from a dataset, \(k\)-means tries to minimize the distance from each sample to its closest center (mean). Such a process ends up clustering the data after a number of steps, with two samples from the same cluster being more “similar” to each other than to samples assigned to other clusters. However, the quality of the final partition strongly depends on the initial choice of centers, and finding the set of centroids that minimizes such distances is itself an optimization problem.
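To make the objective concrete, the snippet below is a minimal sketch (not the authors' implementation) of the quantity \(k\)-means tries to minimize: the sum of distances from each sample to its nearest centroid. All names (kmeans_objective, X, centroids) are illustrative.

```python
import numpy as np

def kmeans_objective(X, centroids):
    """Sum of squared Euclidean distances from each sample to its nearest centroid."""
    # Pairwise squared distances between samples (rows of X) and centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# Toy usage: 100 two-dimensional samples and k = 3 randomly placed centroids.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
centroids = rng.normal(size=(3, 2))
print(kmeans_objective(X, centroids))
```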

The aforementioned scenario makes the \(k\)-means algorithm well suited to being addressed by means of optimization techniques, mainly those based on nature- and evolution-oriented mechanisms. In fact, not only \(k\)-means but also a number of other techniques have used the framework of meta-heuristic optimization to cope with problems that can be modeled as the task of finding the decision variables that maximize/minimize a certain fitness function. Chen et al. (Chen 2009), for instance, employed Genetic Algorithms (GAs) and neural networks to classify both land-use and landslide zones in eastern Taiwan, with the former being used to compute the set of weights that combine several landslide incidence factors. Nakamura et al. (Nakamura 2014) dealt with the task of band selection in hyper-spectral imagery through nature-inspired techniques. In short, the idea is to model the problem of finding the most important bands as a feature selection task; without loss of generality, the two problems become the very same when the brightness of each pixel is used to represent it.

Very recently, Goel et al. (Goel 2015) tackled the problem of remote sensing image classification using nature-inspired techniques, namely Cuckoo Search and Artificial Bee Colony. Senthilnatha et al. (Senthilnatha 2014) used GAs, Particle Swarm Optimization and the Firefly Algorithm for the automatic registration of multi-temporal remote sensing data. In short, the idea is to perform image registration while minimizing some criterion function (Mutual Information in that case). The theory of Artificial Immune Systems has been used to classify remote sensing data as well (Kheddam 2014), in which a multi-band image covering the northeastern part of Algiers was used for validation purposes.

Coming back to the \(k\)-means technique, Chandran and Nazeer (Chandran 2011) proposed to solve the problem of minimizing the distance from each dataset sample to its nearest centroid using Harmony Search, a meta-heuristic optimization technique based on the way musicians create songs in order to obtain the best harmony. Forsati et al. (Forsati 2008) employed a similar approach, but in the context of web page clustering, while Lin et al. (Lin 2012) proposed a hybrid approach combining \(k\)-means clustering and Particle Swarm Optimization. Later on, Kuo et al. (Kuo 2013) integrated \(k\)-means and Artificial Immune Systems for dataset clustering, and Saida et al. (Saida 2014) employed Cuckoo Search to optimize \(k\)-means aiming at classifying documents. Finally, a comprehensive study about the application of nature-inspired techniques to boost \(k\)-means was presented by Fong et al. (Fong 2014).

Although all the aforementioned works aimed at enhancing \(k\)-means by means of meta-heuristic techniques, little attention has been paid to the application of hyper-heuristic techniques for that purpose, and only a very few works have dealt with \(k\)-means optimization in the context of land-use/cover classification. The term “hyper-heuristic” refers to a high-level methodology that selects or combines low-level (meta-)heuristics, i.e., a “heuristic to choose heuristics”, so that the search can benefit from the strengths of each individual technique. In this paper, we follow that idea and employ a hyper-heuristic framework that combines the meta-heuristic techniques described in Section \ref{s.theoretical} to optimize \(k\)-means in the context of unsupervised land-cover classification.

The remainder of this paper is organized as follows. Section \ref{s.theoretical} presents the theoretical background regarding the meta-heuristic optimization techniques addressed in this work. Sections \ref{s.proposed} and \ref{s.material} present the proposed approach and the experimental setup, respectively. Section \ref{s.experiments} discusses the experiments, and Section \ref{s.conclusions} states conclusions and future works.

Theoretical Background

\label{s.theoretical}

In this section, we briefly present the theoretical background regarding the meta-heuristic techniques employed in this paper, as well as some basic concepts related to optimization problems.

Let \({\cal S}=\{\textbf{x}_1,\textbf{x}_2,\ldots,\textbf{x}_m\}\) be a search space, where each possible solution \(\textbf{x}_i\in\Re^n\) is composed of \(n\) decision variables, and \(x_{i,j}\) stands for the \(j^{th}\) decision variable of agent \(i\). Additionally, let \(f:{\cal S}\rightarrow\Re\) be a function to be minimized/maximized. Roughly speaking, the main idea of any optimization problem is to solve the following equation:

\[\label{e.minimization} \textbf{x}^\ast = \displaystyle \arg\min_{\textbf{x}\in{\cal S}}f(\textbf{x}),\]

where \(\textbf{x}^\ast\) stands for the best solution found so far. Without loss of generality, optimization techniques differ in the way they attempt to solve the above equation. The same occurs when working with meta-heuristics, since each one is based on a different social mechanism or living being. Since the terminology among techniques might differ while serving similar purposes, we hereinafter generalize each possible solution under the name “agent”.
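As an illustrative sketch of this notation (under our own assumptions, not the authors' code), one may take the \(k\)-means objective from the Introduction as \(f\): each agent \(\textbf{x}\) encodes a flattened set of \(k\) centroids, so that \(n\) equals \(k\) times the number of features, and the goal is to find the agent minimizing \(f\).

```python
import numpy as np

def fitness(x, X, k):
    """f(x): sum of squared distances from each sample to its nearest centroid,
    with the agent x interpreted as k flattened centroids."""
    centroids = x.reshape(k, X.shape[1])
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                         # dataset samples
k = 3
n = k * X.shape[1]                                    # number of decision variables
agents = rng.uniform(X.min(), X.max(), size=(10, n))  # m = 10 agents in S
x_star = min(agents, key=lambda x: fitness(x, X, k))  # best agent found so far
print(fitness(x_star, X, k))
```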

Harmony Search

\label{ss.hs}

Harmony Search (HS) is a meta-heuristic algorithm inspired by the improvisation process of music players, since they often improvise the pitches of their instruments searching for a perfect state of harmony (Geem 2009). The main idea is to mimic the process adopted by musicians when creating new songs, where each possible solution is modeled as a harmony (agent), and each musician corresponds to one decision variable.

In the context of HS, our search space \({\cal S}\) is called “harmony memory” (HM), and each possible solution \(\textbf{x}_i\) is a harmony. At each iteration, a new harmony is improvised by means of two main operators, memory consideration and pitch adjustment, and it replaces the worst harmony in the memory whenever it achieves a better fitness value.

The memory consideration step models the process of creating songs in which a musician resorts to his/her memory of good musical notes to compose a new song. This step is governed by the Harmony Memory Considering Rate (\(HMCR\)) parameter, which is the probability of choosing one value from the historical values stored in the harmony memory, with \((1-HMCR)\) being the probability of randomly choosing one feasible value, as follows:

\[\begin{aligned} \label{e.hmcr} x_{m+1,j} & = & \left\{ \begin{array}{ll} x_{A,j} & \mbox{ with probability $HMCR$} \\ \theta \in \bm{\Phi}_j & \mbox{ with probability (1-$HMCR$),} \end{array}\right.\end{aligned}\]

where \(A\sim {\cal U}(1,2,\ldots,m)\), and \(\bm{\Phi}=\{\bm{\Phi}_1,\bm{\Phi}_2,\ldots,\bm{\Phi}_n\}\) stands for the set of feasible values for each of the \(n\) decision variables.

Further, every component \(j\) of the new harmony vector \(\textbf{x}_{m+1}\) is examined to determine whether it should be pitch-adjusted or not, a step controlled by the Pitch Adjusting Rate (\(PAR\)) variable, as follows:

\[\begin{aligned} \label{e.par} x_{m+1,j} & = & \left\{ \begin{array}{ll} x_{m+1,j}\pm \varphi_j \varrho & \mbox{ with probability $PAR$} \\ x_{m+1,j} & \mbox{ with probability (1-$PAR$).} \end{array}\right.\end{aligned}\]

The pitch adjustment is often used to improve solutions and to escape from local optima. This mechanism shifts a decision variable of the harmony to a neighboring value, where \(\varrho\) is an arbitrary distance bandwidth, and \(\varphi_j\sim {\cal U}(0,1)\).
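The following is a minimal sketch of one HS improvisation step combining Equations \ref{e.hmcr} and \ref{e.par}; parameter values, bound handling, and all names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def improvise(harmony_memory, lower, upper, hmcr=0.9, par=0.3, bw=0.05, rng=None):
    """Create one new harmony from an (m x n) harmony memory."""
    if rng is None:
        rng = np.random.default_rng()
    m, n = harmony_memory.shape
    new = np.empty(n)
    for j in range(n):
        if rng.random() < hmcr:                           # memory consideration
            new[j] = harmony_memory[rng.integers(m), j]   # x_{A,j}, A ~ U(1..m)
            if rng.random() < par:                        # pitch adjustment
                new[j] += (1 if rng.random() < 0.5 else -1) * rng.random() * bw  # +- phi_j * rho
        else:                                             # random feasible value from Phi_j
            new[j] = rng.uniform(lower[j], upper[j])
    return np.clip(new, lower, upper)                     # assumed bound handling
```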

Improved Harmony Search

\label{ss.ihs}

The Improved Harmony Search (IHS) (Mahdavi 2007) differs from traditional HS by updating the \(PAR\) and \(\varrho\) values dynamically. The PAR updating formulation at time step \(t\) is given by:

\[\label{e.par_ihs} PAR^t = PAR_{min}+\frac{PAR_{max}-PAR_{min}}{T}t,\]

where \(T\) stands for the number of iterations, and \(PAR_{min}\) and \(PAR_{max}\) denote the minimum and maximum \(PAR\) values, respectively. In regard to the bandwidth value at time step \(t\), it is computed as follows:

\[\label{e.bandwidth_ihs} \varrho^t=\varrho_{max}\exp{\frac{\ln(\varrho_{min}/\varrho_{max})}{T}t},\]

where \(\varrho_{min}\) and \(\varrho_{max}\) stand for the minimum and maximum values of \(\varrho\), respectively.
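As a small sketch of these two schedules (with illustrative parameter values), \(PAR\) grows linearly while the bandwidth decays exponentially over the \(T\) iterations:

```python
import numpy as np

def ihs_par(t, T, par_min=0.1, par_max=0.9):
    # PAR^t = PAR_min + (PAR_max - PAR_min) * t / T
    return par_min + (par_max - par_min) * t / T

def ihs_bandwidth(t, T, bw_min=1e-4, bw_max=0.05):
    # rho^t = rho_max * exp(ln(rho_min / rho_max) * t / T)
    return bw_max * np.exp(np.log(bw_min / bw_max) * t / T)
```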

Global-best Harmony Search

\label{ss.ghs}

The Global-best Harmony Search (GHS) (Omran 2008) employs the same modification proposed by IHS with respect to dynamic \(PAR\) values. However, it does not employ the concept of bandwidth, with Equation \ref{e.par} being replaced by: \[\label{e.par_ghs} x_{m+1,j} = x_{best,z},\] where \(z\sim {\cal U}(1,2,\ldots,n)\), and \(best\) stands for the index of the best harmony.
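A sketch of this replacement (variable names are ours): the pitch adjustment copies a randomly chosen component of the best harmony instead of shifting by a bandwidth.

```python
import numpy as np

def ghs_adjust(new, harmony_memory, fitness_values, j, rng):
    """GHS pitch adjustment for component j of the new harmony (minimization)."""
    best = harmony_memory[np.argmin(fitness_values)]  # best harmony in memory
    z = rng.integers(len(best))                       # z ~ U(1, ..., n)
    new[j] = best[z]
    return new
```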

Novel Global Harmony Search

\label{ss.nghs}

The Novel Global Harmony Search (NGHS) (Zou 2010) differs from traditional HS in three aspects: (i) the \(HMCR\) and \(PAR\) parameters are excluded, and a mutation probability \(p_m\) is used instead; (ii) NGHS always replaces the worst harmony with the new one; and (iii) the improvisation steps are also modified, as follows:

\[R=2x_{best,j}-x_{worst,j},\]

\[x_{m+1,j}=x_{worst,j}+\mu_j(R-x_{worst,j}),\]

where \(worst\) stands for the index of the worst harmony, and \(\mu_j\sim U(0,1)\). Further, another modification with respect to the mutation probability is performed in the new harmony:

\[\begin{aligned} x_{m+1,j} & = & \left\{ \begin{array}{ll} L_j+ \varpi_j(U_j-L_j) & \mbox{ if $\kappa_j\leq p_m$} \\ x_{m+1,j} & \mbox{ otherwise,} \end{array}\right.\end{aligned}\]

where \(\kappa_j,\varpi_j\sim U(0,1)\), and \(U_j\) and \(L_j\) stand for the upper and lower bounds of decision variable \(j\), respectively.
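Putting the three equations together, the sketch below improvises one NGHS harmony; the bounds, \(p_m\), and the clipping of \(R\) to the feasible range are illustrative assumptions on our part.

```python
import numpy as np

def nghs_improvise(best, worst, lower, upper, p_m=0.1, rng=None):
    """One NGHS improvisation from the best and worst harmonies."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(best)
    new = np.empty(n)
    for j in range(n):
        R = np.clip(2.0 * best[j] - worst[j], lower[j], upper[j])  # R = 2*x_best - x_worst (kept feasible)
        new[j] = worst[j] + rng.random() * (R - worst[j])          # mu_j ~ U(0,1)
        if rng.random() <= p_m:                                    # mutation within [L_j, U_j]
            new[j] = lower[j] + rng.random() * (upper[j] - lower[j])
    return new
```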

Self-adaptive Global best Harmony Search

\label{ss.sghs}

The SGHS algorithm (Pan 2010) is a modification of GHS that employs a new improvisation scheme and self-adaptive parameters. First of all, Equation \ref{e.par_ghs} is rewritten as follows:

\[\label{e.par_sghs} x_{m+1,j} = x_{best,j},\]

and Equation \ref{e.hmcr} can be replaced by:

\[\begin{aligned} \label{e.hmcr_sghs} x_{m+1,j} & = & \left\{ \begin{array}{ll} x_{A,j}\pm \varphi_j \varrho & \mbox{ with probability $HMCR$} \\ \theta \in \bm{\Phi}_j & \mbox{ with probability (1-$HMCR$).} \end{array}\right.\end{aligned}\]

The main difference between SGHS and the other variants concerns the computation of the \(HMCR\) and \(PAR\) values, which are estimated based on the average of their recorded values after every \(LP\) (learning period) iterations. Every time a new agent is better than the worst one, the current \(HMCR\) and \(PAR\) values are recorded and later used to estimate their new values, which follow Gaussian distributions, i.e., \(HMCR\sim{\cal N}(HMCR_m,\sigma_{HMCR})\) and \(PAR\sim{\cal N}(PAR_m,\sigma_{PAR})\), where \(HMCR_m\) and \(PAR_m\) stand for the mean values of the \(HMCR\) and \(PAR\) parameters, respectively.
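The sketch below illustrates only this self-adaptive bookkeeping (the improvisation itself is elided); the standard deviations, \(LP\), and initial means are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
hmcr_mean, par_mean = 0.98, 0.90          # current HMCR_m and PAR_m
sigma_hmcr, sigma_par = 0.01, 0.05
LP = 100                                  # learning period (iterations)
hmcr_hist, par_hist = [], []

for t in range(1, 1001):
    hmcr = rng.normal(hmcr_mean, sigma_hmcr)   # HMCR ~ N(HMCR_m, sigma_HMCR)
    par = rng.normal(par_mean, sigma_par)      # PAR  ~ N(PAR_m,  sigma_PAR)
    # ... improvise a new harmony using (hmcr, par) ...
    improved = rng.random() < 0.5              # placeholder for "new harmony beats the worst one"
    if improved:                               # record the successful parameter values
        hmcr_hist.append(hmcr)
        par_hist.append(par)
    if t % LP == 0 and hmcr_hist:              # end of a learning period: re-estimate the means
        hmcr_mean, par_mean = float(np.mean(hmcr_hist)), float(np.mean(par_hist))
        hmcr_hist, par_hist = [], []
```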

Particle Swarm Optimization

\label{s.pso}

Particle Swarm Optimization (PSO) is an algorithm modeled on swarm intelligence dynamics that finds a solution in a search space based on social behavior (Kennedy 2001). Each possible solution (agent) is modeled as a particle in the swarm that imitates its neighborhood based on the values of the fitness function found so far.

Each particle has a memory that stores its best solution, as well as the best solution found by the entire swarm. Taking this information into account, each particle has the ability to imitate those that have achieved the best local and global positions. This process simulates the social interaction of humans looking for the same objective, or of bird flocks looking for food, for instance. This socio-cognitive mechanism can be summarized into three main principles: (i) evaluation, (ii) comparison, and (iii) imitation. Each particle can evaluate others in its neighborhood through some fitness function, compare them with its own value and, finally, decide whether it is a good choice to imitate them. PSO makes use of both velocity and position terms to perform the optimization at time step \(t\), as follows:

\[\label{e.velocity_pso} \textbf{v}^{t+1}_i=w\textbf{v}^t_i+c_1\delta_1(\hat{\textbf{x}}_i-\textbf{x}_i)+c_2\delta_2(\hat{\textbf{g}}-\textbf{x}_i),\]

where \(\textbf{v}^t_i\) stands for the velocity of agent (particle) \(i\) at iteration \(t\), \(w\) is the inertia weight, \(\delta_1,\delta_2\sim{\cal U}(0,1)\), \(c_1\) and \(c_2\) are ad hoc (acceleration) constants, and \(\hat{\textbf{x}}_i\) and \(\hat{\textbf{g}}\) denote the best position found so far by particle \(i\) and by the entire swarm, respectively. Next, the new position of each agent \(i\) is updated as follows:

\[\label{e.position_pso} \textbf{x}^{t+1}_i=\textbf{x}^t_i+\textbf{v}^{t+1}_i.\]
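A minimal sketch of one PSO update combining Equations \ref{e.velocity_pso} and \ref{e.position_pso}; the values of \(w\), \(c_1\), and \(c_2\) are common illustrative choices, not taken from this paper.

```python
import numpy as np

def pso_step(x, v, x_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Update one particle's velocity and position."""
    if rng is None:
        rng = np.random.default_rng()
    delta1, delta2 = rng.random(), rng.random()   # delta_1, delta_2 ~ U(0,1)
    v_new = w * v + c1 * delta1 * (x_best - x) + c2 * delta2 * (g_best - x)
    return x + v_new, v_new                       # new position and new velocity
```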