Authorea

Joao Paulo Papa edited Introduction.tex over 8 years ago

Commit id: 525f7943a2cadc2e1c72a19b4ba3547b6d18bcab


In recent decades, advances in remote sensing image acquisition systems have moved in lockstep with the demand for applications that make use of such data. Land-use classification~\cite{Pisani_2014,Bagan_2010,Capucim_2015}, target recognition~\cite{Dong_2015,Martorella_2011,Du_2011} and band selection in hyperspectral images~\cite{He_Yang_2011,Yuan_Yuan_2015} are among the most pursued applications, to name a few. The large amount of high-resolution content made available by satellites also highlights the bottleneck of labeling data. Manual annotation depends on the skill of the annotator and is prone to errors. Such shortcomings have further fostered research on semi-supervised and unsupervised techniques, which may work well in some remote sensing-oriented applications.

Considered a hallmark of the pattern recognition research field, the so-called $k$-means algorithm~\cite{MacQueen_1967} has been consistently enhanced over the last decades. Since it does not make use of labeled data and has a simple formulation, $k$-means is still one of the most widely used unsupervised classification techniques to date. Roughly speaking, given a set of feature vectors (samples) extracted from a dataset, $k$-means tries to minimize the distance from each sample to its closest center (mean). After some iterations, this process ends up clustering the data, such that each sample is more ``connected'' to the centroid of its own cluster than to any other centroid. Its main drawbacks are the number of clusters required as an input and the tendency of the na\"ive algorithm to get trapped in local optima, i.e., centroids that do not represent the clusters well. %using nature-inspired techniques~\cite{Nakamura_2014}
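
As a sketch of the objective described above, and under notation assumed here for illustration only (samples $\mathbf{x}_1,\dots,\mathbf{x}_n$, centers $\mathbf{c}_1,\dots,\mathbf{c}_k$, and the squared Euclidean distance, which is the usual but not the only possible choice), $k$-means can be read as seeking the centers that minimize
\begin{equation}
J(\mathbf{c}_1,\dots,\mathbf{c}_k) = \sum_{i=1}^{n} \min_{1 \leq j \leq k} \|\mathbf{x}_i - \mathbf{c}_j\|^2 .
\end{equation}
The standard iteration alternates between assigning each sample to its closest center and recomputing each center as the mean of its assigned samples. Neither step increases $J$, but the final centers are only guaranteed to be a local minimum that depends on the initialization, which is consistent with the second drawback mentioned above.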