Considered a hallmark in the pattern recognition research field, the so-called $k$-means algorithm~\cite{MacQueen_1967} has been consistently enhanced over the last decades. Since it does not require labeled data and has a simple formulation, $k$-means is still one of the most widely used clustering techniques to date. Roughly speaking, given a set of feature vectors (samples) extracted from a dataset, $k$-means seeks $k$ centroids (means) that minimize the distance from each sample to its closest centroid, i.e., the within-cluster sum of squared distances $\sum_{i=1}^{k}\sum_{\mathbf{x}\in C_i}\|\mathbf{x}-\boldsymbol{\mu}_i\|^2$, where $C_i$ stands for the $i$-th cluster and $\boldsymbol{\mu}_i$ for its centroid. After a number of iterations, this process ends up clustering the data, with two samples from the same cluster being more ``connected'' to their centroid than to any other centroid in the dataset. As its main drawbacks, we can highlight the number of clusters required as an input, as well as the tendency of the na\"ive algorithm to get trapped in local optima, i.e., centroids that do not represent the clusters well. The aforementioned scenario makes the $k$-means algorithm a natural candidate to be addressed by means of optimization techniques, mainly those based on nature- and evolutionary-oriented mechanisms. In fact, not only $k$-means but a number of other techniques have employed the framework of meta-heuristic-based optimization to cope with problems that can be modeled as the task of finding variables that maximize/minimize a given fitness function.

\section*{Acknowledgments}

The authors are grateful to FAPESP grants \#2013/20387-7 and \#2014/16250-9, as well as CNPq grants \#470571/2013-6 and \#306166/2014-3.