Roughly speaking, given a problem with $k$ clusters, each agent (e.g., harmonies, particles, bats or fireflies) encodes a possible solution in $\Re^{k \times n}$, as depicted in Figure~\ref{f.problem_representation}. Therefore, after placing all agents at random positions, the $k$-means algorithm is executed once per agent using those positions as its starting point. Soon after, the ADDC is computed over the final clustered dataset and used as the fitness function of each meta-heuristic technique, which outputs the optimum/near-optimum solution, i.e., the best starting points for $k$-means.

[FIGURE 1]

The work proposed by Papa et al.~\cite{Papa_2015} goes beyond that point by combining the solutions obtained through distinct meta-heuristic techniques. Since each technique has its own weaknesses, the idea is to explore a higher level of optimization in order to improve each individual solution by means of the combination of all solutions obtained so far. Although such a step can be performed by any optimization technique, we opted to employ Genetic Programming (GP) for two main reasons: (i) we did not use any meta-heuristic technique that had already been employed during the first step of optimization, in order to avoid biases; and (ii) GP provides a more powerful combination process as a hyper-heuristic technique, since it can apply a number of arithmetic operations for that purpose, instead of using movement-based equations to move agents from one position to another.

Genetic Programming~\cite{Koza_1992} is an evolutionary optimization algorithm that models each solution as an individual, usually represented as a tree composed of ``function'' and ``terminal'' nodes. The function nodes encode the arithmetic operators applied over the terminal nodes in order to evaluate the trees (Figure~\ref{f.gp_tree}a), and the terminal nodes represent constant values.
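The first optimization step described above can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes Euclidean distances, a plain Lloyd's $k$-means, and ADDC taken as the mean of the per-cluster average point-to-centroid distances (to be minimized); the meta-heuristic movement equations themselves are omitted, so only the agent encoding in $\Re^{k \times n}$ and the fitness evaluation are shown.

```python
import numpy as np

def kmeans(X, centroids, n_iters=50):
    """Plain Lloyd's k-means started from the given centroids."""
    for _ in range(n_iters):
        # assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids (keep the old one if a cluster goes empty)
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def addc(X, centroids, labels):
    """Assumed ADDC: mean over clusters of the average distance
    of each point to its cluster centroid (lower is better)."""
    per_cluster = []
    for j in range(len(centroids)):
        pts = X[labels == j]
        if len(pts):
            per_cluster.append(np.linalg.norm(pts - centroids[j], axis=1).mean())
    return float(np.mean(per_cluster))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # toy dataset: 200 points, n = 2 features
k, n_agents = 3, 10

# each agent is a point in R^{k*n}: k candidate centroids, flattened
agents = rng.uniform(X.min(), X.max(), size=(n_agents, k * X.shape[1]))

# fitness of an agent: run k-means from its centroids, score with ADDC
fitness = []
for a in agents:
    c, lab = kmeans(X, a.reshape(k, -1).copy())
    fitness.append(addc(X, c, lab))

best = int(np.argmin(fitness))          # agent with the best starting points
```

A full meta-heuristic would now move the agents (e.g., via harmony improvisation or particle velocities) and re-evaluate this fitness at each iteration, keeping the agent with the lowest ADDC as the solution.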
At each iteration, specific operations are performed over the current population to design the next generation of individuals, the most commonly used being: (i) mutation, (ii) crossover and (iii) reproduction. Mutation and crossover aim at providing greater variability to the population of individuals, while reproduction tries to carry the best ones over to the next generation. In short, a mutation changes an individual without considering the others, i.e., given a mutation point, we can simply generate a new random subtree at that point, while crossover swaps branches between two distinct trees.

[FIGURE 2 a,b]

The work by Papa et al.~\cite{Papa_2015} employs the best result of each meta-heuristic technique (i.e., HS, IHS, GHS, NGHS and SGHS) to compose the set of terminal nodes. This means GP can use any technique from that set for combination purposes, and can therefore decide which ones will compose the best individual right after the evolutionary optimization process.

\section{Methodology}
\label{s.material}