\subsection{Predictor Variables Selection}
\label{subsec:predictorvariablesselection}
The dataset available for machine learning can be modified to minimize the prediction error. Various techniques exist for this purpose, such as removing redundant information, but the most commonly used is Feature Selection: choosing, from all the code metric types collected by the Eclipse Metrics 2 tool (Table \ref{tab:eclipse_metrics}), the subset that yields the smallest prediction error. When the set of metric types is relatively small, Feature Selection can be based on exhaustive search algorithms that cover every possible combination of the available metric types. For each combination indicated by the algorithm, a prediction model is built and validated, and its prediction error is determined. The subset of metrics that yields the smallest prediction error is then selected.
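The exhaustive search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names \texttt{metrics} and \texttt{evaluate} are hypothetical placeholders, where \texttt{evaluate} stands for the whole model-creation-and-validation step that returns a prediction error for a given metric subset.

```python
from itertools import combinations

def exhaustive_feature_selection(metrics, evaluate):
    """Return the metric subset with the smallest prediction error.

    `metrics` is a list of metric-type names; `evaluate` is a
    caller-supplied function (a hypothetical placeholder here) that
    builds and validates a prediction model on a subset and returns
    its prediction error.
    """
    best_subset, best_error = None, float("inf")
    # Enumerate every non-empty combination of the available metric types.
    for k in range(1, len(metrics) + 1):
        for subset in combinations(metrics, k):
            error = evaluate(subset)
            if error < best_error:
                best_subset, best_error = subset, error
    return best_subset, best_error
```

For $n$ metric types this loop evaluates $2^{n}-1$ subsets, which is what makes the approach infeasible for large $n$, as discussed next.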
When many metric types are available, exhaustive search becomes prohibitively expensive: a set of $n$ metric types has $2^{n}-1$ non-empty subsets, each requiring a model to be built and validated. Pendharkar (Pendharkar 2010) estimates that applying such an algorithm to a data set of 21 different metric types (over two million subsets) would take approximately 120 years of computation on a 900 MHz RISC processor based computer to complete the Feature Selection process.
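A back-of-the-envelope check of this figure, assuming (hypothetically, this per-model time is not taken from Pendharkar's paper) that one build-and-validate cycle takes about half an hour on such a machine:

```python
# Number of non-empty subsets of 21 metric types that an exhaustive
# search would have to evaluate.
n_metrics = 21
n_subsets = 2 ** n_metrics - 1  # = 2,097,151

# Hypothetical assumption: building and validating one prediction
# model takes about 30 minutes on the 900 MHz machine. Under that
# assumption the total sequential runtime is:
minutes = n_subsets * 30
years = minutes / 60 / 24 / 365
print(f"{n_subsets} subsets, roughly {years:.0f} years")
```

Under that assumption the total comes out near the 120-year order of magnitude cited, illustrating why heuristic rather than exhaustive selection is needed for large metric sets.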