
Roland Szabo edited methodology.tex almost 10 years ago

Commit id: 47498bac02945afda3933cf5ed25611c2d14aa16


Random forests are a popular algorithm in many machine learning competitions because they are fast, have few parameters to tune, and still produce good predictions. Among their weaknesses is the fact that they can easily overfit a noisy dataset.

\subsubsection{Support vector machines}

Support vector machines\cite{Cortes_1995} are discriminative classifiers defined by a separating hyperplane in a high-dimensional space, which is used to distinguish between the classes to which data points belong. The hyperplane chosen by an SVM maximizes the margin to the data points used in training, with the expectation that this leads to better generalization of the classifier.
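
As a sketch of what maximizing the margin means, consider the standard hard-margin formulation for linearly separable data. The notation is an assumption introduced here for illustration: training points $x_i$ with labels $y_i \in \{-1, +1\}$, weight vector $w$ and bias $b$. The hyperplane $w^\top x + b = 0$ has margin $2 / \|w\|$, so maximizing the margin is equivalent to
\begin{equation}
\min_{w,\, b} \; \frac{1}{2} \|w\|^2
\quad \mbox{subject to} \quad
y_i \left( w^\top x_i + b \right) \geq 1, \qquad i = 1, \ldots, n .
\end{equation}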

The other parameters of the random forest were chosen by cross-validation: the number of trees and the number of features to consider when randomly sampling from the feature space.

\subsubsection{The support vector machine model}

The SVM was used for the character recognition problem. The performance of both linear and radial basis function kernels was evaluated. The regularization parameter of the SVM was determined using cross-validation.
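
For reference, the two kernels can be written in their usual textbook form. The symbols are assumptions introduced here: $x$ and $x'$ are feature vectors and $\gamma > 0$ is the width parameter of the radial basis function kernel, whose value is not stated in the text.
\begin{equation}
K_{\mathrm{lin}}(x, x') = x^\top x',
\qquad
K_{\mathrm{rbf}}(x, x') = \exp\left( -\gamma \, \| x - x' \|^2 \right) .
\end{equation}
The linear kernel works directly in the input space, while the radial basis function kernel corresponds to an implicit mapping into a higher-dimensional feature space.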