Authorea

Roland Szabo edited methodology.tex almost 10 years ago

Commit id: aaea215c6cc56c1e1827f2bb4fcb7bfb04a2d799


This section first presents the background of the machine learning approaches used for OCR and then discusses the specifics of the models applied to this problem.

\subsection{Theoretical background}

\subsubsection{Random forests}
Random forests were introduced by Leo Breiman and Adele Cutler~\cite{breiman2001random} as an ensemble of decision trees. A single decision tree used as a classifier often suffers from either high variance (a deep, unpruned tree overfits the training data) or high bias (a shallow tree underfits it). Random forests mitigate these problems by aggregating the votes of many trees, which yields more accurate models that generalize better. When training a random forest, each tree is fit on $n$ samples drawn with replacement from the training data (a bootstrap sample).

\subsubsection{Linear SVM}

\subsection{Model design}
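As an illustration of the random forest training procedure described in the background (and not the model used in this work), the following is a minimal pure-Python sketch of bootstrap sampling followed by majority voting. Everything in it is an assumption made for the example: the helper names (\texttt{bootstrap\_sample}, \texttt{build\_tree}, \texttt{train\_forest}, \texttt{predict\_forest}), the Gini split criterion, the depth cap, and the use of $\lfloor\sqrt{d}\rfloor$ randomly chosen candidate features per split, which is the usual default for classification forests but is not stated in the text.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Draw n indices uniformly with replacement (a bootstrap sample)."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, max_depth, n_features, rng):
    """Grow a classification tree; each split looks at a random feature subset.

    Leaves are class labels; internal nodes are (feature, threshold, left, right)
    tuples, so labels are assumed to be non-tuple values such as ints or strings.
    """
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority  # leaf: predict the majority class
    best = None  # (weighted impurity, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority  # no valid split among the sampled features
    _, f, t = best
    lx = [row for row in X if row[f] <= t]
    ly = [lab for row, lab in zip(X, y) if row[f] <= t]
    rx = [row for row in X if row[f] > t]
    ry = [lab for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree(lx, ly, max_depth - 1, n_features, rng),
            build_tree(rx, ry, max_depth - 1, n_features, rng))


def predict_tree(node, x):
    """Follow thresholds from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_forest(X, y, n_trees=25, max_depth=5, seed=0):
    """Fit each tree on its own bootstrap sample of the training data."""
    rng = random.Random(seed)
    n_features = max(1, int(len(X[0]) ** 0.5))  # floor(sqrt(d)) features per split
    forest = []
    for _ in range(n_trees):
        Xb, yb = bootstrap_sample(X, y, rng)
        forest.append(build_tree(Xb, yb, max_depth, n_features, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy usage: two well-separated 2-D clusters (illustrative data only).
X = [[a, b] for a in range(3) for b in range(3)] + \
    [[a + 7, b + 7] for a in range(3) for b in range(3)]
y = [0] * 9 + [1] * 9
forest = train_forest(X, y, n_trees=25, max_depth=3, seed=0)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [9, 9]))
```

Because every tree sees a different bootstrap sample and a different random subset of features at each split, the trees make partly independent errors, and the majority vote averages much of that variance away. Breiman's original formulation grows unpruned trees; the depth cap here only keeps the toy example small.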