Roland Szabo edited discussion.tex almost 10 years ago

Commit id: c85dfdf0c70ef62d47a7d8bfc2ff91a5da4a1900

\section{Discussion and Comparison to Related Work}
\label{sec:disc}

This section analyzes our approaches and compares them to similar existing work.

\subsection{Analysis of the Proposed Approach}

For the character recognition problem, using an RBF kernel SVM resulted in an accuracy of $91.018\% \pm 0.126$ in the best case, with the regularization parameter set to 100, while using a linear kernel yielded $90.490\% \pm 0.097$ as its best result, with the regularization parameter set to 1. Using an RBF kernel therefore gives slightly better results. One explanation for the small margin is that the data is already high-dimensional (400 dimensions) and the data points lie in sufficiently well-separated regions of the feature space that separating hyperplanes work effectively, without needing the projection into the infinite-dimensional space offered by the RBF kernel.
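The comparison above can be sketched in a few lines of scikit-learn. This is a minimal illustration rather than our actual experimental code: \texttt{X} (the matrix of 400-dimensional character feature vectors) and \texttt{y} (the class labels) are hypothetical placeholders, and the cross-validation setup may differ from ours.

\begin{verbatim}
# Minimal sketch of the kernel comparison; X (n x 400 character
# feature vectors) and y (class labels) are hypothetical placeholders.
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

for kernel, C in [("linear", 1), ("rbf", 100)]:
    clf = SVC(kernel=kernel, C=C)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{kernel} (C={C}): {scores.mean():.3f} +/- {scores.std():.3f}")
\end{verbatim}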

\[
\text{specificity} = \frac{\text{true negatives}}{\text{all negatives}} = \frac{4556}{4556+255} = 94.70\%
\]

\subsection{Comparison to Related Work}

If we compare the results of the character recognition problem with those obtained on the MNIST dataset\cite{lecun1998mnist}, we see that our error rates are much higher: around 9\% versus 1.4\% (obtained by using a Gaussian SVM on MNIST). There are two possible explanations for this. One is that MNIST has only ten classes (corresponding to the ten digits), so it is much easier for a classifier to assign the correct class to a data point than on our dataset, which has 74 distinct classes, many of which are similar (the digit 0 and the letters o and O, the digit 1 and the letter l). Another factor that makes MNIST easier to solve is that it has 70000 data points in total, while our dataset contains only 7045. This almost tenfold difference contributes greatly to the classifier's ability to generalize. Even on the MNIST dataset, adding more data is beneficial, as shown by Ciresan et al.\cite{Cire_an_2010}, who obtained one of the best results on MNIST by augmenting the dataset with artificial data generated by transforming the existing digits; a sketch of this kind of augmentation is given at the end of this section.

The results obtained in \cite{kahraman2003license} and \cite{Franc_2005} for the character segmentation problem are somewhat better than ours: 94.5\% and 96.7\% accuracy, compared to our 91.9\%. The difference between the two problems is that in those papers the number of characters in a license plate is known and fixed, while lines in a receipt have varying lengths. This prior knowledge can explain the difference in results.

The paper by Janssen\cite{janssen2012receipts2go} deals with OCR on receipts, but the authors do not train the OCR engine specifically for receipts; they present a new approach for image normalization instead.

\subsection{Application for the OCR Engine}

The main reason for developing this OCR engine was to use it in an application, called ReceiptBudget, which would enable users to take a photo of a receipt; the program would then extract the relevant information (date, items, store, total) from the photo, store it in a database, and let the user view an interactive dashboard of their expenses.

The motivation for using this method of data entry is that photographing a receipt is much faster than typing its contents in manually, which is slow and tedious, and which users are prone to forgetting to do.
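To make the augmentation idea referenced above concrete, the following is a minimal sketch of generating artificial training data through small random transformations, in the spirit of Ciresan et al.\cite{Cire_an_2010}. It assumes the 400-dimensional feature vectors are $20 \times 20$ pixel images; the rotation and shift ranges are illustrative assumptions, not the settings used in their paper.

\begin{verbatim}
# Illustrative sketch of dataset augmentation by small random
# transformations; the rotation and shift ranges are assumptions,
# not the exact settings of Ciresan et al.
import numpy as np
from scipy.ndimage import rotate, shift

def augment(image, rng):
    # image: a 20x20 grayscale character (400 features reshaped)
    angle = rng.uniform(-10, 10)           # small random rotation
    dx, dy = rng.uniform(-2, 2, size=2)    # small random translation
    distorted = rotate(image, angle, reshape=False, mode="nearest")
    return shift(distorted, (dy, dx), mode="nearest")

rng = np.random.default_rng(0)
# e.g. nine distorted copies per original character:
# augmented = [augment(img, rng) for img in images for _ in range(9)]
\end{verbatim}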