Roland Szabo edited abstract.tex  about 10 years ago

Commit id: 5f0f289242125737190619d2fd27c3956e309b2d


This paper addresses the problem of performing optical character recognition (OCR) on receipt images and then extracting structured information from the recognized text, using machine learning algorithms. Tools that have not been trained specifically for this kind of image usually handle receipts poorly, because receipts use custom fonts and, due to size constraints, many letters are printed close to each other. In this paper we adapt existing OCR methods so that they retrieve information from receipts as accurately as possible. Text is located in images using the Stroke Width Transform algorithm, lines are then segmented into characters using Random Forests, and finally the characters are classified using Linear Support Vector Machines. The results obtained by applying these algorithms are discussed and analyzed.
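
As a rough illustration of the final classification stage only, the sketch below trains a linear SVM by subgradient descent on the hinge loss, in plain Python. The two-dimensional feature vectors, the two-class labels, the hyperparameters, and the omission of a bias term are all assumptions made for this example; the abstract does not describe the features or the training procedure actually used.

```python
from typing import List, Sequence

# Hypothetical, centered 2-D feature vectors for two character classes.
# Real character features would have many more dimensions.
SAMPLES = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
           (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
LABELS = [1, 1, 1, -1, -1, -1]  # +1 / -1 stand for two character classes


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(samples: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     epochs: int = 50,
                     lr: float = 0.1,
                     reg: float = 0.01) -> List[float]:
    """Minimise the L2-regularised hinge loss by subgradient descent.

    Assumption: no bias term, so the data is expected to be centered.
    """
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * dot(w, x)
            # Subgradient step for the L2 penalty: shrink the weights.
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Subgradient step for the hinge term, only if the margin is violated.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if dot(w, x) >= 0.0 else -1


weights = train_linear_svm(SAMPLES, LABELS)
```

Calling `predict(weights, x)` on a new feature vector returns the predicted class sign. A multi-class character classifier would typically combine several such binary classifiers, for example one per character against all the others.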