1.0 Introduction
Automatic recognition of handwritten digits uses image processing, computer vision, and/or artificial intelligence to translate a digit or string of digits written by hand into a set of characters accurately represented in computer memory, without explicit human guidance. This has numerous applications, for example automatic processing of cheques in the banking industry, conversion of old handwritten texts to digital form, handwriting input modes on smartphones, and fraud detection. In this work, we present two methods: (1) a softmax regression (SR) model, and (2) a custom KNet convolutional neural network (CNN) model.
1.1 Handwritten Digits Dataset (MNIST)
The MNIST database was built by modifying the original NIST database and contains handwritten digits 0 to 9. It has 60,000 training images and 10,000 test images. The TensorFlow version of MNIST splits the 60,000 training images into 55,000 training examples and 5,000 validation examples. All of these images are drawn from the same distribution. The images are grayscale and normalized, and the center of gravity of the intensity lies at the center of the image. Each image is 28 x 28 pixels, so it has a total of 784 intensity values between 0 and 1 and becomes a 784 x 1 or a 1 x 784 vector when flattened. Unfortunately, flattening (or reshaping) discards the relationship between a pixel and its neighboring pixels. Image pixels are highly correlated with their neighbors, and losing this information is detrimental. Convolutional neural networks, which are discussed in Section 4.0, address this problem. The major categories of machine learning techniques used for recognition tasks on the MNIST dataset include linear classifiers, k-nearest neighbors, boosted stumps, nonlinear classifiers, support vector machines (SVMs), neural networks, and convolutional networks. Because we use TensorFlow for our machine learning task, we represent the classification labels with one-hot encoding: each label becomes a vector of length 10 with a 1 at the index of the digit and 0 elsewhere.
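The flattening and one-hot encoding steps described above can be illustrated with a minimal sketch. It uses NumPy rather than TensorFlow, and random arrays stand in for real MNIST images. The helper names flatten_images and one_hot are hypothetical and are not part of any library or of the models presented in this work.

```python
import numpy as np

IMAGE_SIDE = 28    # MNIST images are 28 x 28 pixels
NUM_CLASSES = 10   # digits 0 to 9


def flatten_images(images):
    """Reshape a batch of (n, 28, 28) images into (n, 784) row vectors."""
    images = np.asarray(images, dtype=np.float32)
    return images.reshape(images.shape[0], -1)


def one_hot(labels, num_classes=NUM_CLASSES):
    """Encode integer labels as rows with a single 1 at the label index."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


# Assumption: synthetic data in [0, 1) stands in for real MNIST images.
rng = np.random.default_rng(0)
images = rng.random((3, IMAGE_SIDE, IMAGE_SIDE), dtype=np.float32)
labels = np.array([5, 0, 9])

flat = flatten_images(images)
targets = one_hot(labels)

# Pixel (row, col) lands at index row * 28 + col in the flattened vector,
# so vertically adjacent pixels end up 28 positions apart.
row, col = 2, 3
index_here = row * IMAGE_SIDE + col
index_below = (row + 1) * IMAGE_SIDE + col

print(flat.shape)                 # (3, 784)
print(targets.shape)              # (3, 10)
print(index_below - index_here)   # 28
```

The last printed value shows why flattening hides spatial structure: two pixels that touch vertically in the image are 28 entries apart in the vector, and a model that treats the 784 inputs as unordered features has no built-in knowledge that they are neighbors.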