3.0    Neural Network (Softmax Regression)

In this section, we describe the first approach to solving the handwritten digit classification problem: softmax regression.
The 28 × 28 image matrix is flattened to a column vector of length 784, so that each of the 784 entries holds the intensity of one pixel of the training image. It is noteworthy that flattening the input image matrix discards the spatial relationships between pixels; this issue is addressed by our KNet CNN model, discussed in a later section. Invariance to rotation and scaling can be handled by a more recent technique known as capsule networks, which are beyond the scope of this study.
The Arabic numerals comprise ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Each image is unpacked into a column vector \(x_j\) and multiplied by weights \(W_{i,j}\), with biases \(b_i\) added. This results in a vector of logits, one per class, defined by Equation 1.0.
\(z_i = \sum_j W_{i,j}x_j + b_i\)  (1.0)
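As a minimal sketch of Equation 1.0 in NumPy (the variable names and shapes are illustrative assumptions; the paper does not prescribe an implementation), the flattening and logit computation can be written as:

```python
import numpy as np

# Hypothetical shapes for illustration: one 28 x 28 MNIST image,
# 10 output classes (digits 0-9).
image = np.random.rand(28, 28)   # stand-in for a training image
x = image.reshape(784)           # flatten to a length-784 vector
W = np.zeros((10, 784))          # weights W[i, j]
b = np.zeros(10)                 # biases b[i]

# Equation 1.0: z_i = sum_j W[i, j] * x[j] + b[i]
z = W @ x + b                    # vector of 10 logits, one per class
```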
Solving a recognition problem on this system requires techniques from multi-class classification, which is where the softmax function comes in (Equation 2.0 below). Softmax regression (also known as multinomial logistic regression) is a generalization of logistic regression to the case of multiple classes; it replaces the sigmoid activation function commonly used in perceptrons.
\(\sigma (z_i) = \frac{\exp (z_i)}{\sum _j \exp (z_j)}\)  (2.0)
The softmax function takes an array of numbers and returns a set of numbers in the range 0 to 1. These numbers sum to one and can therefore be interpreted as the probabilities the model assigns to each class. Figure 2.0 below shows the softmax regression graphically.
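A sketch of Equation 2.0 (not the authors' code; subtracting the maximum logit before exponentiating is a standard stabilization that avoids overflow and leaves the result unchanged) might look like:

```python
import numpy as np

def softmax(z):
    """Equation 2.0: map a vector of logits to probabilities that sum to one."""
    shifted = z - np.max(z)    # numerical stabilization; result is unchanged
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())      # values in (0, 1) that sum to 1.0
```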