Figure 10: Image of the extraction column (left-hand side), example pictures for the optimal operating point (upper right-hand side) and flooding (lower right-hand side), with the area of interest (where the two states are distinguishable) indicated by the white box.
Images are taken under different illumination conditions: light from the right-hand side, from the left-hand side, from both sides, or daylight only. A Panasonic DMC-FZ72 camera is used with an image resolution of about 0.13 mm/pixel. In a subsequent preprocessing step, the desired image section is cropped as indicated in the right-hand side images in Figure 10.
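The cropping step can be sketched as follows. This is a minimal illustration in Python, not the preprocessing routine used in the study: the function names, the box coordinates and the toy image are hypothetical, and only the resolution of 0.13 mm/pixel is taken from the text.

```python
MM_PER_PIXEL = 0.13  # image resolution stated in the text


def mm_to_pixels(length_mm, mm_per_pixel=MM_PER_PIXEL):
    """Convert a physical length to a whole number of pixels."""
    return round(length_mm / mm_per_pixel)


def crop(image, top, left, height, width):
    """Return the sub-image (list of rows) inside the given box."""
    if top < 0 or left < 0 or top + height > len(image) or left + width > len(image[0]):
        raise ValueError("crop box lies outside the image")
    return [row[left:left + width] for row in image[top:top + height]]


# Toy 6 x 8 grayscale image (assumption); each pixel value encodes its position.
image = [[10 * r + c for c in range(8)] for r in range(6)]
roi = crop(image, top=1, left=2, height=3, width=4)
```

Applying one fixed box to every image keeps the same section of the column in view for all samples, provided the camera position does not change between images.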
The images are labeled according to the operating state, either “normal operating state” or “flooding”. More than 1000 images per class are fed to the neural net as training data. The convolutional neural network (CNN) ResNet-18 is used and retrained for this purpose. The core idea of ResNet is the so-called “identity shortcut connection”, which skips one or more layers. Since the skipped layers then only have to learn the residual relative to their input, the whole module is called a residual module. Skipping layers in this way speeds up the training of the net. ResNet-18 is 18 layers deep; its convolutional layers are grouped into residual blocks, each of which consists of several different layers.
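The idea of the identity shortcut can be illustrated with a small sketch. It assumes two small fully connected layers in place of the convolution and normalization layers of the real ResNet-18; all names and weights are hypothetical. The output is y = relu(F(x) + x), where F is what the skipped layers compute.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def linear(weights, values):
    """Multiply a weight matrix (list of rows) with a vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """Compute y = relu(F(x) + x) with F(x) = w2 * relu(w1 * x)."""
    residual = linear(w2, relu(linear(w1, x)))  # F(x), computed by the skipped layers
    return relu([r + xi for r, xi in zip(residual, x)])  # the shortcut adds x unchanged


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block(x, zeros, zeros)  # F(x) = 0, so the block passes x through
```

With all weights at zero, the block returns its (non-negative) input unchanged. The skipped layers therefore only have to model the deviation from the identity.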
After seven training epochs, the neural net achieves an accuracy of 99.3 %. For training, the batch size was set to 4, with stochastic gradient descent with momentum (SGDM) as solver and an initial learning rate of 0.001.
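The SGDM update rule can be sketched on a toy problem. The learning rate of 0.001 and the batch size of 4 follow the text. The momentum factor of 0.9 is an assumption (a commonly used value, not stated in the text), and the linear model and data are purely illustrative.

```python
def sgdm_step(weights, grads, velocity, learning_rate=0.001, momentum=0.9):
    """One SGD-with-momentum update; returns the new weights and velocity."""
    new_velocity = [momentum * v - learning_rate * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def mean_gradient(weights, batch):
    """Gradient of the loss 0.5 * (w . x - y)^2, averaged over a mini-batch."""
    grads = [0.0] * len(weights)
    for x, y in batch:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grads[i] += error * xi / len(batch)
    return grads


# Hypothetical mini-batch of 4 samples generated by y = 2 * x0 - 1 * x1
batch = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
weights, velocity = [0.0, 0.0], [0.0, 0.0]
for _ in range(5000):
    weights, velocity = sgdm_step(weights, mean_gradient(weights, batch), velocity)
```

The velocity term accumulates past gradients, so consecutive steps in the same direction reinforce each other while oscillating components partly cancel.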
To check whether the net performs reasonably, a confusion matrix is created, in which the class predicted by the network is compared to the true class assigned to the image. If the predicted and the true class are the same, the prediction is correct. For validation of the trained net, a set of 252 test images is used, 126 for each state, none of which were used for training. The obtained accuracy is 99.7 % with a single misclassified image.
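How a confusion matrix and the accuracy are derived from true and predicted labels can be sketched as follows. The label lists are a hypothetical toy example and not the test set of the study; the function names are likewise illustrative.

```python
CLASSES = ["normal operating state", "flooding"]


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Count samples; rows index the true class, columns the predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correct predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical toy labels: three images per class, one flooding image misclassified
true_labels = ["normal operating state"] * 3 + ["flooding"] * 3
predicted = ["normal operating state"] * 3 + ["flooding", "flooding", "normal operating state"]
matrix = confusion_matrix(true_labels, predicted)
```

Off-diagonal entries show which class is confused with which. For flooding detection this matters, because a missed flooding event and a false alarm have different consequences.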
During the investigations, the following question came up: what if the net predicts the class correctly, but bases its decision on unreasonable sections of the image? To exclude this source of error, a class activation map (CAM) is introduced. Its purpose is to visualize which area of the image the neural net deems most important for its class prediction. The CAM is constructed in Matlab using the “activations()” function and plotted on top of the analyzed image, see Figure 11.
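The principle behind a CAM can be sketched in a few lines. The text does not state how the activations are combined, so this sketch assumes the common CAM definition: a weighted sum of the feature maps of the last convolutional layer, with the weights that connect each map to the class of interest. The feature maps, the weights and the function names are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps: cam[r][c] = sum_k w_k * f_k[r][c]."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * cols for _ in range(rows)]
    for fmap, weight in zip(feature_maps, class_weights):
        for r in range(rows):
            for c in range(cols):
                cam[r][c] += weight * fmap[r][c]
    return cam


def normalize(cam):
    """Rescale to [0, 1] so the map can be drawn as a color overlay."""
    flat = [v for row in cam for v in row]
    low, high = min(flat), max(flat)
    if high == low:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - low) / (high - low) for v in row] for row in cam]


# Two hypothetical 2 x 2 feature maps and their weights for one class
feature_maps = [[[1.0, 0.0], [0.0, 0.0]],
                [[0.0, 0.0], [0.0, 2.0]]]
class_weights = [0.5, 1.0]
cam = normalize(class_activation_map(feature_maps, class_weights))
```

In practice the coarse map is upsampled to the image size before it is overlaid, so that regions with high values can be compared directly with the area of interest.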