When $y_i = 1$,

$$x_i = \ln y_i = 0 . \qquad (11)$$

When $y_i$ is large ($y_i \to 1$), $1 - y_i$ is small, so $|\ln y_i|$ is small, so $|x_i|$ is small; setting $y_i$ to 1 will increase $x_i$. When $y_i$ is small, $\ln y_i$ is a large negative number, and setting $y_i$ to 0 will decrease $x_i$. So the effect is to increase the large probabilities and decrease the small ones, which increases the network's robustness. It is important to note that $y_i$ may be equal to 1 or 0, so $\ln y_i$ may be equal to 0 or may be undefined ($-\infty$), so we need the function to be

$$x_i = \ln\bigl(\max(y_i, \varepsilon)\bigr), \qquad (12)$$

where $\varepsilon$ is a small positive constant.
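The behavior of Eq. (12) near the endpoints can be checked numerically. Below is a minimal sketch, assuming the clipped-logarithm inverse reconstructed above; the name inverse_softmax and the value eps=1e-7 are illustrative choices, not from the paper.

    import numpy as np

    def inverse_softmax(y, eps=1e-7):
        # ln(0) is -inf and ln(1) is 0, so clip y away from 0
        # before taking the logarithm, as in Eq. (12).
        return np.log(np.clip(y, eps, 1.0))

    y = np.array([0.0, 0.02, 0.98, 1.0])
    print(inverse_softmax(y))  # approx. [-16.12, -3.91, -0.02, 0.]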
Because the generation quality of the network relies on the accuracy of the abstract network's classification, and a convolutional neural network's accuracy is 1%-2% higher than a fully connected network's, we can replace the fully connected network with a convolutional neural network. Most deep learning frameworks provide a transposed-convolution function.
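As an illustration (a minimal sketch, assuming tf.keras; the filter count and kernel size here are arbitrary choices, not the paper's), a transposed convolution reverses the spatial shrinking of a 'valid' convolution:

    import tensorflow as tf

    conv = tf.keras.layers.Conv2D(16, kernel_size=3, padding='valid')
    deconv = tf.keras.layers.Conv2DTranspose(1, kernel_size=3, padding='valid')

    x = tf.random.normal([1, 28, 28, 1])
    h = conv(x)        # (1, 26, 26, 16): 'valid' convolution shrinks the map
    x_hat = deconv(h)  # (1, 28, 28, 1): the transpose restores the spatial size
    print(h.shape, x_hat.shape)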
The Test and the Analysis
The Test
The experiment is done with the MNIST dataset and the TensorFlow framework. The abstract network has 3 convolutional layers and 2 fully connected layers, with no max-pooling and no padding, because these operations cause generation loss. The concrete network is the inverse function of the abstract network. When training the abstract network, the loss function is sparse_categorical_crossentropy [3], the optimizer is adam [3], and the metric is accuracy [3]. Because the number of layers is small, the accuracy of the abstract network on the test data is 98.3%-99%. This affects the generation quality, but only slightly. When training the concrete network, the loss function is mean squared error, the optimizer is adam, the metrics are mean absolute error and cosine similarity, and the label is the input itself. The test results on the test data are shown in Fig. 2. We can see that the network generates the inputs well although it has never seen the test data before.
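A minimal sketch of this setup in tf.keras follows. The filter counts, the 128-unit hidden layer, and the names abstract_net and concrete_net are assumptions for illustration; the text only fixes the layer counts, the absence of max-pooling and padding, and the training configuration.

    import tensorflow as tf

    # Abstract network: 3 convolutional layers and 2 fully connected layers,
    # no max-pooling, no padding ('valid').
    abstract_net = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, padding='valid', activation='relu',
                               input_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, padding='valid', activation='relu'),
        tf.keras.layers.Conv2D(32, 3, padding='valid', activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    abstract_net.compile(loss='sparse_categorical_crossentropy',
                         optimizer='adam', metrics=['accuracy'])

    # Concrete network: the inverse direction, sketched with dense layers and
    # transposed convolutions mirroring the abstract network (28 -> 26 -> 24
    # -> 22 under 'valid' 3x3 convolutions, and back again).
    concrete_net = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(22 * 22 * 32, activation='relu'),
        tf.keras.layers.Reshape((22, 22, 32)),
        tf.keras.layers.Conv2DTranspose(32, 3, padding='valid', activation='relu'),
        tf.keras.layers.Conv2DTranspose(32, 3, padding='valid', activation='relu'),
        tf.keras.layers.Conv2DTranspose(1, 3, padding='valid'),
    ])
    concrete_net.compile(loss='mean_squared_error', optimizer='adam',
                         metrics=['mean_absolute_error', 'cosine_similarity'])

    # 'Label is input': the concrete network learns to map the abstract
    # network's output back to the original image, e.g.
    # concrete_net.fit(abstract_net.predict(x_train), x_train, epochs=5)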
The Analysis
From Fig. 2(a), we can see that the classification results are all correct and the generation is acceptable, although a little blurry. From most of the generated outputs, we can see that the similarity between the generation and the input is not very high, and the generation tends toward the nearest type of the input in the training dataset. The digits 3 and 0 are the best examples. Consider the second 9: the bottom part of the 9 has two inclines, one vertical and the other left-leaning. This tells us that the classification result of the abstract network lies between the two kinds of 9, so two tails appear. We can conclude that the commonality is clear and the individuality is blurry, which reflects the generalization ability of the network.
Because the labels within one class are the same, when we use a label as the input of the concrete network, the output is the same for every member of the class. To achieve this, we add a round function (shown in Fig. 3) between the abstract network and the concrete network. The result is shown in Fig. 2(b). We can see that the output is the type nearest to all members of the same class. The output can be seen as 'the mean of the same class' and the representative of the class.
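A minimal sketch of this arrangement, assuming the tf.keras models sketched earlier; generate_class_mean is an illustrative name, not from the paper:

    import tensorflow as tf

    def generate_class_mean(x, abstract_net, concrete_net):
        y = abstract_net(x)      # class probabilities from the abstract network
        y = tf.round(y)          # e.g. [0.03, 0.96, 0.01] -> [0., 1., 0.]
        return concrete_net(y)   # same one-hot input -> same 'class mean' image

Because tf.round snaps every probability vector of a class to the same one-hot vector, the concrete network necessarily produces one image per class, which is exactly the 'mean of the same class' described above.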