round(y) = 1 when y > 0.5, and round(y) = 0 when y < 0.5. (11)

When y is large (close to 1), 1 − y is small, so the change caused by rounding is small; setting y to 1 increases it. When y is small, setting it to 0 decreases it. So the effect is to increase large probabilities and decrease small probabilities, and the network's robustness increases. It is important to note that y may be exactly equal to 1 or 0, so 1 − y may be equal to 0 or 1, so we need the function to be defined on the whole closed interval:

round(y) = 1 when 0.5 ≤ y ≤ 1, and round(y) = 0 when 0 ≤ y < 0.5. (12)
Because the generation quality of the network depends on the classification accuracy of the abstract network, and a convolutional neural network is 1%–2% more accurate than a fully connected network, we can replace the fully connected network with a convolutional one. Most deep learning frameworks provide a transposed-convolution function.
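For example, in TensorFlow/Keras the transposed convolution is available as `Conv2DTranspose`. A minimal sketch (the filter count and kernel size are illustrative, not taken from the paper):

```python
import tensorflow as tf

# Transposed convolution ("transpose of convolution") in Keras.
# With strides=2 and padding="same", a 14x14 feature map is
# upsampled to 28x28; the filter count and kernel size are illustrative.
layer = tf.keras.layers.Conv2DTranspose(
    filters=16, kernel_size=3, strides=2, padding="same")

x = tf.zeros([1, 14, 14, 32])  # one 14x14 feature map with 32 channels
y = layer(x)
print(y.shape)  # (1, 28, 28, 16)
```

This is the upsampling building block the concrete network needs to invert the abstract network's convolutions.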
The Test and the Analysis
The Test
The experiment uses the MNIST dataset and the TensorFlow framework. The abstract network has 3 convolutional layers and 2 fully connected layers, with no max-pooling and no padding, because those operations would cause generation loss. The concrete network is the inverse function of the abstract network. When training the abstract network, the loss function is sparse_categorical_crossentropy[3], the optimizer is adam[3], and the metric is accuracy[3]. Because the number of layers is small, the accuracy of the abstract network on the test data is 98.3%–99%. This affects the generation quality, but only slightly. When training the concrete network, the loss function is mean squared error, the optimizer is adam, the metrics are mean absolute error and cosine similarity, and the label is the input itself. The test result on the test data is shown in Fig. 2. We can see that the network generates the inputs well even though it has never seen the test data before.
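The abstract network and its training setup can be sketched in Keras as follows; the filter counts, kernel sizes, and hidden width are assumptions, since the text only fixes the layer counts, the absence of pooling and padding, and the compile settings:

```python
import tensorflow as tf

# Abstract (classification) network: 3 convolutional layers and
# 2 fully connected layers, no pooling, no padding ("valid").
# Filter counts, kernel sizes, and the hidden width are assumptions.
abstract = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, padding="valid", activation="relu"),
    tf.keras.layers.Conv2D(32, 3, padding="valid", activation="relu"),
    tf.keras.layers.Conv2D(64, 3, padding="valid", activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Training configuration stated in the text.
abstract.compile(loss="sparse_categorical_crossentropy",
                 optimizer="adam",
                 metrics=["accuracy"])
```

The concrete network is compiled analogously with mean squared error as the loss, adam as the optimizer, and mean absolute error and cosine similarity as metrics, with the input image reused as its own label.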
The Analysis
From Fig. 2(a), we can see that all of the classification results are correct and the generation is acceptable, although a little blurry. From most of the generated outputs, we can see that the similarity between the generation and the input is not very high; the generation tends toward the nearest type to the input in the training dataset. The digits 3 and 0 are the best examples. Consider the second 9: the bottom stroke of this 9 has two inclines, one vertical and one left-leaning. This tells us that the classification result of the abstract network lies between two kinds of 9, so two tails appear. We can conclude that the commonality is clear while the individuality is blurry, which reflects the generalization ability of the network.
Because the label of every sample in one class is the same, when we use the label as the input of the concrete network, the output is the same for the whole class. To achieve this, we add a round function (shown in Fig. 3) between the abstract network and the concrete network. The result is shown in Fig. 2(b). We can see that the output is of the type nearest to all samples of the same class; it can be seen as ‘the mean of the class’ and the representative of the class.
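A minimal sketch of that rounding step; the probability values are hypothetical, standing in for a softmax output of the abstract network:

```python
import numpy as np

# The round function inserted between the abstract and concrete
# networks. `probs` is a hypothetical softmax output of the abstract
# network; rounding turns it into a one-hot code, so every sample of
# a class maps to the same code and the concrete network produces
# one representative image per class ("the mean of the class").
probs = np.array([0.05, 0.02, 0.88, 0.05])
code = np.round(probs)
print(code)  # [0. 0. 1. 0.]
```

Feeding this one-hot code into the concrete network yields the class representative shown in Fig. 2(b).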