Fig. 4. Effect of Location Change
From the pictures we can see that the attributes the network has learned
let it generalize to rare situations. Although we use only the three
primary colors, the network can generate other colors. It can also
generate intermediate states between big and small, and between up and
middle. Its generalization ability is good because the network is a
continuous function.
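The interpolation idea can be illustrated with a minimal sketch. It
assumes, hypothetically, that an attribute is presented to the network as
a numeric vector (here a three-component primary-color code); the helper
interpolate_attributes and this encoding are illustrative and are not
taken from the experiment. Because the network is a continuous function,
a vector lying between two trained attribute values is still a valid
input, which is what makes intermediate states reachable.

```python
import numpy as np

# Hypothetical attribute codes for two of the three primary colors.
RED = np.array([1.0, 0.0, 0.0])
GREEN = np.array([0.0, 1.0, 0.0])


def interpolate_attributes(a, b, t):
    """Return the attribute vector a fraction t of the way from a to b."""
    return (1.0 - t) * a + t * b


# An attribute halfway between red and green, not one of the trained codes.
mixed = interpolate_attributes(RED, GREEN, 0.5)
print(mixed)
```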
Multi-input and Multi-output Test
Inspired by multi-attribute learning, by the input/output principle of
computers, and by the observation that the difference between speech
recognition and image recognition can be represented by a function, I
designed a network that processes speech recognition and image
recognition simultaneously. It is shown in Fig. 5.
The multi-attribute network has one input and several outputs, whereas
this network has several inputs and several outputs. The dataset consists
of MNIST and some downloaded speech data. The amount of speech data is
small because I could not find a larger dataset. Both tasks are
classification tasks, which I recast as regression. I concatenate the
MNIST and speech data to form the input, and the labels of the two
datasets are concatenated as well. For simplicity, I chose a fully
connected network with 5 layers. The first four layers have 60 neurons
each, and the last layer has 20 because there are twenty classes (MNIST:
10 classes, speech dataset: 10 classes). The labels are one-hot encoded.
For training, the loss function is mean squared error, the optimizer is
Adam, and the metric is accuracy (calculated separately for the two
tasks). Because the shape of
each MNIST image is 28×28 and the shape of each audio sample is 20×11,
each flattened, concatenated input has 784 + 220 = 1004 values. There are
20477 audio samples, which are split into training and test data with 70%
used for training. The shape of the training data is therefore
(14333, 1004) and the shape of the test data is (6144, 1004). The shape
of the training labels is (14333, 20) and the shape of the test labels is
(6144, 20). In testing, the accuracy of image
regression on the test set is 86.5% and the accuracy of speech
regression on the test set is 79.8%. I expect the accuracy to be higher
with a larger dataset. Because MNIST has more samples than the audio
data, we can also input images only and set the audio input to -1. We
can add one output that is 1 when there is no audio input and 0 when
there is audio input. With these changes, the accuracy of image
regression is 86.6%, nearly the same as before. So the universal network
can also process a single input. It is important to note that the
accuracy on the test data for image and audio together is 10% lower than
the accuracy for image and audio separately. From these tests we can
conclude that one network can perform two tasks simultaneously (I did not
test more than two). The parameters are shared by the two tasks, so the
processing is parallel. I conjecture that it can perform more than two
tasks simultaneously, which is why I call it a universal neural network.
When it performs several tasks simultaneously, its understanding ability
increases: because it can make decisions based on multi-dimensional
information, it is more intelligent.
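
The data layout and network shape described above can be sketched as
follows. This is a minimal NumPy sketch under stated assumptions, not the
code used in the experiment: it assumes each image is paired row by row
with one audio sample, uses ReLU hidden units, a linear output layer and
a simple random initialization (none of which are stated in the text),
and runs on random stand-in arrays instead of real MNIST or speech data.
All function names are hypothetical, and training with Adam is left to a
framework; only the forward pass, the loss, and the per-task accuracy are
shown.

```python
import numpy as np

IMG_DIM = 28 * 28        # 784 values per flattened MNIST image
AUDIO_DIM = 20 * 11      # 220 values per flattened audio sample
N_CLASSES = 10           # classes per task
# Layer widths from the text: four layers of 60 units, then 20 outputs.
LAYER_SIZES = [IMG_DIM + AUDIO_DIM, 60, 60, 60, 60, 2 * N_CLASSES]


def build_inputs(images, audio):
    """Flatten paired image/audio samples and concatenate them: (n, 1004)."""
    n = images.shape[0]
    return np.concatenate([images.reshape(n, -1), audio.reshape(n, -1)], axis=1)


def build_labels(image_classes, audio_classes):
    """Concatenate the two one-hot label blocks: (n, 20)."""
    eye = np.eye(N_CLASSES)
    return np.concatenate([eye[image_classes], eye[audio_classes]], axis=1)


def init_params(rng):
    """Random weights and zero biases for every layer (assumed initialization)."""
    sizes = zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])
    return [(rng.normal(0.0, np.sqrt(2.0 / i), (i, o)), np.zeros(o)) for i, o in sizes]


def forward(params, x):
    """Forward pass; ReLU hidden units and a linear output are assumptions."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    return h @ w + b


def mse(pred, target):
    """Mean squared error, the loss named in the text."""
    return float(np.mean((pred - target) ** 2))


def per_task_accuracy(pred, target):
    """Argmax accuracy on the image half and the audio half separately."""
    img = pred[:, :N_CLASSES].argmax(1) == target[:, :N_CLASSES].argmax(1)
    aud = pred[:, N_CLASSES:].argmax(1) == target[:, N_CLASSES:].argmax(1)
    return float(img.mean()), float(aud.mean())


def mask_audio(x):
    """Image-only input: overwrite the audio slice with -1."""
    masked = x.copy()
    masked[:, IMG_DIM:] = -1.0
    return masked


# Split sizes reported in the text: 70% of 20477 samples for training.
n_total = 20477
n_train = int(n_total * 0.7)      # 14333
n_test = n_total - n_train        # 6144

# Small random stand-in batch (not real MNIST or speech data).
rng = np.random.default_rng(0)
x = build_inputs(rng.random((8, 28, 28)), rng.random((8, 20, 11)))
y = build_labels(rng.integers(0, N_CLASSES, 8), rng.integers(0, N_CLASSES, 8))
pred = forward(init_params(rng), x)
print(x.shape, y.shape, pred.shape, mse(pred, y), per_task_accuracy(pred, y))
```

Slicing the 20 outputs into two halves of 10 is what allows accuracy to
be reported per task even though a single mean squared error covers all
outputs, and mask_audio reproduces the image-only setting in which the
audio part of the input is set to -1.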