Fig. 4. Effect of Location Change
The figures show that the learned attributes let the network generalize to rare situations. Although training uses only the three primary colors, the network can generate other colors. It can also generate intermediate states between big and small, and between up and middle. The generalization ability is good because the network is a continuous function.
Multi-input and Multi-output Test
Inspired by the multi-attribute network, by the input/output principle of computers, and by the observation that the difference between speech recognition and image recognition can be represented by a function, I design a network that processes speech recognition and image recognition simultaneously. It is shown in Fig. 5. The multi-attribute network has one input and several outputs, whereas this network has several inputs and several outputs. The datasets are MNIST and some downloaded speech data. The amount of speech data is small because I could not find a larger dataset. Both tasks are classification tasks, which I recast as regression. I concatenate each MNIST image with a speech sample to form the input, and the labels of the two datasets are concatenated in the same way. For simplicity, I choose a fully connected network with 5 layers. The first four layers have 60 units each, and the last layer has 20 because there are twenty classes (MNIST: 10 classes, speech dataset: 10 classes). The labels are one-hot encoded. For training, the loss function is mean squared error, the optimizer is Adam, and the metric is accuracy, calculated separately for the two tasks.
An MNIST image has shape 28×28 (784 values) and an audio sample has shape 20×11 (220 values), so each concatenated input has 784 + 220 = 1004 values. There are 20477 audio samples, which are split into training and test data with a training ratio of 0.7. The training data therefore has shape (14333, 1004) and the test data has shape (6144, 1004). The training labels have shape (14333, 20) and the test labels have shape (6144, 20).
On the test dataset, the accuracy of image regression is 86.5% and the accuracy of speech regression is 79.8%. I expect the accuracy to be higher with a larger dataset. Because MNIST has more samples than the audio dataset, we can also input images only and set the audio input to -1. We can add one output that is 1 when there is no audio input and 0 when there is audio input. With these changes, the accuracy of image regression is 86.6%, essentially unchanged. So the universal network can also process a single input.
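The data layout described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the code used for the experiments: random arrays stand in for the real MNIST and speech data, each image is assumed to be paired with exactly one audio sample, and the helper names (`one_hot`, `per_task_accuracy`, `mask_audio`) are hypothetical. In particular, zeroing the speech targets when the audio is masked is an assumption, since the text does not say what those targets should be.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SAMPLES = 20477      # number of paired samples reported in the text
IMG_DIM = 28 * 28      # flattened MNIST image: 784 values
AUDIO_DIM = 20 * 11    # flattened audio features: 220 values
N_CLASSES = 10         # classes per task

def one_hot(labels, n_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(n_classes, dtype=np.float32)[labels]

# Random stand-ins for the real data (assumption: one image per audio sample).
images = rng.random((N_SAMPLES, IMG_DIM), dtype=np.float32)
audio = rng.random((N_SAMPLES, AUDIO_DIM), dtype=np.float32)
img_labels = rng.integers(0, N_CLASSES, N_SAMPLES)
audio_labels = rng.integers(0, N_CLASSES, N_SAMPLES)

# Concatenate inputs and labels along the feature axis.
x = np.concatenate([images, audio], axis=1)                      # (20477, 1004)
y = np.concatenate([one_hot(img_labels, N_CLASSES),
                    one_hot(audio_labels, N_CLASSES)], axis=1)   # (20477, 20)

# Split with a training ratio of 0.7.
n_train = int(N_SAMPLES * 0.7)
x_train, x_test = x[:n_train], x[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

def per_task_accuracy(y_true, y_pred, n_classes=N_CLASSES):
    """Accuracy of each task, taking the argmax over that task's output slice."""
    img_hit = (y_true[:, :n_classes].argmax(axis=1)
               == y_pred[:, :n_classes].argmax(axis=1))
    speech_hit = (y_true[:, n_classes:2 * n_classes].argmax(axis=1)
                  == y_pred[:, n_classes:2 * n_classes].argmax(axis=1))
    return float(img_hit.mean()), float(speech_hit.mean())

def mask_audio(x, y):
    """Image-only variant: audio inputs set to -1, plus a 21st 'no audio' output.

    Assumption: the speech targets are set to zero when the audio is masked.
    """
    x_masked = x.copy()
    x_masked[:, IMG_DIM:] = -1.0
    y_masked = np.zeros((len(y), 2 * N_CLASSES + 1), dtype=y.dtype)
    y_masked[:, :N_CLASSES] = y[:, :N_CLASSES]   # keep the image targets
    y_masked[:, -1] = 1.0                        # 1 means "no audio input"
    return x_masked, y_masked

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```

Computing the accuracy per output slice is what allows the two tasks to be scored separately even though they share one 20-dimensional regression target.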
It is important to note that the test accuracy for images and audio in the joint network is 10% lower than the test accuracy obtained when each task is handled separately. From these tests, we can conclude that one network can perform two tasks simultaneously (I did not test more than two). The parameters are shared by the two tasks, so the processing is parallel. I expect that it can perform more than two tasks simultaneously, which is why I call it a universal neural network. When it performs several tasks simultaneously, its understanding ability increases: it can make decisions based on multi-dimensional information, so it is more intelligent.
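To make the parameter sharing concrete, here is a minimal NumPy sketch of a forward pass through a network with the layer widths given earlier (1004 inputs, four layers of 60 units, 20 outputs). It is an illustration rather than the original implementation: the ReLU hidden activation, the linear output layer, and the weight initialization are assumptions, since the text does not specify them, and the function names are hypothetical.

```python
import numpy as np

# Layer widths from the text: 1004 inputs, four layers of 60 units, 20 outputs.
LAYER_SIZES = [1004, 60, 60, 60, 60, 20]

def init_params(sizes, seed=0):
    """Create one (weights, bias) pair per layer (initialization is an assumption)."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((w, b))
    return params

def forward(params, x):
    """Run the shared layers, then return the 20 regression outputs."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)   # ReLU hidden activation (assumption)
    w, b = params[-1]
    return h @ w + b                     # linear output layer (assumption)

def mse(y_true, y_pred):
    """Mean squared error over all outputs of both tasks."""
    return float(np.mean((y_true - y_pred) ** 2))

params = init_params(LAYER_SIZES)
n_params = sum(w.size + b.size for w, b in params)

# A small random batch standing in for concatenated image + audio inputs.
x = np.random.default_rng(1).random((4, 1004))
out = forward(params, x)

# Both tasks read from the same hidden layers; only the output slice differs.
image_scores, speech_scores = out[:, :10], out[:, 10:]
print(n_params, image_scores.shape, speech_scores.shape)
```

Every weight up to the last hidden layer contributes to both output slices, which is the sense in which the two tasks share parameters. Only the columns of the final weight matrix are task-specific.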