Multi-attribute Recognition, the Key
to a Universal Neural Network
Jinxin Wei1, Qunying Ren2
1 Vocational School of Juancheng, Juancheng 274600, China
2 Bureau of Emergency Management of Juancheng County, Juancheng 274600, China
Abstract: To achieve recognition of multiple attributes of an object, I
redesign the MNIST dataset, changing the color, size, and location of
each digit and updating the labels accordingly. The deep neural network
I use is an ordinary convolutional neural network. The tests show that a
single neural network can recognize multiple attributes as long as the
attribute differences between objects can be represented by functions.
The concrete network (generation network) can generate outputs rarely
contained in the input by combining the attributes the network has
learned; its generalization ability is good because the network is a
continuous function. A further test shows that one neural network can
perform image recognition, speech recognition, natural language
processing, and other tasks, as long as the corresponding input nodes,
output nodes, and additional parameters are added to the network. The
network is universal as long as it can process different inputs. The
phenomenon of synesthesia is the result of multiple inputs and multiple
outputs. Connection in the mind can be realized through the universal
network by feeding the output back into the input. Connection in the
mind is the key to creativity; synesthesia is its assistant.
Keywords: computer vision,
multi-attribute, deep neural network, multi-dimension, data processing,
universal neural network, parallel processing, speech recognition,
natural language processing
Redesign of MNIST
There are many multi-label learning approaches, which use many labels
and many networks. I instead design a single label and a single network
to solve the multi-attribute problem.
Because no existing dataset fits my task, I redesign the MNIST dataset.
The visual attributes of an object that humans recognize are color,
size, location, shape, texture, quantity, and pattern, so we choose
color, size, location, and shape as examples. Since the MNIST dataset
already has the shape attribute, we only need to add color, size, and
location. First, we change the color. Because computers represent color
as a mix of red, green, and blue, we change each digit's color to red,
green, or blue: we assign the grayscale pixel data to the red, green, or
blue channel, and set the other two channels to zero. The background is
all 255. Secondly, we change the size: we shrink the image to 18×18 and
paste it onto a white background of size 28×28. When we paste the digit
into the upper part of the background instead of the middle, we change
the location. Next we change the label. We use a label form similar to
the one used in classification. Classes 0-9 are one-hot encoded; for
example, 0100000000 is 1. Because there are 3 colors, red is 100, green
is 010, and blue is 001, occupying label indices 10-12. Because there
are two sizes, big and small, the codes are 01 and 10 at indices 13-14.
Because there are two locations, up and middle, the codes are 10 and 01
at indices 15-16. This completes the processing of the dataset. The
order of the label is number, color, size, location. For example,
01000000001000101 represents a big, middle, red 1. Why do we use one-hot
encoding? Because each class then has its own output node, we can
generate the multiple outputs by regression, and each output lies
between 0 and 1, which is similar to normalized data.
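The transformation above can be sketched in NumPy. This is an
illustrative reconstruction, not the exact preprocessing code: the
nearest-neighbor resize method and the background handling are
assumptions, while the 17-element label layout follows the description
above.

```python
import numpy as np

COLOR_CH = {'red': 0, 'green': 1, 'blue': 2}

def make_sample(gray, digit_class, color, size, location):
    """Build one 28x28x3 sample and its 17-element label.

    gray: 28x28 array of MNIST pixel intensities; digit_class: 0-9.
    Resize method (nearest-neighbor) and background handling are
    assumptions; the paper's exact procedure may differ.
    """
    digit = gray
    if size == 'small':
        idx = np.arange(18) * gray.shape[0] // 18   # nearest-neighbor 28 -> 18
        digit = gray[np.ix_(idx, idx)]

    canvas = np.zeros((28, 28), dtype=gray.dtype)
    h, w = digit.shape
    top = 0 if location == 'up' else (28 - h) // 2  # 'up' vs vertical middle
    left = (28 - w) // 2
    canvas[top:top + h, left:left + w] = digit

    # Grayscale data goes into one channel; the other two stay zero.
    rgb = np.zeros((28, 28, 3), dtype=gray.dtype)
    rgb[..., COLOR_CH[color]] = canvas

    # Label layout: digit one-hot (0-9), color (10-12: red/green/blue),
    # size (13-14: big=01, small=10), location (15-16: up=10, middle=01).
    label = np.zeros(17, dtype=np.float32)
    label[digit_class] = 1.0
    label[10 + COLOR_CH[color]] = 1.0
    label[14 if size == 'big' else 13] = 1.0
    label[15 if location == 'up' else 16] = 1.0
    return rgb, label

# The paper's example: a big, middle, red "1".
img, lab = make_sample(np.random.randint(0, 256, (28, 28)), 1,
                       'red', 'big', 'middle')
print(''.join(str(int(b)) for b in lab))  # -> 01000000001000101
```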
Test Design
Now we design the network. The experiments are done with the TensorFlow
framework. The regression network has 3 convolutional layers and 2 fully
connected layers, with no max pooling and no padding, and the activation
function is leaky ReLU [2]. Max pooling and padding are omitted because
they would cause loss during generation. The
generation network is the inverse function [4] of the regression network.
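A minimal Keras sketch of such a regression network is given below. The
filter counts, kernel sizes, layer widths, and the sigmoid output are my
assumptions; the text only specifies 3 convolutional layers, 2 fully
connected layers, no max pooling or padding, leaky-ReLU activations, MSE
loss, and the Adam optimizer.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_regression_net():
    # 3 conv layers (valid padding, no pooling) + 2 dense layers; the
    # filter counts and kernel sizes here are illustrative assumptions.
    return tf.keras.Sequential([
        layers.Input(shape=(28, 28, 3)),
        layers.Conv2D(32, 3, padding='valid'),
        layers.LeakyReLU(),
        layers.Conv2D(64, 3, padding='valid'),
        layers.LeakyReLU(),
        layers.Conv2D(64, 3, padding='valid'),
        layers.LeakyReLU(),
        layers.Flatten(),
        layers.Dense(128),
        layers.LeakyReLU(),
        # 17 regression outputs in [0, 1]: digit, color, size, location.
        layers.Dense(17, activation='sigmoid'),
    ])

model = build_regression_net()
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
```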
You can read my other paper, 'A Functionally Separate Autoencoder',
which describes the details of generating concrete information from a
label. When training the regression network, the loss function is mean
squared error, the optimizer is Adam [3], and the metric is accuracy
[3]. I take np.argmax() [1] of prediction[:,0:10], [:,10:13], [:,13:15],
and [:,15:17] separately, and process the real labels the same way. The
regression values thereby become class indices, and the accuracy can be
calculated by comparing them.
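The per-attribute accuracy check described above can be sketched as
follows. The slice boundaries come from the label layout; everything
else (function name, dictionary structure) is my own illustration.

```python
import numpy as np

# Label layout: [0:10] digit, [10:13] color, [13:15] size, [15:17] location.
SLICES = {'digit': slice(0, 10), 'color': slice(10, 13),
          'size': slice(13, 15), 'location': slice(15, 17)}

def attribute_accuracies(pred, true):
    """Take argmax over each attribute's slice of the regression output
    and of the real label, then compare the resulting indices."""
    accs = {}
    for name, sl in SLICES.items():
        p = np.argmax(pred[:, sl], axis=1)
        t = np.argmax(true[:, sl], axis=1)
        accs[name] = float(np.mean(p == t))
    return accs

# Toy check: a perfect prediction of "big middle red 1".
lab = np.array([[0,1,0,0,0,0,0,0,0,0, 1,0,0, 0,1, 0,1]], dtype=float)
print(attribute_accuracies(lab, lab))  # all four accuracies are 1.0
```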
When training the generation network, the loss function is mean squared
error, the optimizer is Adam, the metrics are mean absolute error and
cosine similarity, the input is the multi-attribute label, and the
output is the image. The results are shown in Fig. 1.1 and Fig. 1.2.