Roland Szabo edited unsupervised.tex  almost 10 years ago

Commit id: 8aeaeb1d70aacc0187215decf7883d6127cc9065

The energy of such a model, given the vector $v$ (the input layer), the vector $h$ (the hidden layer), the matrix $W$ (the weights of the connections between each neuron in the input layer and each neuron in the hidden layer) and the vectors $a$ and $b$ (the activation thresholds of the input-layer and hidden-layer neurons), can be computed with the following formula:
$$ E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j h_j w_{i,j} v_i $$
Once we have the energy of a state, its probability is given by:
$$ P(v,h) = \frac{1}{Z} e^{-E(v,h)} $$
where $Z$ is a normalization factor.

This is where the constraints of the RBM help us. Because the neurons in the visible layer are not connected to each other, for a given value of the hidden layer the visible neurons are conditionally independent of each other. Using this, we can easily get the probability of some input data given the hidden layer:
$$ P(v|h) = \prod_{i=1}^m P(v_i|h) $$
where $P(v_i|h)$ is the activation probability of a single visible neuron:
$$ P(v_i=1|h) = \sigma \left(a_i + \sum_{j=1}^n w_{i,j} h_j \right) $$
and $\sigma(x) = \frac{1}{1+e^{-x}}$ is the logistic function.

In a similar way we can define the probability of the hidden layer with the visible layer fixed:
$$ P(h|v) = \prod_{j=1}^n P(h_j|v) $$
$$ P(h_j=1|v) = \sigma \left(b_j + \sum_{i=1}^m w_{i,j} v_i \right) $$

How do these probabilities help us? Let’s presume that we know the correct values of the weights and thresholds of an RBM and that we want to determine what objects are in an image. We set the pixels of the image as the input of the RBM and we compute the activation probabilities of the hidden layer. We can interpret these probabilities as filters learned by the RBM about the possible objects in the images.

We then take the values of those probabilities and feed them into another RBM as input data. This RBM in turn produces probabilities for its own hidden layer, which are again filters over its inputs, but of a higher level and more complex. We repeat this a couple of times, stack the resulting RBMs and, on top of the last one, add a classification layer (such as logistic regression), and we get ourselves a Deep Belief Network\cite{Hinton_Teh_2006}. The idea that started the deep learning revolution was this: you can learn, layer by layer, filters that get more and more complex, so that in the end you don’t work directly with pixels but with high-level features, which are much better indicators of what objects are in an image.

The parameters of an RBM are learned with an algorithm called “contrastive divergence”. It starts with an example from the input data and computes the values of the hidden layer; these values are then used to simulate what input data they would produce. The weights are adjusted by the difference between the original input data and the “dreamed” input data (with some outer products along the way). This process is repeated for each example of the input data, several times, until either the error is small enough or a predetermined number of iterations has passed.
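To make the energy formula concrete, here is a minimal NumPy sketch that evaluates $E(v,h)$ and the unnormalized probability $e^{-E(v,h)}$ for a small, randomly initialized RBM. It is an illustration, not the author’s code: the layer sizes, the random initialization and the \texttt{energy} helper are assumptions made for the example.

\begin{verbatim}
import numpy as np

# Illustrative sizes and random parameters (assumptions, not learned values).
rng = np.random.default_rng(0)
m, n = 6, 4                      # visible / hidden units
W = rng.normal(0, 0.1, (m, n))   # weights w_{i,j}
a = np.zeros(m)                  # visible thresholds a_i
b = np.zeros(n)                  # hidden thresholds b_j

v = rng.integers(0, 2, m).astype(float)   # a binary visible vector
h = rng.integers(0, 2, n).astype(float)   # a binary hidden vector

def energy(v, h, W, a, b):
    # E(v,h) = -sum_i a_i v_i - sum_j b_j h_j - sum_{i,j} h_j w_{i,j} v_i
    return -a @ v - b @ h - v @ W @ h

# P(v,h) is proportional to exp(-E(v,h)); the normalization factor Z sums
# exp(-E) over all 2^(m+n) configurations, which is intractable in general.
print(energy(v, h, W, a, b), np.exp(-energy(v, h, W, a, b)))
\end{verbatim}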
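The contrastive divergence step described above can be sketched as follows. This is a toy CD-1 update (a single Gibbs step) written directly from the conditional probabilities of this section, not the exact procedure of the cited work; the learning rate, the random “data” and the function names are assumptions for illustration.

\begin{verbatim}
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, rng, lr=0.1):
    # Positive phase: P(h_j = 1 | v0) = sigma(b_j + sum_i w_{i,j} v0_i)
    p_h0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sample hidden states

    # Negative ("dreamed") phase: reconstruct the visible layer, then
    # recompute the hidden probabilities from that reconstruction.
    p_v1 = sigmoid(a + h0 @ W.T)    # P(v_i = 1 | h0)
    p_h1 = sigmoid(b + p_v1 @ W)    # P(h_j = 1 | reconstruction)

    # Adjust the weights by the difference between the data-driven and the
    # "dreamed" statistics (outer products), and the thresholds likewise.
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    a += lr * (v0 - p_v1)
    b += lr * (p_h0 - p_h1)
    return W, a, b

# Toy usage: several passes over random binary "data" (assumed for the example).
rng = np.random.default_rng(1)
data = (rng.random((100, 6)) > 0.5).astype(float)
W = rng.normal(0, 0.1, (6, 4))
a, b = np.zeros(6), np.zeros(4)
for epoch in range(10):
    for v0 in data:
        W, a, b = cd1_update(v0, W, a, b, rng)
\end{verbatim}

In practice the loop would run over real training examples and stop once the reconstruction error is small enough or a fixed number of iterations has passed, as described above.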