2016-07-20 11:21:33
1 Restricted Boltzmann machine
The restricted Boltzmann machine (RBM) [1], popularized by the deep learning expert Hinton, has been applied in many areas. The most mature applications are in the image domain, such as image recognition and handwritten digit recognition. As a collaborative filtering algorithm, it can predict unknown values. It can also make predictions for high-dimensional time-series data, such as features of human motion, and it has been used for document classification, audio recognition, and so on.
A restricted Boltzmann machine (RBM) is a special kind of Markov random field. An RBM consists of a hidden layer of stochastic hidden units (typically Bernoulli-distributed) and a visible (observation) layer of stochastic visible (observed) units (usually Bernoulli or Gaussian). An RBM can be represented as a bipartite graph (Figure 1): every visible unit is connected to every hidden unit, but no two hidden units are connected to each other and no two visible units are connected to each other. In other words, the layers are fully connected to one another and there are no connections within a layer. This is the difference from the general Boltzmann machine (BM), which is fully connected both between and within layers. Each visible and hidden node has two states: the active state has value 1 and the inactive state has value 0. The 0 and 1 states indicate which nodes the model uses: active nodes are used and inactive nodes are not. The activation probability of a node is computed from the distribution functions of the visible-layer and hidden-layer nodes.
Figure 1 Restricted Boltzmann machine
In an RBM, $v$ denotes all visible units and $h$ denotes all hidden units. The model is fully determined by three parameters: the weight matrix $W$, the visible-layer bias $a$, and the hidden-layer bias $b$. Suppose an RBM has $n$ visible units and $m$ hidden units, where $v_i$ denotes the $i$-th visible unit and $h_j$ denotes the $j$-th hidden unit. The parameters take the form:
$W = \{w_{ij}\} \in \mathbb{R}^{n \times m}$: $w_{ij}$ is the weight between the $i$-th visible unit and the $j$-th hidden unit.
$a = \{a_i\} \in \mathbb{R}^{n}$: $a_i$ is the bias threshold of the $i$-th visible unit.
$b = \{b_j\} \in \mathbb{R}^{m}$: $b_j$ is the bias threshold of the $j$-th hidden unit.
For a given state $(v, h)$, assuming that both the visible and hidden units follow Bernoulli distributions, the energy of the RBM is:
$$E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j$$
Here $\theta = \{w_{ij}, a_i, b_j\}$ denotes the parameters of the RBM (all real numbers). The energy function assigns an energy value to each joint configuration of visible-node values and hidden-node values.
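As a minimal sketch of how such an energy is evaluated, the snippet below assumes the standard binary-RBM form, in which the energy is the negated sum of the visible bias term, the hidden bias term, and the pairwise weight term. The function name `energy` and the toy parameter values are hypothetical and chosen only for illustration.

```python
def energy(v, h, W, a, b):
    """Energy of one joint state (v, h), assuming the standard binary-RBM form:
    E = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i w_ij h_j."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + interaction)

# Hypothetical toy parameters: n = 3 visible units, m = 2 hidden units.
W = [[0.5, -1.0],
     [1.0, 0.0],
     [-0.5, 2.0]]   # W[i][j]: weight between visible unit i and hidden unit j
a = [0.1, -0.2, 0.3]  # visible biases
b = [0.0, 0.5]        # hidden biases

print(energy([1, 0, 1], [0, 1], W, a, b))
```

Only active units (value 1) contribute to the energy, which matches the reading of the 0 and 1 states as "used" and "not used".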
Exponentiating and normalizing the energy function gives the joint probability that the visible nodes and the hidden nodes are in a particular state $(v, h)$:
$$P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)}, \qquad Z(\theta) = \sum_{v, h} e^{-E(v, h \mid \theta)}$$
Here $Z(\theta)$ is the normalization factor, also called the partition function: the sum of the exponentiated negative energies over all possible states of the visible and hidden nodes.
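For a very small model the partition function can be computed by brute-force enumeration, which makes the normalization concrete. The sketch below assumes the standard binary-RBM energy; the helper names and toy parameters are hypothetical. The sum has 2^(n+m) terms, so this approach is only feasible for toy sizes.

```python
import itertools
import math

def energy(v, h, W, a, b):
    """Energy of a joint state (v, h), assuming the standard binary-RBM form."""
    return -(sum(a[i] * v[i] for i in range(len(v)))
             + sum(b[j] * h[j] for j in range(len(h)))
             + sum(v[i] * W[i][j] * h[j]
                   for i in range(len(v)) for j in range(len(h))))

def all_states(k):
    """All 2**k binary vectors of length k."""
    return list(itertools.product([0, 1], repeat=k))

def partition_function(W, a, b):
    """Z: sum of exp(-E(v, h)) over every visible/hidden configuration."""
    return sum(math.exp(-energy(v, h, W, a, b))
               for v in all_states(len(a)) for h in all_states(len(b)))

def joint_probability(v, h, W, a, b):
    """P(v, h) = exp(-E(v, h)) / Z."""
    return math.exp(-energy(v, h, W, a, b)) / partition_function(W, a, b)

# Hypothetical toy parameters: 3 visible units, 2 hidden units.
W = [[0.5, -1.0], [1.0, 0.0], [-0.5, 2.0]]
a = [0.1, -0.2, 0.3]
b = [0.0, 0.5]

total = sum(joint_probability(v, h, W, a, b)
            for v in all_states(3) for h in all_states(2))
print(total)  # the joint probabilities sum to 1 up to floating-point rounding
```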
The parameters are usually found by differentiating the likelihood function. Given the joint probability distribution, summing over all states of the hidden nodes yields the marginal distribution of the visible nodes:
$$P(v \mid \theta) = \frac{1}{Z(\theta)} \sum_{h} e^{-E(v, h \mid \theta)}$$
The marginal distribution gives the probability that the visible nodes are in a particular state, and it is often called the likelihood function (how to solve for the model parameters is covered in later chapters).
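The marginalization over hidden states can be sketched the same way: for a fixed visible vector, sum the unnormalized weights over every hidden configuration and divide by the partition function. This is an illustrative brute-force version under the standard binary-RBM energy, with hypothetical names and toy parameters, not a practical training routine.

```python
import itertools
import math

def unnormalized(v, h, W, a, b):
    """exp(-E(v, h)), assuming the standard binary-RBM energy."""
    neg_energy = (sum(a[i] * v[i] for i in range(len(v)))
                  + sum(b[j] * h[j] for j in range(len(h)))
                  + sum(v[i] * W[i][j] * h[j]
                        for i in range(len(v)) for j in range(len(h))))
    return math.exp(neg_energy)

def marginal_visible(v, W, a, b):
    """P(v): sum the joint distribution over all hidden states."""
    n, m = len(a), len(b)
    hidden_states = list(itertools.product([0, 1], repeat=m))
    visible_states = list(itertools.product([0, 1], repeat=n))
    numerator = sum(unnormalized(v, h, W, a, b) for h in hidden_states)
    Z = sum(unnormalized(u, h, W, a, b)
            for u in visible_states for h in hidden_states)
    return numerator / Z

# Hypothetical toy parameters: 3 visible units, 2 hidden units.
W = [[0.5, -1.0], [1.0, 0.0], [-0.5, 2.0]]
a = [0.1, -0.2, 0.3]
b = [0.0, 0.5]

likelihoods = {v: marginal_visible(v, W, a, b)
               for v in itertools.product([0, 1], repeat=3)}
for v, p in likelihoods.items():
    print(v, p)
```

Evaluated on an observed visible vector, this marginal is the quantity that maximum-likelihood training tries to increase.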
Because of the RBM's special structure, with connections between layers but none within a layer, the model has the following important properties:
1) Given the states of the visible units, the activation states of the hidden units are conditionally independent. The activation probability of the $j$-th hidden unit is then:
$$P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$
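The conditional independence property can be checked numerically on a toy model. The sketch below assumes the standard binary-RBM energy, under which the activation probability of hidden unit j reduces to a sigmoid of its bias plus the weighted sum of the visible states. It compares that closed form against a brute-force computation from the energy. All names and parameter values are hypothetical.

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_activation_probs(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j, using the sigmoid form."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def brute_force_hidden_prob(j, v, W, a, b):
    """Same quantity from the definition: sum exp(-E) over hidden states."""
    def weight(h):
        neg_energy = (sum(a[i] * v[i] for i in range(len(v)))
                      + sum(b[k] * h[k] for k in range(len(h)))
                      + sum(v[i] * W[i][k] * h[k]
                            for i in range(len(v)) for k in range(len(h))))
        return math.exp(neg_energy)
    states = list(itertools.product([0, 1], repeat=len(b)))
    active = sum(weight(h) for h in states if h[j] == 1)
    return active / sum(weight(h) for h in states)

# Hypothetical toy parameters: 3 visible units, 2 hidden units.
W = [[0.5, -1.0], [1.0, 0.0], [-0.5, 2.0]]
a = [0.1, -0.2, 0.3]
b = [0.0, 0.5]
v = [1, 0, 1]

probs = hidden_activation_probs(v, W, b)
checks = [brute_force_hidden_prob(j, v, W, a, b) for j in range(len(b))]
print(probs)
print(checks)  # agrees with probs up to floating-point rounding
```

The closed form needs only one pass over the weights per hidden unit, whereas the brute-force version enumerates every hidden configuration. This is why the missing within-layer connections matter in practice.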
Restricted Boltzmann machines and deep belief networks