[Machine Learning] RBM Brief Introduction

This post briefly summarizes the basic ideas behind building, solving, and evaluating the RBM model, in the hope of helping readers who want to understand RBMs.
A Restricted Boltzmann Machine (RBM) is an energy-based model. Structurally it is a two-layer neural network consisting of a visible layer v and a hidden layer h: there are no connections between units within the same layer, while the two layers are fully connected to each other.
According to the network structure of the RBM, we can define an energy function for the system, which we call E. In physics, the lower the energy of a system's state, the more probable and stable that state is. So from the energy function we can define the probability that the system is in a given state, which we call P here: the joint probability of v and h.
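To make this concrete, here is the standard formulation of the energy function and the joint distribution (the symbols $W$ for the weights and $a$, $b$ for the visible and hidden biases are notation assumed here, not fixed by the text above):

$$
E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j,
\qquad
P(v, h) = \frac{e^{-E(v, h)}}{Z}, \quad Z = \sum_{v, h} e^{-E(v, h)},
$$

where $Z$ is the normalization factor (the partition function), which will come back to haunt us later.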
In an RBM, once the state of one layer is known, the units of the other layer are conditionally independent of each other. In the most common RBM, each unit takes binary values, i.e., either 0 or 1.
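For the binary RBM with the energy function above, conditional independence yields fully factorized conditionals; stated here as a standard reference result:

$$
P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big),
\qquad
P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j w_{ij} h_j\Big),
$$

where $\sigma(x) = 1 / (1 + e^{-x})$ is the logistic sigmoid.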
With the description above, the model is built. Now, given the training data, we need to fit this model to the data and solve for its parameters.
In the model we can write down the joint probability distribution of the visible and hidden layers, and from it express the marginal distribution of the visible layer. This marginal distribution represents the distribution of the samples, and we want the likelihood of the samples under the marginal distribution represented by the RBM to be as large as possible; this is our objective function.
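In the notation above, the marginal distribution of the visible layer and the log-likelihood objective over a training set $\mathcal{D}$ can be written as:

$$
P(v) = \sum_h P(v, h) = \frac{1}{Z} \sum_h e^{-E(v, h)},
\qquad
\max_{\theta} \sum_{v \in \mathcal{D}} \ln P(v), \quad \theta = \{W, a, b\}.
$$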
Once the objective function is fixed, the simplest optimization method is gradient ascent. We can obtain the gradient by differentiating the objective function with respect to each parameter. However, after doing the derivation, we find that the term containing $\sum_v$ in the gradient is an expectation under $P(v)$. This is bad: if the visible layer has n units, then v has $2^n$ possible states, which is clearly not enumerable.
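For example, the derivative of $\ln P(v)$ with respect to a single weight $w_{ij}$ takes the well-known two-term form:

$$
\frac{\partial \ln P(v)}{\partial w_{ij}}
= P(h_j = 1 \mid v)\, v_i \;-\; \sum_{v'} P(v')\, P(h_j = 1 \mid v')\, v'_i,
$$

and the second term is exactly the expectation under $P(v)$ whose sum runs over all $2^n$ visible states.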
A natural idea is to use MCMC (Gibbs sampling) to approximate this expectation, but Gibbs sampling needs many steps before the samples approximate the true distribution, so its complexity is rather high. Since the ultimate goal of the RBM is to fit the distribution of the training data, can we start the Gibbs chain from a training sample and run only k sampling steps, so that it converges to the target distribution faster? Based on this idea, Hinton proposed the contrastive divergence (CD) algorithm in 2002; usually k = 1 already works well, which greatly reduces the complexity of the RBM optimization problem.
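Below is a minimal NumPy sketch of one CD-1 update for a binary RBM, following the equations above; the function name, shapes, and learning rate are illustrative assumptions, not code from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1):
    """One CD-1 update for a binary RBM, modifying W, a, b in place.

    v0: training batch, shape (batch, n_visible), entries in {0, 1}
    W:  weights, shape (n_visible, n_hidden)
    a:  visible biases, shape (n_visible,)
    b:  hidden biases, shape (n_hidden,)
    """
    # Positive phase: hidden probabilities and a hidden sample given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(v0.dtype)

    # One Gibbs step: reconstruct the visible layer, then recompute hidden probs.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(v0.dtype)
    ph1 = sigmoid(v1 @ W + b)

    # CD-1 gradient estimate: <v h>_data - <v h>_reconstruction, batch-averaged.
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return v1  # the one-step reconstruction v'

# Toy usage on random binary data (shapes are arbitrary).
n_visible, n_hidden = 6, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
a = np.zeros(n_visible)
b = np.zeros(n_hidden)
v_batch = rng.integers(0, 2, size=(8, n_visible)).astype(float)
for _ in range(100):
    v_recon = cd1_update(v_batch, W, a, b)
```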
Finally, how do we evaluate the quality of an RBM model? In a classification problem, for example, we use accuracy or error rate to judge a model; how, then, do we judge an RBM? One might think of simply evaluating the likelihood of the training data under the model, but recall that the step from the energy function to a probability involves the normalization factor Z, and this normalization factor is very difficult to compute.
So we instead evaluate the model with the reconstruction error. Starting from a sample v, we run one step of Gibbs sampling under the distribution represented by the RBM to obtain v', and then compute the difference between v and v'; the difference can be measured with the 1-norm or the 2-norm.
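As a sketch, reusing the reconstruction v' returned by the CD-1 step above (the norm choice is a free parameter):

```python
import numpy as np

def reconstruction_error(v, v_recon, norm=2):
    """Mean per-sample distance between data v and its Gibbs reconstruction v'."""
    diff = v - v_recon
    if norm == 1:
        return np.abs(diff).sum(axis=1).mean()
    return np.sqrt((diff ** 2).sum(axis=1)).mean()

# e.g. reconstruction_error(v_batch, v_recon) after a training step
```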
The above is a brief summary of the overall ideas behind the RBM model; I hope it is of some small help to beginners trying to understand RBMs.