Original URL:
http://blog.csdn.net/chlele0105/article/details/17251971
A DBN is a probabilistic generative model. In contrast to the discriminative models of traditional neural networks, it is used to establish a joint distribution over the observed data and their labels.
DBN training uses CD (contrastive divergence), which is an approximation to the log-likelihood gradient and a successful update rule for training RBMs. To train the network, Hinton used a greedy, layer-by-layer unsupervised procedure to learn the parameters. First, the data vector x and the first hidden layer H1 are treated as an RBM, and the parameters of this RBM are trained (the weights connecting x and H1, the biases of each node in x and H1, and so on). These parameters are then fixed, H1 is taken as the visible vector and H2 as the hidden vector, and a second RBM is trained to obtain its parameters. Those parameters are fixed in turn, and the RBM formed by H2 and H3 is trained, and so on. During training, Gibbs sampling is used: the visible vector is mapped to the hidden units, the visible vector is then reconstructed from the hidden units, the reconstructed visible vector is mapped to the hidden units again, and this alternation is repeated. The k-step Gibbs process works as follows.
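Below is a minimal NumPy sketch of this alternation for a binary RBM with sigmoid units; the variable names (W, b_vis, b_hid) and the function layout are illustrative assumptions, not taken from the original post.

```python
# A minimal sketch of k-step Gibbs sampling in a binary RBM with sigmoid units.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v, W, b_hid, rng):
    """Map the visible vector to the hidden units and sample them."""
    p_h = sigmoid(v @ W + b_hid)
    return p_h, (rng.random(p_h.shape) < p_h).astype(float)

def sample_v_given_h(h, W, b_vis, rng):
    """Reconstruct the visible vector from the hidden units."""
    p_v = sigmoid(h @ W.T + b_vis)
    return p_v, (rng.random(p_v.shape) < p_v).astype(float)

def gibbs_k(v0, W, b_vis, b_hid, k=1, rng=np.random.default_rng()):
    """Alternate v -> h -> v for k steps, starting from a data vector v0."""
    v = v0
    for _ in range(k):
        _, h = sample_h_given_v(v, W, b_hid, rng)
        _, v = sample_v_given_h(h, W, b_vis, rng)
    return v
```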
The CD-k weight update can be written as Δw_ij ∝ ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model, where ⟨·⟩_model is the expectation under the model distribution and ⟨·⟩_data is the expectation under the training-set distribution; CD-k approximates the model-distribution term with the reconstruction obtained after k steps of Gibbs sampling.
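As an illustration, here is a sketch of one CD-k parameter update built on the Gibbs helpers above; the names and the learning rate are assumed for the example, not given in the post.

```python
# A sketch of one CD-k parameter update, reusing sigmoid and gibbs_k from above.
import numpy as np

def cd_k_update(v0, W, b_vis, b_hid, k=1, lr=0.1, rng=np.random.default_rng()):
    # Positive phase: hidden probabilities driven by the training vector.
    p_h0 = sigmoid(v0 @ W + b_hid)

    # Negative phase: k steps of Gibbs sampling give the reconstruction,
    # which stands in for a sample from the model distribution.
    vk = gibbs_k(v0, W, b_vis, b_hid, k=k, rng=rng)
    p_hk = sigmoid(vk @ W + b_hid)

    # <v h>_data - <v h>_recon approximates the log-likelihood gradient.
    W += lr * (np.outer(v0, p_h0) - np.outer(vk, p_hk))
    b_vis += lr * (v0 - vk)
    b_hid += lr * (p_h0 - p_hk)
    return W, b_vis, b_hid
```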
DBN training algorithm: the DBN uses the CD algorithm to train layer by layer, obtaining each layer's parameters W_i and c_i, which are used to initialize the DBN; a supervised learning algorithm is then used to fine-tune the parameters.

Third, the classic DBN network structure. The classic DBN is a deep neural network composed of several layers of RBMs plus one BP layer, with the structure shown in the figure below.
Training a DBN model consists mainly of two steps:

Step 1: Train each layer's RBM network separately and without supervision, ensuring that as much feature information as possible is preserved when the feature vectors are mapped into the different feature spaces.

Step 2: Place a BP network at the last layer of the DBN, take the output feature vector of the final RBM as its input feature vector, and train the entity-relationship classifier with supervision. Each RBM layer can only guarantee that the weights within that layer are optimal for that layer's feature-vector mapping, not for the feature-vector mapping of the whole DBN, so the back-propagation network also propagates the error information top-down to every RBM layer and fine-tunes the whole DBN. The RBM training process can be regarded as the initialization of the weight parameters of a deep BP network, which lets the DBN overcome the BP network's weaknesses of falling easily into local optima and taking a long time to train when the weight parameters are initialized randomly.
In deep learning, the first step of the above training procedure is called pre-training, and the second step is called fine-tuning. The top, supervised layer can be replaced by any classifier model suited to the specific application domain; it does not have to be a BP network. A sketch of the two-step procedure is given below.
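As a rough illustration of the two-step procedure, here is a sketch of the greedy layer-by-layer pre-training loop. It assumes the sigmoid and cd_k_update helpers from the earlier sketches, and the layer sizes, epoch count, and learning rate are made-up defaults rather than values from the post.

```python
# Step 1 (pre-training): greedy, unsupervised layer-by-layer RBM training.
import numpy as np

def pretrain_dbn(data, hidden_sizes, epochs=10, lr=0.1,
                 rng=np.random.default_rng(0)):
    params = []                      # (W_i, c_i) of each trained RBM layer
    layer_input = data               # rows are training vectors
    n_vis = data.shape[1]
    for n_hid in hidden_sizes:
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
        for _ in range(epochs):
            for v in layer_input:    # one CD-1 update per training vector
                W, b_vis, b_hid = cd_k_update(v, W, b_vis, b_hid,
                                              k=1, lr=lr, rng=rng)
        params.append((W, b_hid))
        # The hidden activations of this RBM become the "visible" data
        # for the next RBM in the stack; the trained parameters stay fixed.
        layer_input = sigmoid(layer_input @ W + b_hid)
        n_vis = n_hid
    # Step 2 (fine-tuning, not shown): use params to initialize a deep BP
    # network and adjust all weights with supervised back-propagation.
    return params
```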
About "explaining away" in Bayesian networks
When we build a BN structure in which several causes lead to one result, once one of the causes has been confirmed to account for the result, the other causes become less likely and can be discounted.
The figure above shows a simple BN model. Before we have any evidence about whether C holds, A and B are independent: changing the probability of one does not affect the other. But as soon as we have evidence about C, a change in the probability of A inevitably leads to a change in the probability of B, and vice versa. They compete to explain why C holds. In other words, once we have used A to explain the occurrence of C, B carries little weight as a cause of C.
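To make the effect concrete, here is a small numeric sketch for the A → C ← B structure; the priors and the conditional probability table are illustrative assumptions, not values from the post.

```python
# A small numeric example of "explaining away" in the A -> C <- B network.
from itertools import product

P_A = {1: 0.1, 0: 0.9}          # prior on cause A
P_B = {1: 0.1, 0: 0.9}          # prior on cause B
# P(C=1 | A, B): C is very likely whenever at least one cause is present.
P_C1 = {(0, 0): 0.001, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}

def joint(a, b, c):
    pc1 = P_C1[(a, b)]
    return P_A[a] * P_B[b] * (pc1 if c == 1 else 1 - pc1)

def posterior_A(evidence):
    """P(A=1 | evidence), where evidence is a dict over {'B', 'C'}."""
    def total(a):
        return sum(joint(a, b, c) for b, c in product((0, 1), repeat=2)
                   if all(dict(B=b, C=c)[k] == v for k, v in evidence.items()))
    return total(1) / (total(1) + total(0))

print(posterior_A({'C': 1}))            # A is a likely explanation of C
print(posterior_A({'C': 1, 'B': 1}))    # knowing B=1 "explains away" A
```

With these numbers, P(A=1 | C=1) is roughly 0.53, but P(A=1 | C=1, B=1) drops to roughly 0.11: once B is known to hold and already explains C, A becomes far less probable.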