This article gives a very brief introduction to the basic concepts of deep belief networks (DBN), including an application example. It then describes: (1) the DBN's basic building block, the restricted Boltzmann machine, and (2) how these building blocks are combined into a deep belief network. The goal is only to convey the concept of a deep belief network; the content is very elementary and even imprecise in places. Readers interested in learning more about restricted Boltzmann machines and deep belief networks, or in digging deeper into deep learning, can visit the official Deep Learning website or refer to the "Recommended reading" section at the end of this article.

Overview

The deep belief network (DBN) was proposed by Geoffrey Hinton in 2006. It is a generative model: by training the weights between its neurons, the whole network learns to generate the training data with maximum probability. A DBN can therefore not only extract features and classify data, but also generate data. The picture below shows handwritten digits being recognized with a DBN:
Figure 1: Recognizing handwritten digits with a deep belief network.
The lower-right corner of the figure is a black-and-white bitmap of the digit to be recognized, with three layers of hidden neurons above it. Each black rectangle represents a layer of neurons: a white dot indicates a neuron in the "on" state, and black indicates a neuron in the "off" state. Note the neurons at the lower left of the top layer: even without checking against the display in the upper-left corner of the screen, one can tell from them that the DBN has correctly recognized the digit.
Below is a passage of natural language generated by a DBN that has read a large number of English Wikipedia articles:

In 1974 Northern Denver had been overshadowed by CNL, and several Irish intelligence agencies in the Mediterranean region. However, on the Victoria, Kings Hebrew stated this Charles decided to escape during an alliance. The Mansion House were completed in 1882, and the second in its bridge were omitted while closing is the proton reticulum composed below it aims, such that it was the blurring of appearing on any well-paid type of box printer.

A DBN is composed of many neurons, divided into visible units and hidden units. The visible units receive the input, and the hidden units extract features; for this reason the hidden units are also called feature detectors. The connections between the top two layers are undirected and form an associative memory; the remaining layers are connected by directed, top-down links. The bottom layer represents the data vector, with each neuron standing for one dimension of that vector.

The building block of a DBN is the restricted Boltzmann machine (RBM). A DBN is trained one layer at a time: in each layer, the data vector is used to infer the hidden layer, and that hidden layer is then treated as the data vector of the next layer up.
Restricted Boltzmann machine

As mentioned earlier, the RBM is the building block of a DBN. In fact, each RBM can also be used on its own as a clustering model. An RBM consists of only two layers of neurons. One layer is called the visible layer and is made up of visible units, which receive the training data. The other layer is called the hidden layer and, correspondingly, is made up of hidden units, which serve as feature detectors.

Figure 2: The structure of a restricted Boltzmann machine. The upper layer of neurons in the figure forms the hidden layer, and the lower layer forms the visible layer. Each layer can be represented by a vector, with each dimension corresponding to one neuron. Notice the symmetric (bidirectional) connections between the two layers.
Conditional independence between neurons

It should be noted that neurons within the visible layer are not connected to each other, and neither are neurons within the hidden layer; only neurons in different layers have symmetric connections between them. The advantage of this is that, given the values of all the visible units, the value of each hidden unit is conditionally independent of the others; conversely, given the hidden layer, the values of all the visible units are conditionally independent of each other. Thanks to this important property, we do not have to compute the value of one neuron at a time; we can compute an entire layer of neurons in parallel.

Using an RBM

Suppose we now have a trained RBM whose weights between the hidden and visible units form a matrix

$$W = (w_{ij}) \in \mathbb{R}^{m \times n},$$

where $w_{ij}$ is the weight from the $i$-th visible unit to the $j$-th hidden unit, $m$ is the number of visible units, and $n$ is the number of hidden units. When we clamp a new data vector $\mathbf{v} = (v_1, \dots, v_m)$ onto the visible layer, the RBM decides whether to turn each hidden unit on or off according to the weights $W$. Concretely, we first compute the excitation value (activation) of each hidden unit:

$$a_j = \sum_{i=1}^{m} w_{ij}\, v_i$$

(note that the conditional independence mentioned above is used here). Then the excitation value of each hidden unit is squashed by a sigmoid function into the probability that the unit is on (denoted by 1):

$$p(h_j = 1 \mid \mathbf{v}) = \sigma(a_j),$$

where the sigmoid we use is the logistic function

$$\sigma(x) = \frac{1}{1 + e^{-x}}.$$

At this point the probability that each hidden unit $h_j$ turns on has been computed; the probability that it stays off (denoted by 0) is simply $1 - p(h_j = 1 \mid \mathbf{v})$. Whether the unit actually turns on or off is decided by comparing this probability with a random number drawn uniformly from $[0, 1]$ and switching the hidden unit on or off accordingly. Given the hidden layer, the visible layer is computed in exactly the same way.
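To make this concrete, here is a minimal NumPy sketch (my own illustration, not from the original article) of computing the hidden-unit probabilities and sampling their binary states for one visible vector. The bias vector `b_hid` is an assumption: the simplified formula above defines only the weight matrix, but biases are standard in RBMs.

```python
import numpy as np

def sigmoid(x):
    # the logistic function used above
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b_hid, rng):
    """v: (m,) visible vector; W: (m, n) weights; b_hid: (n,) hidden biases (assumed)."""
    activation = b_hid + v @ W            # excitation values of the whole hidden layer at once
    p_on = sigmoid(activation)            # probability that each hidden unit turns on
    h = (rng.uniform(size=p_on.shape) < p_on).astype(float)   # compare with U(0, 1) draws
    return h, p_on

# tiny usage example with random parameters (purely illustrative)
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((784, 500))      # e.g. a 28x28 image mapped to 500 hidden units
v = rng.integers(0, 2, 784).astype(float)
h, p_on = sample_hidden(v, W, np.zeros(500), rng)
```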
Training an RBM

Training an RBM is, in essence, finding the probability distribution that is most likely to have produced the training samples; in other words, we seek the distribution under which the probability of the training samples is greatest. Since this distribution is determined by the weights W, the goal of training an RBM is to find the best weights. To keep the reader's interest, we skip the derivation of maximizing the log-likelihood and show directly how the RBM is trained. G. Hinton proposed a learning algorithm for this called contrastive divergence. Following the notation above, one contrastive-divergence step (CD-1) for each record in the training set proceeds roughly as follows: clamp the record onto the visible layer, compute and sample the hidden units, reconstruct the visible layer from that sample, compute the hidden units again, and update the weights in proportion to the difference between the data-driven and reconstruction-driven correlations (a sketch is given below). After training, the RBM can accurately extract the features of the visible layer, and can also restore the visible layer from the features represented by the hidden layer.
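The following is a hedged sketch of one CD-1 update for a single training record, reusing `sigmoid` and `sample_hidden` from the sketch above. The learning rate, the bias updates, and the use of probabilities (rather than sampled states) in the update rule are standard choices of mine, not details given in this article.

```python
def cd1_update(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One contrastive-divergence (CD-1) step for a single data vector v0."""
    # positive phase: infer the hidden layer from the data vector
    h0, p_h0 = sample_hidden(v0, W, b_hid, rng)
    # negative phase: reconstruct the visible layer, then re-infer the hidden layer
    p_v1 = sigmoid(b_vis + h0 @ W.T)
    v1 = (rng.uniform(size=p_v1.shape) < p_v1).astype(float)
    _, p_h1 = sample_hidden(v1, W, b_hid, rng)
    # update: data-driven correlations minus reconstruction-driven correlations
    W     += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b_vis += lr * (v0 - v1)
    b_hid += lr * (p_h0 - p_h1)
    return W, b_vis, b_hid
```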
Deep belief network

Having introduced the basic structure of an RBM and how it is trained and used, we now turn to the DBN itself. A DBN is a neural network composed of multiple layers of RBMs. It can be regarded either as a generative model or as a discriminative model, and its training process uses an unsupervised, greedy, layer-wise method to pre-train the weights.

Training procedure (a code sketch of the greedy stacking in steps 1-4, for the unlabeled case, is given after this section):
1. Fully train the first RBM.
2. Fix the weights and biases of the first RBM, and use the states of its hidden neurons as the input vector of the second RBM.
3. After fully training the second RBM, stack it on top of the first RBM.
4. Repeat the above steps as many times as desired.
5. If the data in the training set are labeled, then when training the top-level RBM, the visible layer of that RBM must contain, in addition to its visible units, neurons representing the class labels, and these are trained together:
a) Suppose the top-level RBM has 500 visible units and the training data fall into 10 classes;
b) then the top-level RBM actually has 510 visible units, and for each training example the neuron corresponding to its label is switched on (set to 1) while the others are switched off (set to 0).
6. The DBN is now trained (see the schematic below).

Figure 3: A trained deep belief network. The green part of the figure is the label that participates in training in the top-level RBM. Note that the result of fine-tuning is a discriminative model.

A note on the fine-tuning process: the generative model is fine-tuned with the contrastive wake-sleep algorithm, which proceeds as follows:
1. Except for the top-level RBM, the weights of every other layer are split into upward recognition (cognitive) weights and downward generative weights.
2. Wake phase: a recognition process that produces an abstract representation (node states) for each layer from the external input via the upward (recognition) weights, and uses gradient descent to modify the downward (generative) weights between layers. In other words, "if reality differs from what I imagined, change my weights so that what I imagine becomes more like reality."
3. Sleep phase: a generative process that uses the top-level representation (the concepts learned while awake) and the downward weights to generate the states of the lower layers, while modifying the upward (recognition) weights between layers. In other words, "if the scene in my dream is not the corresponding concept in my mind, change my recognition weights so that this scene looks like that concept to me."
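As a rough illustration of steps 1-4 above (the unlabeled case only; the 10 label units of step 5 are not included), the sketch below stacks RBMs greedily using the `cd1_update` helper from earlier. The shapes, epoch count, and the use of hidden probabilities as the next layer's input are assumptions of this sketch, not prescriptions from the article.

```python
def pretrain_dbn(data, hidden_sizes, epochs, rng):
    """Greedy layer-wise pre-training.
    data: (num_samples, m) binary matrix; hidden_sizes: hidden-layer sizes, bottom to top."""
    rbms, layer_input = [], data
    for n_hid in hidden_sizes:
        m = layer_input.shape[1]
        W = 0.01 * rng.standard_normal((m, n_hid))
        b_vis, b_hid = np.zeros(m), np.zeros(n_hid)
        for _ in range(epochs):                            # "fully train" this RBM
            for v0 in layer_input:
                W, b_vis, b_hid = cd1_update(v0, W, b_vis, b_hid, rng)
        rbms.append((W, b_vis, b_hid))
        # freeze this RBM; its hidden probabilities become the next layer's data vectors
        layer_input = sigmoid(b_hid + layer_input @ W)
    return rbms
```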
Using a trained DBN (for generation): 1. Starting from random hidden-unit states, run a sufficient number of Gibbs sampling steps in the top-level RBM; 2. Propagate the result downward through the generative weights to obtain the states of every lower layer.
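A rough sketch of this generation procedure, under the same assumptions as the earlier code (no label units; the number of Gibbs steps is arbitrary):

```python
def generate_sample(rbms, rng, n_gibbs=1000):
    """rbms: list of (W, b_vis, b_hid) from bottom to top, as returned by pretrain_dbn."""
    W_top, b_vis_top, b_hid_top = rbms[-1]
    # 1. Gibbs sampling in the top-level RBM, starting from random hidden states
    h = (rng.uniform(size=b_hid_top.shape) < 0.5).astype(float)
    for _ in range(n_gibbs):
        p_v = sigmoid(b_vis_top + h @ W_top.T)
        v = (rng.uniform(size=p_v.shape) < p_v).astype(float)
        h, _ = sample_hidden(v, W_top, b_hid_top, rng)
    # 2. propagate downward through the lower layers' top-down (generative) weights
    state = v
    for W, b_vis, _ in reversed(rbms[:-1]):
        p = sigmoid(b_vis + state @ W.T)
        state = (rng.uniform(size=p.shape) < p).astype(float)
    return state      # a sample in the visible (data) space
```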
Recommended reading

Papers:
1. Representation Learning: A Review and New Perspectives, Yoshua Bengio, Aaron Courville, Pascal Vincent, arXiv, 2012.
2. Learning Deep Architectures for AI (monograph/review), Yoshua Bengio, Foundations and Trends in Machine Learning, 2009.
3. Deep Machine Learning: A New Frontier in Artificial Intelligence Research, a survey paper by Itamar Arel, Derek C. Rose, and Thomas P. Karnowski.
4. A Fast Learning Algorithm for Deep Belief Nets, by Geoffrey E. Hinton and Simon Osindero.

Blogs and web tutorials:
1. Introduction to Restricted Boltzmann Machines, by Edwin Chen.
2. An Introduction to Restricted Boltzmann Machines, by Yuhuan Jiang.
3. Restricted Boltzmann Machine: Short Tutorial, by Imonad.
4. "Deep Learning Learning Notes" series, by Zouxy.
What follows is a supplement to the material above.
Deep belief networks (DBNs)
DBNs are probabilistic generative models, in contrast to the discriminative nature of traditional neural networks. A generative model establishes a joint distribution over observations and labels, evaluating both P(observation | label) and P(label | observation), whereas a discriminative model evaluates only the latter, P(label | observation). When the traditional BP algorithm is applied to deep neural networks, it runs into the following problems, which DBNs address:
(1) Training requires a labeled sample set;
(2) The learning process is relatively slow;
(3) Inappropriate parameter initialization causes learning to converge to a local optimum.
A DBN consists of several restricted Boltzmann machines (RBMs), a typical example of which is shown in Figure 3. These networks are "restricted" to a visible layer and a hidden layer: there are connections between the layers, but no connections among the units within a layer. The hidden units are trained to capture the higher-order correlations expressed in the data at the visible layer.
First, setting aside the top two layers, which form an associative memory, the connections in a DBN are determined by top-down generative weights. RBMs act like building blocks: compared with traditional, deeply layered sigmoid belief networks, they make learning the connection weights much easier.
At the very beginning, the weights of the generative model are obtained by an unsupervised, greedy, layer-wise pre-training method. Hinton showed this unsupervised greedy layer-wise approach to be effective and called it contrastive divergence.
During this training phase, a vector v is presented to the visible layer, which passes its values up to the hidden layer. In the reverse direction, the visible layer's input is then stochastically reconstructed in an attempt to recreate the original input signal. Finally, these new visible activations are passed up again to reconstruct the hidden-layer activations, yielding h. (In other words, during training the visible vector is first mapped to the hidden units, the visible units are then reconstructed from the hidden units, and these new visible units are mapped to the hidden units once more, producing new hidden units; this repeated procedure is called Gibbs sampling.) These back-and-forth steps are the familiar Gibbs sampling, and the difference in correlation between the hidden activations and the visible input provides the main basis for the weight update.
Training time is reduced significantly because only a single step is needed to approximate maximum-likelihood learning. Each layer added to the network improves the log-probability of the training data, which we can think of as getting ever closer to the true (energy-based) representation. This meaningful extension, together with the use of unlabeled data, is a decisive factor in any deep-learning application.
In the top two layers, the weights are tied together, so that the output of the lower layers provides a reference cue or link to the top layer, which the top layer then associates with its memory contents. What we ultimately care about is recognition performance, for example on classification tasks.
After pre-training, the DBN can use the BP algorithm with labeled data to adjust its discriminative performance. Here a set of label units is attached to the top layer (extending the associative memory), and a classification boundary for the network is obtained via the bottom-up, learned recognition weights. This performs better than a network trained with the plain BP algorithm alone. The intuitive explanation is that the DBN's BP algorithm only needs to perform a local search of the weight-parameter space, so compared with a feedforward neural network trained from scratch it trains faster and takes less time to converge.
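To make the idea concrete, here is a minimal sketch (my own illustration, not the article's procedure) of one step of discriminative fine-tuning by BP: the pre-trained upward weights initialize a feedforward network, a softmax label layer is attached on top, and ordinary backpropagation nudges all the weights. Cross-entropy loss, one-hot labels, and plain gradient descent are assumptions of this sketch.

```python
def finetune_step(x, y_onehot, weights, biases, W_out, b_out, lr=0.01):
    """weights[i]/biases[i]: pre-trained upward weights/biases of layer i (bottom up);
    W_out/b_out: the added softmax label layer."""
    # forward pass through the pre-trained layers
    acts = [x]
    for W, b in zip(weights, biases):
        acts.append(sigmoid(b + acts[-1] @ W))
    logits = b_out + acts[-1] @ W_out
    expz = np.exp(logits - logits.max())
    probs = expz / expz.sum()                       # softmax class probabilities
    # backward pass (plain BP): compute all deltas first, then apply the updates
    out_delta = probs - y_onehot                    # cross-entropy gradient w.r.t. the logits
    deltas = [out_delta @ W_out.T * acts[-1] * (1 - acts[-1])]
    for i in range(len(weights) - 1, 0, -1):
        deltas.append(deltas[-1] @ weights[i].T * acts[i] * (1 - acts[i]))
    deltas.reverse()                                # now deltas[i] pairs with weights[i]
    W_out -= lr * np.outer(acts[-1], out_delta)
    b_out -= lr * out_delta
    for i, d in enumerate(deltas):
        weights[i] -= lr * np.outer(acts[i], d)
        biases[i]  -= lr * d
    return weights, biases, W_out, b_out
```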
The flexibility of DBNs makes them easy to extend. One extension is convolutional DBNs (CDBNs). Ordinary DBNs do not take the 2-D structure of an image into account, because the input is simply a one-dimensional vectorization of the image matrix. CDBNs address this problem: they exploit the spatial relationships of neighboring pixels and, through a model called convolutional RBMs, achieve transformation invariance in the generative model, and they scale more easily to high-dimensional images. DBNs also do not explicitly handle learning of temporal dependencies among observed variables, although there is already research in this area, such as stacked temporal RBMs; generalizations of these with sequence learning are dubbed temporal convolution machines. Applying this kind of sequence learning to speech-signal processing opens an exciting direction for future research.
Current DBN-related research also includes stacked autoencoders, which replace the RBMs in a traditional DBN with stacked autoencoders. This allows deep multilayer neural network architectures to be trained by the same rules, but without the strict requirements on the parameterization of each layer. Unlike DBNs, autoencoders use a discriminative model, which makes it difficult to sample from the input space and therefore harder for the network to capture its internal representation. However, denoising autoencoders avoid this problem well and perform better than traditional DBNs: by adding random corruption during training and by stacking, they achieve good generalization performance. The process of training a single denoising autoencoder is the same as training an RBM as a generative model.
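For illustration only, here is a minimal sketch of one training step of a single denoising autoencoder: the input is randomly corrupted, encoded, and decoded, and the weights are adjusted to reconstruct the clean input. Squared-error loss, the corruption level, and untied encoder/decoder weights are all assumptions of this sketch rather than details from the text above.

```python
def dae_step(x, W_enc, b_hid, W_dec, b_vis, rng, lr=0.1, corruption=0.3):
    """One denoising-autoencoder update on a single example x of shape (m,)."""
    x_noisy = x * (rng.uniform(size=x.shape) > corruption)   # randomly zero out inputs
    h = sigmoid(b_hid + x_noisy @ W_enc)                      # encode the corrupted input
    x_rec = sigmoid(b_vis + h @ W_dec)                        # decode back to data space
    # gradients of 0.5 * ||x_rec - x||^2 : reconstruct the *clean* input
    err = (x_rec - x) * x_rec * (1 - x_rec)
    delta_h = (err @ W_dec.T) * h * (1 - h)
    W_dec -= lr * np.outer(h, err)
    b_vis -= lr * err
    W_enc -= lr * np.outer(x_noisy, delta_h)
    b_hid -= lr * delta_h
    return W_enc, b_hid, W_dec, b_vis
```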
My own understanding: in a DBN, training is hierarchical; each layer is trained and its parameters computed before the next layer is stacked on top. A DBN can thus be seen as an unsupervised auto-encoding process whose last layer can have a classification component added for supervised training. SAE, SDAE, and DBN can be viewed as one broad family; their drawback is that they can only handle one-dimensional (vectorized) data.
Translated from: http://blog.csdn.net/losteng/article/details/51001247