Deep Learning Study Notes Series (VI)


Deep Learning Study Notes Series

[Email protected]

http://blog.csdn.net/zouxy09

Zouxy

Version 1.0 2013-04-08

Statement:

1) This Deep Learning series is compiled from material generously shared online by leading experts and machine learning researchers. Please see the references for the specific sources; specific version statements are also given in the original literature.

2) This article is for academic exchange only and is non-commercial, so the references for each specific part are not matched in detail. If any part inadvertently infringes on anyone's interests, please forgive the oversight and contact the blogger to have it removed.

3) My knowledge is limited, so errors are inevitable in this summary; I hope readers will point them out. Thank you.

4) Reading this article requires some background in machine learning, computer vision, neural networks, and so on (if you lack it, that's fine; you can still read along).

5) This is the first version; where there are errors, it will need continued revision and correction. I also welcome suggestions. By each sharing a little, we can together advance scientific research (a noble goal, indeed). Please contact: [Email protected]

Directory:

I. Overview

II. Background

III. The visual mechanism of the human brain

IV. About features

4.1 The granularity of feature representation

4.2 Primary (shallow) feature representation

4.3 Structured feature representation

4.4 How many features are needed?

V. The basic idea of deep learning

VI. Shallow learning and deep learning

VII. Deep learning and neural networks

VIII. The deep learning training process

8.1 Training methods of traditional neural networks

8.2 The deep learning training process

IX. Common models and methods of deep learning

9.1 AutoEncoder

9.2 Sparse Coding

9.3 Restricted Boltzmann Machine (RBM)

9.4 Deep Belief Networks (DBN)

9.5 Convolutional Neural Networks (CNN)

X. Summary and outlook

XI. References and deep learning resources

Continued from the previous part.

Note: The descriptions of the following two deep learning methods still need polishing, but to keep the article continuous and complete, I am posting them first and will revise them later.

9.3 Restricted Boltzmann Machine (RBM)

Suppose we have a bipartite graph in which there are no links between the nodes within each layer. One layer is the visible layer, i.e., the input data layer (v), and the other is the hidden layer (h). If we assume that all nodes are random binary variables (taking only the values 0 or 1), and that the joint probability distribution p(v, h) satisfies the Boltzmann distribution, we call this model a Restricted Boltzmann Machine (RBM).

Let's look at why this is a deep learning method. First, because the model is a bipartite graph, all hidden nodes are conditionally independent given v (there are no connections among them), i.e., p(h|v) = p(h1|v)...p(hn|v). Similarly, given the hidden layer h, all visible nodes are conditionally independent. At the same time, because all v and h satisfy the Boltzmann distribution, when v is presented we can obtain the hidden layer h through p(h|v), and from that hidden layer h we can obtain the visible layer again through p(v|h). By adjusting the parameters, we want the visible layer v1 reconstructed from the hidden layer to be the same as the original visible layer v; if it is, then the hidden layer we obtained is another representation of the visible layer, so the hidden layer can serve as features of the visible input data. That is why this is a deep learning method.

So how do we train it? That is, how do we determine the weights between the visible-layer nodes and the hidden-layer nodes? We need to do some mathematical analysis, starting from the model itself.

The energy of a joint configuration of the visible and hidden units can be expressed as:
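In standard RBM notation, with visible biases a_i, hidden biases b_j, and connection weights W_ij making up the parameters θ = {W, a, b}, this energy is:

E(v,h;\theta) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j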

The joint probability of a configuration is then determined by the Boltzmann distribution (and the energy of that configuration):
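With Z(θ) denoting the partition function, this Boltzmann distribution reads:

P(v,h;\theta) = \frac{\exp(-E(v,h;\theta))}{Z(\theta)}, \qquad Z(\theta) = \sum_{v,h} \exp(-E(v,h;\theta))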

Because the hidden nodes are conditionally independent given the visible layer (there are no connections among them), we have:
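That is, the conditional distribution factorizes over the hidden units (and symmetrically over the visible units):

P(h|v) = \prod_j P(h_j|v)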

Then, since the distribution factorizes, we can relatively easily obtain the probability that the j-th hidden node is 1 or 0 given the visible layer v:
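With σ(x) = 1/(1 + e^{-x}) denoting the logistic sigmoid, this conditional is:

P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i W_{ij} v_i\Big)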

Similarly, given the hidden layer h, the probability that the i-th visible node is 1 or 0 is just as easy to obtain:
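Correspondingly:

P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big)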

Given a set of independent, identically distributed samples D = {v(1), v(2), ..., v(N)}, we need to learn the parameters θ = {W, a, b}.

We maximize the following log-likelihood function (maximum likelihood estimation: for a probabilistic model, we choose the parameters under which the probability of observing the current samples is greatest):
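Writing the model's marginal over the visible units as P(v; θ), the log-likelihood of the sample set is:

L(\theta) = \sum_{n=1}^{N} \log P(v^{(n)};\theta)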

That is, by differentiating the log-likelihood function and following its gradient, we obtain the parameters W corresponding to the maximum of L.
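For an RBM this gradient takes the familiar form of a difference between a data-driven expectation and a model-driven expectation; it is the second, intractable term that contrastive divergence approximates with a short Gibbs chain:

\frac{\partial \log P(v)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}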

If we increase the number of hidden layers, we obtain a Deep Boltzmann Machine (DBM). If, in the part close to the visible layer, we instead use a Bayesian belief network (a directed graphical model, still with no connections between nodes within a layer), and use restricted Boltzmann machines in the part farthest from the visible layer, we obtain a Deep Belief Net (DBN).

9.4 Deep Belief Networks (DBN)

DBNs are probabilistic generative models, in contrast to the discriminative models of traditional neural networks. A generative model establishes a joint distribution between the observed data and the labels, evaluating both P(Observation | Label) and P(Label | Observation), whereas a discriminative model evaluates only the latter, P(Label | Observation). When the traditional BP algorithm is applied to deep neural networks, it runs into the following problems:

(1) Training requires a labeled sample set;

(2) The learning process is quite slow;

(3) Inappropriate parameter choices cause learning to converge to a local optimum.

DBNs are composed of multiple Restricted Boltzmann Machines (RBMs), a typical type of neural network, as shown in Figure 3. These networks are "restricted" to one visible layer and one hidden layer, with connections between the layers but no connections between the units within a layer. The hidden units are trained to capture the correlations of the higher-order data expressed in the visible units.

First, setting aside the top two layers, which form an associative memory, the connections of a DBN are determined under the guidance of top-down generative weights. RBMs act like building blocks; compared with traditional, deeply layered sigmoid belief networks, they make learning the connection weights easier.

At the very start, the weights of the generative model are obtained by pre-training with an unsupervised greedy layer-wise method. Hinton proved this unsupervised greedy layer-wise approach to be effective and called it contrastive divergence.

During this training phase, a vector v is presented at the visible layer and its values are passed to the hidden layer. In turn, the visible layer's input is stochastically reconstructed in an attempt to recover the original input signal. Finally, these new visible activations are propagated forward again to reconstruct the hidden activations, giving h. (During training, the visible vector is first mapped to the hidden units; the visible units are then reconstructed from the hidden units; these new visible units are mapped to the hidden units once more, yielding new hidden units. Repeating this alternation is called Gibbs sampling.) These backward and forward steps are the familiar Gibbs sampling, and the difference in correlation between the hidden activations and the visible inputs is the main basis for the weight update.
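As a concrete illustration, here is a minimal NumPy sketch of a single CD-1 update for one binary RBM. The function and variable names (cd1_step, W, a_vis, b_hid, lr) and the toy data are illustrative assumptions, not code from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a_vis, b_hid, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0:    (batch, n_visible) binary input vectors
    W:     (n_visible, n_hidden) weights
    a_vis: (n_visible,) visible biases
    b_hid: (n_hidden,) hidden biases
    """
    # Positive phase: P(h=1 | v0), then sample h0
    ph0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(ph0.shape) < ph0).astype(v0.dtype)

    # Negative phase (one Gibbs step): reconstruct v, then recompute P(h=1 | v1)
    pv1 = sigmoid(h0 @ W.T + a_vis)
    v1 = (rng.random(pv1.shape) < pv1).astype(v0.dtype)
    ph1 = sigmoid(v1 @ W + b_hid)

    # Weight update: difference between data and reconstruction correlations
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    a_vis += lr * (v0 - v1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)
    return W, a_vis, b_hid

# Tiny usage example on random binary data
n_visible, n_hidden = 6, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
a_vis = np.zeros(n_visible)
b_hid = np.zeros(n_hidden)
v_batch = (rng.random((16, n_visible)) < 0.5).astype(float)
for _ in range(100):
    W, a_vis, b_hid = cd1_step(v_batch, W, a_vis, b_hid)
```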

Training time is significantly reduced because approximating maximum likelihood learning requires only a single step. Each layer added to the network improves the log-probability of the training data, which we can understand as getting ever closer to the true representation of the energy. This meaningful expansion, together with the use of unlabeled data, is a decisive factor in any deep learning application.

In the top two layers, the weights are tied together, so the output of the lower layers provides a reference clue or association for the top layer, and the top layer links it to its memory contents. What we ultimately care about is discriminative performance, for example on classification tasks.

After pre-training, the DBN can use the BP algorithm on labeled data to fine-tune its discriminative performance. Here, a set of labels is attached to the top layer (extending the associative memory), and a classification boundary for the network is obtained through the bottom-up, learned recognition weights. This performs better than a network trained with the BP algorithm alone. It can be explained intuitively: the BP algorithm in a DBN only needs to perform a local search over the weight parameter space, so compared with a feedforward neural network, training is faster and convergence takes less time.
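To make the greedy layer-wise recipe concrete, here is a minimal, self-contained sketch of stacked-RBM pre-training followed by supervised fine-tuning. It simplifies the fine-tuning stage to adjusting only a logistic output layer rather than back-propagating through the whole stack; all names, sizes, and the toy data are illustrative assumptions, not the author's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, epochs=50, lr=0.1):
    """Pre-train one RBM layer with CD-1; return (W, hidden bias, hidden activations)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)
    b = np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + b)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + a)
        ph1 = sigmoid(pv1 @ W + b)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        a += lr * (data - pv1).mean(axis=0)
        b += lr * (ph0 - ph1).mean(axis=0)
    return W, b, sigmoid(data @ W + b)

# 1) Unsupervised greedy layer-wise pre-training: each layer is trained on the
#    hidden activations produced by the layer below it.
X = (rng.random((200, 20)) < 0.5).astype(float)   # toy unlabeled data
y = (X[:, 0] > 0.5).astype(int)                   # toy labels for fine-tuning
layer_sizes = [16, 8]
weights, biases, inp = [], [], X
for n_hid in layer_sizes:
    W, b, inp = pretrain_rbm(inp, n_hid)
    weights.append(W)
    biases.append(b)

# 2) Supervised fine-tuning: the pre-trained weights initialize a feed-forward
#    net; for brevity only a logistic output layer is trained with gradient steps.
W_out = 0.01 * rng.standard_normal((layer_sizes[-1], 1))
b_out = np.zeros(1)
for _ in range(200):
    acts = [X]
    for W, b in zip(weights, biases):
        acts.append(sigmoid(acts[-1] @ W + b))      # forward pass through the stack
    p = sigmoid(acts[-1] @ W_out + b_out).ravel()   # predicted label probability
    grad = (p - y)[:, None]                         # cross-entropy gradient at output
    W_out -= 0.1 * acts[-1].T @ grad / len(X)
    b_out -= 0.1 * grad.mean(axis=0)
```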

The flexibility of DBNs makes them easy to extend. One extension is Convolutional Deep Belief Networks (CDBNs). DBNs do not exploit the 2-D structure of images, because the input is simply a vector obtained by unrolling the image matrix into one dimension. CDBNs address this: they make use of the spatial relationships of neighboring pixels, achieve translation invariance of the generative model through a construct called convolutional RBMs, and scale easily to high-dimensional images. DBNs also do not explicitly handle learning the temporal relationships among observed variables, although there is already research in this area, such as stacked temporal RBMs; as a generalization, there are the so-called temporal convolution machines with sequence learning. Applying this kind of sequence learning to speech signal processing problems opens an exciting direction for future research.

Current DBN-related research includes stacked auto-encoders, which replace the RBMs in a traditional DBN with stacked auto-encoders. This allows deep multi-layer neural network architectures to be trained with the same rules, but without the strict requirements on the parameterization of the layers. Unlike DBNs, auto-encoders use a discriminative model, which makes it difficult to sample the input space and therefore harder for the network to capture its internal representation. Denoising auto-encoders, however, avoid this problem well and perform better than traditional DBNs: they achieve good generalization by adding random corruption during training and stacking the layers. Training a single denoising auto-encoder follows the same process as training an RBM as a generative model.

Continued in:

Deep Learning Study Notes Series (VI)
