Pattern Recognition class notes--deep learning


Introduction: The wave of deep learning began with Hinton's article "Reducing the Dimensionality of Data with Neural Networks".

Representative people: Geoffrey Hinton Link:http://www.cs.toronto.edu/~hinton/

Yann LeCun Link:http://yann.lecun.com/ex/index.html

Yoshua Bengio Link:http://www.iro.umontreal.ca/~bengioy/yoshua_en/

Andrew Ng Link:http://www.andrewng.org/

Development history:

- Hopfield Network

- Boltzmann Machine

- Restricted Boltzmann Machine

- CNN

- RNN

- LSTM

- Autoencoder

- DBN

- DBM

- Deep Learning

1. CNN

In 1998, Yann LeCun designed a convolutional neural network for image processing, which was applied to handwritten character recognition and achieved good results.

The goal is to take a natural image and learn directly from the raw pixels at the bottom (unstructured feature learning). That is, for an image recognition task it is not necessary to first extract hand-designed features such as Gabor texture features, multiscale wavelet features, SIFT features, or HOG features. (These hand-designed features have shortcomings: they depend on parameters such as scale, gradient direction, and frequency-domain partitioning, and their generalization ability is not strong.)

CNN redesigns the structure of the multilayer neural network (without altering the neurons themselves). If the image were input directly, with each pixel treated as a node, a 200*200 image would already give 40,000 nodes in the input layer alone; if the first hidden layer contained only 1,000 nodes, the number of weights would reach 40 million. This is obviously a huge computational burden.
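As a quick sanity check of these numbers, here is the arithmetic in Python (a minimal sketch; the 10*10 filter size in the comparison is an illustrative assumption, not from the notes):

    # Fully connected: every input pixel connects to every hidden node.
    input_nodes = 200 * 200            # 40,000 pixels
    hidden_nodes = 1000
    print(input_nodes * hidden_nodes)  # 40,000,000 weights

    # With weight sharing, a single 10*10 convolution filter needs
    # only 100 weights, regardless of the image size.
    print(10 * 10)                     # 100 weights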

To solve this problem, CNN adopts three strategies: first, local connections; second, weight sharing; third, down-sampling (the third is called pooling).

First, local connections: a neuron does not need to perceive the whole image; it only needs to perceive a local region.

Second, weight sharing: the connections from any local region of the image to a hidden node of a given type use the same weights.

Image convolution captures both of these properties at once: a filter is slid over the image, its size and values do not change with position, and the weights of the filter (the convolution kernel) are exactly the input-to-hidden weights. A sketch of this operation follows:
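A minimal numpy sketch of such a filter being slid over an image (the 8*8 image and 3*3 kernel sizes are illustrative assumptions; as in most deep-learning libraries, the "convolution" is implemented as cross-correlation):

    import numpy as np

    def conv2d(image, kernel):
        # The same kernel (shared weights) is applied at every position,
        # and each output node sees only a small local patch.
        H, W = image.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.random.rand(8, 8)
    kernel = np.random.rand(3, 3)     # 9 shared weights, independent of image size
    feature_map = conv2d(image, kernel)
    print(feature_map.shape)          # (6, 6)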

This drastically reduces the number of weights to be learned, and we can then afford to add several more filters, each producing its own feature map.

After doing so, the number of nodes in the hidden layer is not chosen arbitrarily; it is determined jointly by the size of the original image, the size of the filters, and the number of filters. The resulting dimensionality is still high and easily produces overfitting, so the third strategy, pooling, is introduced: aggregate the features at different locations, for example by computing the average value (or the maximum) of a particular feature over a region of the image.
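Continuing the sketch above, a minimal implementation of pooling over non-overlapping 2*2 windows (matching the 2*2 window mentioned later in these notes):

    def pool2d(feature_map, size=2, mode="avg"):
        # Aggregate each non-overlapping size x size window into one value.
        H, W = feature_map.shape
        out = np.zeros((H // size, W // size))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                window = feature_map[i * size:(i + 1) * size,
                                     j * size:(j + 1) * size]
                out[i, j] = window.mean() if mode == "avg" else window.max()
        return out

    pooled = pool2d(feature_map, size=2, mode="max")
    print(pooled.shape)               # (3, 3)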

Next comes the operation from the first hidden layer to the second hidden layer. Note that the outputs of the several first-layer filters are weighted and processed together (that is, the next filter is a cubic filter spanning all the feature maps). Local connection and weight sharing work exactly as before, but the filter is now equivalent to acting on the 4 feature maps simultaneously:
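Continuing the sketch, a hedged illustration of such a cubic filter, assuming 4 feature maps from the first layer as in the text:

    def conv2d_multichannel(maps, kernel3d):
        # maps: (C, H, W) stack of feature maps from the previous layer.
        # kernel3d: (C, kh, kw) "cubic" filter; the C per-channel results
        # are summed into a single output feature map.
        out = conv2d(maps[0], kernel3d[0])
        for c in range(1, maps.shape[0]):
            out += conv2d(maps[c], kernel3d[c])
        return out

    maps = np.random.rand(4, 6, 6)        # e.g. 4 feature maps from layer 1
    kernel3d = np.random.rand(4, 3, 3)    # one cubic filter spanning all 4
    print(conv2d_multichannel(maps, kernel3d).shape)   # (4, 4)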

The following layers do the same (but the size and number of convolution kernels differ):

– Convolution

• Pass the sum of the convolutions to the next layer

• Apply an activation to the convolution sum (note: the activation can also be applied after pooling)

– Pooling (aggregation)

• Down-sampling, two basic operations:

– 2*2 window averaging (mean pooling) or taking the 2*2 window maximum (max pooling)

A fully connected multilayer perceptron is attached as the final two layers:
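Continuing the sketch, the final stage might look as follows (the layer sizes, the 10 output classes, and the ReLU activation are illustrative assumptions; LeNet-era networks used sigmoid-like units):

    def dense(x, W, b):
        # One fully connected layer with a ReLU nonlinearity.
        return np.maximum(0, W @ x + b)

    x = pooled.flatten()                   # flatten the last feature maps
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(0, 0.1, (32, x.size)), np.zeros(32)
    W2, b2 = rng.normal(0, 0.1, (10, 32)), np.zeros(10)
    scores = W2 @ dense(x, W1, b1) + b2    # raw scores for 10 classes
    print(scores.shape)                    # (10,)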

Network training: the error backpropagation (BP) algorithm is still used.

2. Autoencoder

Objective: the training samples of a convolutional neural network are labeled, but many problems come without labels, so the natural question is whether this kind of problem can also be handled.

The solution: make the output of the whole network equal to its input; the hidden layer can then be understood as a feature representation that records the data.

Core Ideas

– Feed the input into an encoder to obtain a code. This code is another representation of the input.

– Evaluate the quality of the code by adding a decoder and reconstructing the signal.

– Ideally, we want the signal output by the decoder (its form of expression may differ, but in essence the underlying pattern should be the same) to match the input signal.

– A reconstruction error is produced, and we want this error to be as small as possible.

– The code of the first layer is then treated as the input signal of the second layer; minimizing its reconstruction error yields the weight parameters of the second layer and the second layer's code, a second representation of the original data.

– While the current layer is being trained, the other layers are held fixed; only the current encoding and decoding task is carried out, and the earlier "encoding" and "decoding" steps are not revisited.

– Therefore, this process is essentially a static stacking (stack) process: layers are trained one at a time and then stacked.

– Each layer can be trained with the BP algorithm as a three-layer feed-forward network.

After the autoencoder learning task is complete, the weights learned in the decoding phase are discarded in practical applications. The network obtained in the encoding phase is therefore essentially a feature learner: the higher the layer, the larger-scale the structural features it captures.

For a classification task, the autoencoder can first be used to learn from the data without labels. The learned weights are then taken as initial weights, and the network is trained again on labeled data, the so-called fine-tuning. To actually perform classification, a classifier is added on top of the last layer of the encoding phase, such as a multilayer perceptron (MLP).
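A minimal numpy sketch of one such layer trained by BP (the sigmoid units, tied encoder/decoder weights, and all sizes are illustrative assumptions, not from the notes):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((100, 64))              # 100 unlabeled samples, 64-dim

    n_hidden, lr = 16, 0.1
    W = rng.normal(0, 0.1, (64, n_hidden)) # tied weights: decoder uses W.T
    b_enc = np.zeros(n_hidden)
    b_dec = np.zeros(64)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for _ in range(500):
        H = sigmoid(X @ W + b_enc)         # encode: the code for each sample
        X_hat = sigmoid(H @ W.T + b_dec)   # decode: reconstruct the input
        err = X_hat - X                    # reconstruction error to minimize
        d_dec = err * X_hat * (1 - X_hat)
        d_enc = (d_dec @ W) * H * (1 - H)
        grad_W = (X.T @ d_enc + d_dec.T @ H) / len(X)  # tied-weight gradient
        W -= lr * grad_W
        b_enc -= lr * d_enc.mean(axis=0)
        b_dec -= lr * d_dec.mean(axis=0)

    print(np.mean((X_hat - X) ** 2))       # mean reconstruction error

    # In practice only the encoder (W, b_enc) is kept; its code H becomes
    # the input of the next layer, or feeds a classifier for fine-tuning.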

3. RNN

The network structure is as follows (note that the symbols here are vectors and matrices, representing one network structure, as shown on the right):

The matrix W here is the source of the memory, and the input is processed in chronological order (the two diagrams are the same network, just drawn differently):
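A minimal sketch of this recurrence (the tanh activation and the dimensions are illustrative assumptions):

    import numpy as np

    n_in, n_hidden = 8, 16
    rng = np.random.default_rng(0)
    U = rng.normal(0, 0.1, (n_hidden, n_in))      # input-to-hidden weights
    W = rng.normal(0, 0.1, (n_hidden, n_hidden))  # hidden-to-hidden: the memory
    b = np.zeros(n_hidden)

    h = np.zeros(n_hidden)                        # initial hidden state
    for x_t in rng.random((5, n_in)):             # 5 inputs in chronological order
        # The same W is reused at every time step; h carries the past forward.
        h = np.tanh(U @ x_t + W @ h + b)
    print(h.shape)                                # (16,)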

The next question is how to train such a large network over long sequences: the gradients propagated back through time tend to vanish or explode, and this motivates the LSTM.

4. LSTM (Long Short-Term Memory)

Basic idea: once a piece of information has been used, we want the node to be able to release (forget) its accumulated effect, making the network more flexible and able to decide for itself what to retain (depending on what it has learned).

LSTM keeps the RNN network structure but changes the composition of the hidden-layer neurons, as follows:

Three gate units are added around the simple input, output, and hidden-layer self-loop:

- Forget gate: controls the degree to which the internal cell state is forgotten

- Input gate: controls the degree to which the cell accepts its input

- Output gate: controls the degree to which the cell's output is passed on

The corresponding formulas are as follows:
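In the standard LSTM formulation, the three gates above read ($\sigma$ is the logistic sigmoid, $\odot$ elementwise multiplication):

    f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)         % forget gate
    i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)         % input gate
    o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)         % output gate
    \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)  % candidate cell state
    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t   % forget old state, accept input
    h_t = o_t \odot \tanh(c_t)                        % gated output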

The goal of network training is to estimate the following weight matrices:
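(In the standard formulation above, these are W_f, U_f, W_i, U_i, W_o, U_o, W_c, U_c, together with the bias vectors b_f, b_i, b_o, b_c.)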

Source: HTTP://PAN.BAIDU.COM/S/1SLTG4BJ

Recommended reading: the review article "Deep Learning" by Yann LeCun, Yoshua Bengio, and Geoffrey Hinton: http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html, and its Chinese translation: http://www.csdn.net/article/2015-06-01/2824811
