Gradient-based Learning applied to document recognition (reprint)

Source: Internet
Author: User

Deep Learning: 38 (A brief introduction to stacked CNNs)

Objective:

This section introduces the stacked CNN (deep convolutional network). It grew out of some confusion I ran into while building an SAE network (see Deep Learning: 36 (A little confusion about building a deep convolutional SAE network)). For recognition on large images, one often needs an unsupervised learning method to pre-train each layer of the stacked CNN, feeding the output of each layer as input to the next, and then fine-tune the whole network with the BP algorithm. That is easy to say in words, but is it really so easy? For beginners, actually implementing the process is not simple, because it involves many details. This post is not intended to describe deep stacked networks, convolution, or pooling in detail; for those topics, refer to the earlier posts Deep Learning: 16 (Deep networks) and Deep Learning: 17 (Linear decoders, convolution and pooling). Instead, it focuses on one specific question (explained below).

  Basic knowledge:

The first thing to know is that the advantage of convolution and pooling is that the network needs to learn far fewer parameters, and the learned features gain some invariance, such as translation and rotation invariance. Taking 2-D image feature extraction as an example: fewer parameters are needed because the network's input is not the pixels of the entire image but only small patches of it, with the same patch weights shared across locations. The invariance comes from using mean-pooling or max-pooling.
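As a rough illustration of the parameter savings (the layer sizes here are hypothetical, chosen only for comparison), consider connecting a 32*32 image to 100 hidden units densely, versus using 6 shared 5*5 convolution kernels as LeNet-5's first layer does:

```python
# Hypothetical comparison: parameters of a fully-connected layer
# versus a convolutional layer on a 32x32 grayscale image.

fc_params = 32 * 32 * 100          # dense layer with 100 hidden units (weights only)
conv_params = 6 * (5 * 5 + 1)      # 6 shared 5x5 kernels, one bias each

print(fc_params)    # 102400
print(conv_params)  # 156
```

The convolutional layer needs orders of magnitude fewer parameters because each kernel is reused at every image location.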

Take the classic LeNet-5 structure diagram as an example:

It can be seen that this network takes a 32*32 image as input and outputs an 84-dimensional vector, which is the feature vector we extract.

The C1 layer of the network consists of 6 feature maps of size 28*28, obtained by convolving the 32*32 input image with 6 patches of size 5*5, moving 1 pixel at a time. The S2 layer then reduces these to 6 feature maps of size 14*14, pooling every 4 pixels (i.e. each 2*2 block) into 1 value. These parts are easy to understand and are explained in detail in the UFLDL tutorial pages Feature extraction using convolution and Pooling.

The most difficult question is: how are the 10*10 feature maps of C3 obtained? That is the main point of this article.

Some may say this is simple: just treat the contents of the S2 layer as the input to a network with a 5*5 input layer and a 16-node hidden layer. In fact this explanation is wrong, or at least misses the essence of the problem. My answer: the S2 feature maps are convolved by a network whose input layer has 150 nodes (=5*5*6, not 5*5) and whose output layer has 16 nodes.

Moreover, each feature map of the C3 layer is not necessarily connected to every feature map of the S2 layer; it may connect to only some of them. In LeNet-5, the connections are as follows:

where an X indicates a connection between the two maps. Take the 16 hidden nodes of the 150-16 network we learned as an example, and consider the 4th feature map of C3 (index 3, counting from 0), which is connected to S2 maps 3, 4 and 5. How is this feature map obtained? The process is as follows:

First we split the 150 input nodes of the 150-16 network (in this structure, the input layer has 150 nodes and the hidden layer has 16) into 6 parts, each a block of 25 consecutive nodes. Take the 3rd-from-last part (25 nodes), together with the 25 weights connecting it to the 4th hidden node (index 3, since it corresponds to C3 map 3, counting from 0). Reshape those 25 values into a 5*5 patch, and convolve the 3rd-from-last feature map of S2 with this 5*5 patch; call the resulting feature map H1.

Similarly, take the 2nd-from-last part of the input nodes (25 of them), together with the 25 weights connecting them to the same hidden node; reshape them into a 5*5 patch and convolve the 2nd-from-last feature map of S2 with it, giving a feature map H2.

Continuing, take the last part of the input nodes (25 of them), together with the 25 weights connecting them to the same hidden node; reshape them into a 5*5 patch and convolve the last feature map of S2 with it, giving a feature map H3.

Finally, add the three matrices H1, H2 and H3 to obtain a new matrix H, add an offset b to each element of H, and pass the result through the sigmoid activation function. This gives the C3 feature map we wanted.
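The whole procedure above can be sketched as follows. This is a minimal illustration with random weights, not trained LeNet-5 parameters: one hidden node's 150 weights are viewed as 6 blocks of 25, only the 3 blocks for the connected S2 maps (3, 4, 5) are kept, each is reshaped into a 5*5 kernel, the convolution results are summed, and a bias plus sigmoid produce one 10*10 map of C3:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """'Valid' 2-D convolution with stride 1 and no padding."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
s2_maps = rng.random((3, 14, 14))    # the 3 connected S2 maps (nos. 3, 4, 5)

# One hidden node's weights in the 150-16 network: a 150-vector seen
# as 6 parts of 25; keep only the parts for the connected maps.
w = rng.random(150)
kernels = [w[k * 25:(k + 1) * 25].reshape(5, 5) for k in (3, 4, 5)]
b = 0.1                              # the single bias of this feature map

h = sum(convolve2d_valid(m, k) for m, k in zip(s2_maps, kernels))  # H1+H2+H3
c3_map = sigmoid(h + b)              # one 10x10 feature map of C3
print(c3_map.shape)                  # (10, 10)
```

Note that all three convolutions use weight blocks of the same hidden node; one hidden node produces one C3 feature map, which is why C3 has 16 maps.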

The rest of the LeNet-5 structure can be reasoned out similarly. I found it is actually very hard to describe this process in words; face to face, a few sentences would be enough.

In the classic CNN structure (such as LeNet-5 here), there is no need to pre-train each layer. But in current stacked CNNs, to speed up the optimization of the final network parameters, unsupervised pre-training is generally used. Now to the 1st question in Deep Learning: 36 (A little confusion about building a deep convolutional SAE network). In terms of the LeNet-5 framework, the problem is: when pre-training the weights W of the 150-16 network from S2 to C3, where do the training samples come from?

First of all, if we have m large images as training samples in total, then S2 yields 6*m feature maps, each of size 14*14. Since we convolve with 5*5 patches and the network's input is 150-dimensional, we clearly need to sub-sample the data. So we sample these maps: for each image, from its 6 feature maps (the 6 maps of the S2 layer) we randomly sample several 5*5 patches, taking the patch at the same location in each of the 6 maps, and reshape them in order into a 150-dimensional vector, which serves as one training sample for the 150-16 network. Repeating this yields multiple samples, which together form the training set of the network.
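The sampling step can be sketched like this. It is a minimal illustration (m and the number of patches are arbitrary): for each sample, one random 5*5 location is chosen and the patch at that same location is taken from all 6 S2 maps of a randomly chosen image, then flattened to 150 dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4                                 # number of training images (assumed)
s2 = rng.random((m, 6, 14, 14))       # 6 S2 maps of size 14x14 per image

def sample_patches(s2, n_patches):
    """Build n_patches training vectors for the 150-16 network:
    one random 5x5 location per sample, taken at the SAME position
    in all 6 maps of a randomly chosen image, then flattened to 150."""
    samples = np.empty((n_patches, 150))
    for t in range(n_patches):
        i = rng.integers(s2.shape[0])       # pick an image
        r = rng.integers(14 - 5 + 1)        # top-left row of the patch
        c = rng.integers(14 - 5 + 1)        # top-left column
        samples[t] = s2[i, :, r:r+5, c:c+5].reshape(150)
    return samples

X = sample_patches(s2, n_patches=100)
print(X.shape)                        # (100, 150)
```

Each row of X is then one unsupervised training sample for pre-training the 150-16 network.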

Here is some information I found online over the past few days:

First, the handwriting recognition demo corresponding to LeNet-5; see its web page LeNet-5, convolutional neural networks, as well as the demo's paper: LeCun, Y., et al. (1998), "Gradient-based learning applied to document recognition." The paper is long; it is enough to read the part on individual character recognition. For the details of each layer of LeNet-5 in the paper, see the web page Deep Learning (deep learning) study notes series (vii).

Next is a simple version of LeNet-5 written in Python, implemented with the Theano machine learning library: Convolutional Neural Networks (LeNet). Those who know Python can take a look; it is relatively easy to understand (those who don't know Python can still get the general idea). For a MATLAB implementation of stacked CNNs, see https://sites.google.com/site/chumerin/projects/mycnn, which has source code and an interface.

Finally, Hinton's group's paper on the algorithm that won the 2012 ImageNet recognition challenge: ImageNet classification with deep convolutional neural networks. The corresponding GPU-based C++ code is also available: https://code.google.com/p/cuda-convnet/.

  Summary:

Regarding the pre-training process of stacked CNN networks, the source of each layer's training samples is now clear, but the final fine-tuning process of the whole network is still not very clear to me; it involves many mathematical formulas.

References:

Deep Learning: 36 (A little confusion about building a deep convolutional SAE network)

Deep Learning: 16 (Deep networks)

Deep Learning: 17 (Linear decoders, convolution and pooling)

Deep Learning (deep learning) study notes series (vii)

Convolutional Neural Networks (LeNet)

https://sites.google.com/site/chumerin/projects/mycnn

Gradient-based learning applied to document recognition

ImageNet classification with deep convolutional neural networks

Feature extraction using convolution

Pooling
