Notes on convolutional neural networks

This is an article from 2006, but many parts of it are still worth a look.

I. Summary

These notes cover the main points of the CNN feedforward pass and backpropagation pass; the key part is the explanation of the BP derivation for the convolutional and pooling layers.

II. The classical BP algorithm

In the forward pass, the thing to pay attention to is data normalization: normalizing the training data to zero mean and unit variance improves gradient descent because it prevents the units from saturating prematurely. This was mainly a drawback of the early sigmoid and tanh activation functions (when the input is too large or too small, the gradient is tiny); now that we have the two workhorses ReLU and batch normalization, the vanishing-gradient problem is largely under control.
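
As a small illustration of that normalization step, here is a minimal sketch (my own illustrative code, not from the paper; `X` is assumed to hold one training sample per row):

```python
import numpy as np

def standardize(X, eps=1e-8):
    """Normalize training data to zero mean and unit variance, per feature."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps)  # eps guards against constant features
```

Keeping the inputs in this range keeps sigmoid/tanh units away from their flat, saturated regions, which is exactly the premature-saturation issue mentioned above.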

Then comes the BP algorithm itself. The part of the paper that is a little hard to follow is the derivation of Equation 5; the explanation below is quoted (translated) from http://www.cnblogs.com/shouhuxianjian/p/4529202.html:

the "error" in the network where we need the back propagation can be thought of as "sensitivity" to each unit with biased disturbances. Other words:

$\frac{\partial E}{\partial b} = \frac{\partial E}{\partial u}\frac{\partial u}{\partial b} = \delta$    (Equation 4)

Since $\partial u / \partial b = 1$, the sensitivity with respect to the bias is simply the derivative of the error with respect to a unit's total input. Backpropagating this from a higher layer to a lower one gives:

$\delta^{\ell} = \left(W^{\ell+1}\right)^{T} \delta^{\ell+1} \circ f'(u^{\ell})$    (Equation 5)

The left-hand side is the derivative of the error with respect to the input $u^{\ell}$ of layer $\ell$: to obtain it, the sensitivities of layer $\ell+1$ are propagated back through the weights $W^{\ell+1}$ and then multiplied by the derivative of the activation function. If you already have a good grasp of the BP algorithm, this should be easy to follow.

The "O" here means that it is multiplied by the original. For the error function in Equation 2, the sensitivity of the output layer neurons is as follows:

$\delta^{L} = f'(u^{L}) \circ (y^{n} - t^{n})$    (Equation 6)

Finally, the delta rule for updating the weights into a given neuron is just a copy of the neuron's inputs, scaled by the neuron's delta (in fact, the product of the two, as in Equation 7 below). In vector form, this is the outer product of the input vector (the output of the previous layer) and the sensitivity vector:

$\frac{\partial E}{\partial W^{\ell}} = x^{\ell-1} \left(\delta^{\ell}\right)^{T}$    (Equation 7)

$\Delta W^{\ell} = -\eta \frac{\partial E}{\partial W^{\ell}}$    (Equation 8)
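
To make the quoted derivation concrete, here is a minimal NumPy sketch of one training step for a two-layer sigmoid network, written directly from Equations 4 to 8 (the layer sizes, learning rate, and variable names are my own illustrative choices, not taken from the paper):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def bp_step(x0, t, W1, b1, W2, b2, eta=0.1):
    """One forward/backward pass for a 2-layer fully connected net (Eqs. 4-8)."""
    # Forward pass: u^l = W^l x^(l-1) + b^l,  x^l = f(u^l)
    u1 = W1 @ x0 + b1; x1 = sigmoid(u1)
    u2 = W2 @ x1 + b2; y = sigmoid(u2)

    # Eq. 6: output sensitivity delta^L = f'(u^L) o (y - t); for the sigmoid, f' = f(1 - f)
    d2 = y * (1.0 - y) * (y - t)
    # Eq. 5: delta^l = (W^(l+1))^T delta^(l+1) o f'(u^l)
    d1 = (W2.T @ d2) * x1 * (1.0 - x1)

    # Eq. 7: dE/dW^l is the outer product of the layer input and the sensitivities
    # (written here as delta x^T so it matches W's (out, in) shape).
    # Eq. 4: dE/db^l = delta^l.  Eq. 8: Delta W = -eta * dE/dW.
    W2 -= eta * np.outer(d2, x1); b2 -= eta * d2
    W1 -= eta * np.outer(d1, x0); b1 -= eta * d1
    return y
```

For example, `bp_step(np.random.randn(3), np.array([0.0, 1.0]), np.random.randn(4, 3), np.zeros(4), np.random.randn(2, 4), np.zeros(2))` performs one update in place.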

III. CNN

The advantage of sub-sampling is that it reduces computation while gradually building up invariance to larger-scale spatial and configural changes. The latter point reads awkwardly; I take it to mean something like robustness to translation and scaling.
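
For concreteness, a minimal sketch of 2x2 mean pooling (my own example; if I recall the paper correctly, its sub-sampling layer also applies a trainable coefficient and a bias, which are omitted here):

```python
import numpy as np

def mean_pool(x, k=2):
    """Down-sample a 2-D feature map by averaging non-overlapping k x k blocks."""
    h, w = x.shape
    h, w = h - h % k, w - w % k              # drop rows/cols that do not fill a block
    blocks = x[:h, :w].reshape(h // k, k, w // k, k)
    return blocks.mean(axis=(1, 3))          # each output dimension shrinks by a factor of k
```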

1. Calculate gradients

Here the paper is rather unhelpful: it just lays out the formulas with little explanation. The backpropagation through the convolutional and pooling layers is nevertheless quite important, so I recommend the blog post http://www.cnblogs.com/tornadomeet/p/3468450.html; its treatment of the CNN BP algorithm is very good, and once you have worked through its four questions you will have a solid understanding of CNN backpropagation.

The harder part of that post to understand is this diagram:

That is, the convolution kernel is rotated 180 degrees and then slid from left to right and top to bottom, computing each value. This differs from the convolution in the feedforward pass, which is actually a correlation operation: it is a convolution in form only, whereas this is orthodox convolution.
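
That point can be checked in a few lines of SciPy (a sketch; the array names are mine): the feedforward "convolution" is really a cross-correlation, and it coincides with true convolution once the kernel is rotated 180 degrees:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))   # input feature map
k = rng.standard_normal((3, 3))   # convolution kernel

# What a CNN's feedforward pass computes is a cross-correlation ...
feedforward = correlate2d(x, k, mode="valid")
# ... which equals orthodox convolution with the kernel rotated 180 degrees.
orthodox = convolve2d(x, np.rot90(k, 2), mode="valid")

assert np.allclose(feedforward, orthodox)
```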

Four problems need to be worked through here. First, the error sensitivities of the output layer; for these, just follow the derivation. Second, the error sensitivities of a convolutional layer whose next layer is a pooling layer: because the output is smaller than the input, the gradient cannot be passed back as in the traditional BP algorithm, so how to obtain the convolutional layer's sensitivities takes some thought. The third problem is the pooling layer whose next layer is a convolutional layer: to get the pooling layer's sensitivities we rely on the convolution kernel and the convolutional layer's sensitivities, and again the size mismatch has to be handled. The last problem is the convolutional layer's own weights: once the output sensitivities are known, how do we get the gradient of W? A correlation operation is enough. Loosely speaking, the weight between unit i in layer l and unit j in layer l+1 gets a gradient equal to the sensitivity of unit j in layer l+1 multiplied by the input of unit i in layer l, and since convolution accumulates over many positions, the BP step likewise reduces to a correlation.
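
Here is a minimal sketch of problems 2 to 4 under simplifying assumptions (a single feature map, a stride-1 'valid' cross-correlation in the forward pass, mean pooling, and the activation-derivative factors left out; the function names are my own):

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

def conv_delta_from_pool(delta_pool, k=2):
    """Problem 2: convolutional layer followed by a k x k mean-pooling layer.
    Each pooled delta is spread evenly back over its k x k block (upsampling)."""
    return np.kron(delta_pool, np.ones((k, k))) / (k * k)

def pool_delta_from_conv(delta_conv, kernel):
    """Problem 3: pooling layer followed by a convolutional layer.
    A 'full' convolution of the conv-layer deltas with the kernel; sliding the
    180-degree-rotated kernel over the zero-padded delta map, as in the linked
    diagram, computes exactly this."""
    return convolve2d(delta_conv, kernel, mode="full")

def kernel_gradient(layer_input, delta_conv):
    """Problem 4: gradient of the kernel itself.
    A 'valid' correlation of the layer's input with the conv-layer deltas, i.e.
    each weight accumulates delta times input over all positions."""
    return correlate2d(layer_input, delta_conv, mode="valid")
```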

IV. Combining feature maps

The idea at the end of the paper of learning combinations of several feature maps is also easy to understand. My question is: since feature maps are already produced as multi-channel outputs anyway, why do they need to be combined? Many well-known later networks do not seem to use this trick either; was it just an experiment?
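
For reference, my reading of that combination idea (a hedged sketch, not the paper's code): each output map is a weighted combination of convolutions of the input maps, with the weights pushed through a softmax so they stay positive and sum to one.

```python
import numpy as np
from scipy.signal import correlate2d

def combine_feature_maps(input_maps, kernels, c):
    """Form one output map as a learned combination of convolved input maps.
    input_maps: list of 2-D arrays; kernels: one kernel per input map;
    c: unconstrained weights, softmaxed into positive coefficients summing to one."""
    alpha = np.exp(c - c.max())
    alpha = alpha / alpha.sum()   # softmax over the combination weights
    convolved = [correlate2d(x, k, mode="valid") for x, k in zip(input_maps, kernels)]
    return sum(a * y for a, y in zip(alpha, convolved))
```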
