CNN Formula derivation

1 Preface

Before reading this post, please make sure you have read my previous two posts, "Deep Learning Note 1 (Convolutional Neural Networks)" and "BP Algorithm and Formula Derivation", and the paper "Notes on Convolutional Neural Networks" [1]. This post works through the derivation of the formulas in the first part of that paper. <One step below rests on a hypothesis that may be wrong; if you understand it better, please leave a comment.>

2 CNN Formula derivation

The process of solving for the parameters of a convolutional neural network is similar to the previous note, "BP Algorithm and Formula Derivation", although the form changes. The paper [1] gives the parameter-update formulas directly, including the residuals of the convolution layer and the subsampling layer and the corresponding derivatives with respect to the weight and bias parameters.

Note: here the convolution kernel parameters are attached to the same layer as the residuals, which differs slightly from the previous note, but this has no effect on the result.

2.1 Convolution Layer

2.1.1 Convolution calculation

Assume layer l is a convolution layer and layer l+1 is a subsampling layer. The j-th feature map of layer l is computed as follows:
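In the notation of this note, the formula from [1] reads (M_j is the set of layer l-1 maps connected to map j):

\[ x_j^l = f\Big( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l \Big) \]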

The * above denotes convolution: the kernel k is convolved with each of the associated feature maps of layer l-1, the results are summed, a bias parameter is added, and the sigmoid is applied to obtain the final activation value.

Example: suppose layer l-1 has only two feature maps, each of size 4*4 pixels.

One convolution kernel k (two two-dimensional kernels, k11 and k12), of size 2*2.

A feature map of layer l, of size 3*3 pixels, is then computed as follows:

Note: in MATLAB, the convn function implements convolution, for example:

Image = convn(im, kernel, 'valid'); The computation first rotates kernel by 180 degrees and then convolves it with im.

So the 2*2 convolution kernels given in this article's example have already been rotated 180 degrees.
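Putting the example together, a minimal MATLAB sketch (the map and kernel values below are placeholders, since they are not fixed by the text):

x1 = rand(4); x2 = rand(4);        % the two 4*4 feature maps of layer l-1
k11 = rand(2); k12 = rand(2);      % the two 2*2 kernels (already rotated 180 degrees)
b = 0.1;                           % bias (placeholder value)
z = convn(x1, k11, 'valid') + convn(x2, k12, 'valid') + b;   % 3*3 result
xj = 1 ./ (1 + exp(-z));           % sigmoid gives the 3*3 feature map of layer l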

2.1.2 Residual error calculation

In the BP algorithm, the residual of a node equals the weighted sum of the residuals of all layer l+1 nodes connected to it, multiplied by the derivative of f at that node's input z. The layer after a convolution layer is a subsampling layer, which uses one-to-one, non-overlapping sampling, so the residual calculation is simpler.

The residual formula for the j-th feature map of layer l is as follows:
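In the notation of this note (the paper [1] writes beta for the subsampling weight, called w here, and the circle denotes element-wise multiplication):

\[ \delta_j^l = w_j^{l+1} \left( f'(z_j^l) \circ \mathrm{up}(\delta_j^{l+1}) \right) \]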

Here layer l is the convolution layer and layer l+1 the subsampling layer; their feature maps correspond one to one. up(x) expands a layer l+1 map to the same size as the layer l map. A simple example: if a feature-map residual of layer l+1 is 2*2 and the sample size is 2*2, then up(.) expands it to 4*4 by copying each element into a 2*2 block.
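In MATLAB, up(.) can be written with the Kronecker product, as suggested in [1]; a minimal sketch with placeholder values:

delta_sub = rand(2);                 % residual of the corresponding l+1 map (placeholder)
up = @(x, n) kron(x, ones(n));       % each element becomes an n*n block
a = rand(4);                         % activation x_j^l of the layer-l map (placeholder)
fp = a .* (1 - a);                   % sigmoid derivative f'(z) = f(z)(1 - f(z))
w = 0.5;                             % subsampling weight (beta in [1]; placeholder)
delta_conv = w * (fp .* up(delta_sub, 2));   % the 4*4 residual of the conv-layer map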

Intuitive understanding: because the sampling is one-to-one, after expansion each node of layer l corresponds to exactly one node of layer l+1. By the residual formula of the BP algorithm, the residual of a layer-l node equals the weight w multiplied by the residual of the corresponding layer l+1 node, multiplied by the derivative f'(z); the formula above is just the vectorized form of this.

2.1.3 Gradient Calculation

(1) The derivative with respect to the bias parameter b is given in the paper; the formula is:
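From [1]:

\[ \frac{\partial E}{\partial b_j^l} = \sum_{u,v} \big( \delta_j^l \big)_{uv} \]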

The derivation is given here. (One step uses a hypothesis: since z_j^l and b_j^l are matrices, and there is no derivative of a matrix with respect to a matrix, a summation formula is used instead. I cannot say for certain this is correct; if you know, please leave a comment. N_j^l denotes the number of points in the j-th feature map of layer l.)
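A sketch of the chain-rule step under that hypothesis: every entry of z_j^l contains b_j^l with coefficient 1, so

\[ \frac{\partial E}{\partial b_j^l} = \sum_{u,v} \frac{\partial E}{\partial (z_j^l)_{uv}} \frac{\partial (z_j^l)_{uv}}{\partial b_j^l} = \sum_{u,v} \big( \delta_j^l \big)_{uv} \]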

(2) The derivative with respect to the kernel parameter k is given in the paper, where k_{ij}^l denotes the kernel between the j-th feature map of layer l and the i-th feature map of layer l-1, and is a matrix:
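From [1], where (p_i^{l-1})_{uv} is the patch of x_i^{l-1} that was multiplied element-wise by k_{ij}^l to produce element (u, v) of the convolution output:

\[ \frac{\partial E}{\partial k_{ij}^l} = \sum_{u,v} \big( \delta_j^l \big)_{uv} \big( p_i^{l-1} \big)_{uv} \]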

The derivation is as follows:
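A sketch of the chain-rule step (treating the forward pass as correlation with the already-rotated kernel, as in the example above): element (r, s) of the kernel touches output element (u, v) through the input pixel at (u+r-1, v+s-1), so

\[ \frac{\partial E}{\partial (k_{ij}^l)_{rs}} = \sum_{u,v} \big( \delta_j^l \big)_{uv} \big( x_i^{l-1} \big)_{u+r-1,\, v+s-1} \]

which is exactly a 'valid'-mode correlation of x_i^{l-1} with delta_j^l.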

An example: say the convolution layer's map is of size 3*3 and the convolution kernel of size 2*2, so the feature map of the previous layer is of size 4*4.

The MATLAB code is given in the paper:
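The computation from [1], with rot180 written as rot90(., 2) in MATLAB:

dEdk = rot90(conv2(x, rot90(delta, 2), 'valid'), 2);   % x: the l-1 map, delta: residual of the l map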

Because MATLAB's convolution first rotates its second argument by 180 degrees, the code rotates the residual map by 180 degrees before the convolution, and the result must be rotated 180 degrees once more to obtain the final k.

2.2 Subsampling Layer

2.2.1 Subsampling calculation

Let layer l be the subsampling layer and layer l-1 a convolution layer. Because the sampling is one-to-one, assuming a sample size of 2*2, the formula is:
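From [1] (the paper writes beta for the multiplicative weight, called w here):

\[ x_j^l = f\Big( w_j^l \, \mathrm{down}(x_j^{l-1}) + b_j^l \Big) \]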

Here down(x) sums the pixel values inside each 2*2 block of x. The computation sums every 2*2 block of the previous convolution layer's map, multiplies by the weight w, adds a bias, and then applies the sigmoid function.
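A minimal MATLAB sketch of down(.) and this forward pass (all values are placeholders):

x = rand(4);                        % 4*4 map of the previous convolution layer (placeholder)
s = conv2(x, ones(2), 'valid');     % sum over every 2*2 window
d = s(1:2:end, 1:2:end);            % keep non-overlapping windows: down(x), now 2*2
w = 0.5; b = 0.1;                   % weight and bias (placeholder values)
xj = 1 ./ (1 + exp(-(w * d + b)));  % the 2*2 subsampling-layer map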

2.2.2 Residual error calculation

The residual formula is given in the paper:
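From [1], in the notation of this note:

\[ \delta_j^l = f'(z_j^l) \circ \mathrm{conv2}\big( \delta_j^{l+1}, \mathrm{rot180}(k_j^{l+1}), \text{'full'} \big) \]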

Here layer l is the subsampling layer and layer l+1 is the convolution layer.

Intuitive understanding: suppose the feature map on the left belongs to the subsampling layer and the feature map on the right to the convolution layer.

Take node 7 of the current layer, for example; its connections to the next layer are {k22 -> 2; k21 -> 3; k12 -> 5; k11 -> 6} (easy to verify with a quick pen-and-paper drawing). So, by the BP residual rule (the weighted sum of the weights and residuals of all connected layer l+1 nodes, multiplied by the derivative of f at z), this is equivalent to convolving the residual map of the next convolution layer directly with the kernel, which gives the formula its form. Since MATLAB's convolution rotates k by 180 degrees, the kernel must be rotated 180 degrees before computing the convolution. (I think the formula here is slightly off: the subscript should be i rather than j, and the results over all layer l+1 convolution maps connected to this subsampling map should be summed; the source code does in fact do this.)
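A minimal MATLAB sketch of this residual computation for one connected map (all values are placeholders):

delta_next = rand(3);               % residual of the l+1 convolution-layer map (placeholder)
k = rand(2);                        % the 2*2 kernel between the two maps (placeholder)
a = rand(4);                        % activation of the subsampling-layer map (placeholder)
fp = a .* (1 - a);                  % sigmoid derivative
delta_sub = fp .* conv2(delta_next, rot90(k, 2), 'full');   % the 4*4 residual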

2.2.3 Gradient Calculation

(1) The derivative with respect to the bias b: the derivation is the same as for the convolution layer.

(2) The derivative with respect to the weight w; the formula given in the paper is as follows:
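From [1], writing d_j^l = down(x_j^{l-1}) for the block-summed map:

\[ \frac{\partial E}{\partial w_j^l} = \sum_{u,v} \big( \delta_j^l \circ d_j^l \big)_{uv} \]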

The derivation is as follows:
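A sketch of the chain-rule step: since (z_j^l)_{uv} = w_j^l (d_j^l)_{uv} + b_j^l,

\[ \frac{\partial E}{\partial w_j^l} = \sum_{u,v} \frac{\partial E}{\partial (z_j^l)_{uv}} \frac{\partial (z_j^l)_{uv}}{\partial w_j^l} = \sum_{u,v} \big( \delta_j^l \big)_{uv} \big( d_j^l \big)_{uv} \]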

An example: say the subsampling layer's feature map is of size 2*2, so the feature map of the previous convolution layer is of size 4*4. The computation sums every 2*2 block of the previous convolution layer's map, multiplies by the weight w, and adds a bias, then applies the sigmoid function. (The entries 1 through 16 below are labels, not values.)

The calculation process is:

dE/dw_j^l = d11*(1+2+3+4) + d12*(5+6+7+8) + d21*(9+10+11+12) + d22*(13+14+15+16), where d11..d22 are the entries of the 2*2 residual map delta_j^l and each parenthesized sum is the corresponding 2*2 block sum of the previous layer's map. This matches the formula.

Note: the derivation above uses a hypothesis that may not be strictly correct, yet with it all the conclusions hold; this may simply be beyond my mathematics. If you know the correct treatment, please leave a comment. Thank you!

References:

[1] Original paper: Notes on Convolutional Neural Networks. http://cogprints.org/5869/1/cnn_tutorial.pdf

[2] A translation of the paper: http://blog.csdn.net/zouxy09/article/details/9993371

[3] How to apply the formulas in the paper correctly (with small examples): http://www.cnblogs.com/tornadomeet/p/3468450.html
