Original article. Please credit the source when reposting: http://blog.csdn.net/tostq

In the previous section we introduced the forward propagation process of a convolutional neural network. This section focuses on the back-propagation process, which is where the learning and training of the network actually happen. Error back-propagation is the foundation of neural network learning. There is plenty of related material online, but detailed derivations of the back-propagation formulas for convolutional networks are relatively scarce and often unclear, so this article derives the process in detail. The content is fairly involved, but it is worth working through.

The first thing to understand is that back-propagation of the error is really the process by which gradient descent adjusts the weights to minimize the error energy. Our goal is therefore the derivative of the error energy with respect to each parameter (weight). The gradient descent weight-update formula is as follows:
Here w denotes a weight, E the error energy, n the index of the update iteration, η the learning rate, y the output of the previous layer (the input of the current layer), and δ the local gradient. The derivative of the error energy with respect to a weight is the product of the local gradient δ and the corresponding input of the current layer. Since δ does not depend on that input but reflects only the fixed structure of the current layer and the error arriving from the layers above, it is the natural quantity to pass back to the previous layer. It is defined as:
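The formula images of the original post are not reproduced here; restated in LaTeX under the standard convention (the notation is assumed, not copied from the images), the update and the local gradient are:

w(n+1) = w(n) - \eta \frac{\partial E(n)}{\partial w(n)}

\delta_k(n) = -\frac{\partial E(n)}{\partial v_k(n)}, \qquad \frac{\partial E(n)}{\partial w(n)} = -\delta_k(n)\, y(n), \qquad w(n+1) = w(n) + \eta\, \delta_k(n)\, y(n)

where w is a weight feeding neuron k, v_k is that neuron's weighted input (induced local field), and y is the input carried by that weight.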
Next we analyze the back-propagation process of the entire network layer by layer. For the convolutional neural network in this article there are four main cases:
I. The output layer (a single-layer neural network)
(1) The error energy is defined as the squared error between the actual output and the desired output. Here d is the desired output, y is the actual output, and i indexes the output units; the network in this article has 10 outputs, so n = 10.
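Restated in LaTeX (the original formula image is missing; the 1/2 factor is the usual convention and is assumed here):

E(n) = \frac{1}{2} \sum_{i=1}^{10} e_i^2(n) = \frac{1}{2} \sum_{i=1}^{10} \bigl(d_i(n) - y_i(n)\bigr)^2, \qquad e_i = d_i - y_i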
(2) The derivative of the error energy with respect to the weights. For this layer it is relatively simple:
Since this article uses the sigmoid as the activation function, its derivative can be calculated as:
Its local gradient δ is expressed as:
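Restating the missing formulas in LaTeX (assuming the standard sigmoid identity and the δ convention above):

\varphi(v) = \frac{1}{1 + e^{-v}}, \qquad \varphi'(v) = \varphi(v)\bigl(1 - \varphi(v)\bigr) = y(1 - y)

\delta_i^{O5} = e_i\, \varphi'(v_i) = (d_i - y_i)\, y_i (1 - y_i), \qquad \Delta w_{ij} = \eta\, \delta_i^{O5}\, y_j^{S4}

where w_{ij} is the weight from the j-th (flattened) S4 output pixel to output neuron i.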
II. The sampling layer S4 behind the output layer
Back-propagation through the sampling layer that feeds the output layer is similar to back-propagation through the hidden neurons of a multilayer perceptron. Since this layer has no weights, there is nothing to update here, but we still need to pass the error energy back to the preceding layer, so we must compute its local gradient δ, defined as follows. Here j indexes the output pixels of the layer; the S4 layer has 12*4*4 = 192 output pixels in total, so j = 1~192.
In addition, the local gradient δ of the output layer O5 has already been calculated:
Since the sampling layer has no activation function, the derivative of φ is 1, and we finally obtain:
Using the formula above, we can compute the local gradient δ passed from the output layer O5 back to the S4 layer. As you can see, the local gradient δ at output pixel j of the sampling layer is simply the sum of the local gradients of the output-layer neurons connected to it, each multiplied by the corresponding connection weight.
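In LaTeX form (this is exactly the accumulation over cnn->O5->wData in the code below):

\delta_j^{S4} = \sum_{i=1}^{10} \delta_i^{O5}\, w_{ij}, \qquad j = 1, \ldots, 192

where w_{ij} is again the weight connecting S4 pixel j to output neuron i.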
III. The convolution layers C1 and C3 in front of the sampling layers
For convenience, the outputs of the S4 and O5 layers were flattened into one dimension above, so their pixels were labelled i and j. From the C3 layer backward we switch to two-dimensional pixel coordinates m(x, y), where m(x, y) denotes the pixel at position (x, y) of the m-th output map. The local gradient δ is defined as:
The error energy passed back to a pixel equals the sum of the error energies of all the pixels associated with it; here i runs over the S4 pixels whose sampling neighborhood θ contains m(x, y).
Because average pooling is used in this article, each S4 output is the average of all the pixels in its sampling neighborhood, where S denotes the total number of pixels in the neighborhood θ. This article uses 2*2 sampling blocks, so S = 4.
(1) So the local gradient δ passed from S4 back to the C3 layer is:
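Written out in LaTeX (x, y index C3 pixels; the floor terms reflect the 2*2 average-pooling blocks, and this matches the upsample-then-scale step in the code below):

\delta_m^{C3}(x, y) = \frac{1}{S}\, \delta_m^{S4}\!\left(\left\lfloor \tfrac{x}{2} \right\rfloor, \left\lfloor \tfrac{y}{2} \right\rfloor\right) \varphi'\bigl(v_m^{C3}(x, y)\bigr), \qquad S = 4

with \varphi'(v) = y_m^{C3}(x, y)\bigl(1 - y_m^{C3}(x, y)\bigr) for the sigmoid.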
Next we compute the weight-update values of the C3 layer from the local gradient δ.
(2) The weight updates of the C3 layer
The C3 layer has 6*12 kernels of size 5*5. We let n = 1~6 and m = 1~12 index the kernels, and (s, t) denote the position of a parameter within a kernel:
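A LaTeX restatement of the update. This assumes the forward pass computes a valid cross-correlation, y_m^{C3}(x, y) = \varphi\bigl(\sum_n \sum_{s,t} y_n^{S2}(x+s, y+t)\, w_{nm}(s, t) + b_m\bigr); if the forward pass flips the kernel (true convolution), the (s, t) indices are mirrored accordingly. Each shared weight accumulates contributions from every position at which it is used:

\Delta w_{nm}(s, t) = \eta \sum_{x, y} \delta_m^{C3}(x, y)\, y_n^{S2}(x + s,\, y + t), \qquad \Delta b_m = \eta \sum_{x, y} \delta_m^{C3}(x, y)

for n = 1~6, m = 1~12 and s, t running over the 5*5 kernel positions.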
(3) The weight-update formula and local gradient δ of the C1 layer
Similarly, we can obtain the weight-update formula of the C1 layer; here n = 1, m = 1~6, and y refers to the input image.
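The same accumulation applies to both the C3 and the C1 kernels (for C1 the input map is the input image). As a minimal, self-contained sketch of this update step, the following hypothetical helper accumulates the valid cross-correlation of an input map with a delta map into a kernel; the function name, signature and the learning-rate parameter eta are illustrative, not part of the article's code.

/* Hypothetical sketch: gradient step on one kH x kW convolution kernel that
 * links an input map in[inH][inW] to a delta map d of size
 * (inH-kH+1) x (inW-kW+1). The bias would be updated analogously with the
 * plain sum of the delta map.
 * Note: this follows the sign convention delta = -dE/dv used in the formulas
 * above; if d stores e*phi' with e = y - target (as in cnnbp() below), use
 * -= instead of +=. */
void update_conv_kernel(float** kernel, int kH, int kW,
                        float** in, int inH, int inW,
                        float** d, float eta)
{
    int s, t, x, y;
    int dH = inH - kH + 1;   /* height of the delta (output) map */
    int dW = inW - kW + 1;   /* width of the delta (output) map  */
    for (s = 0; s < kH; s++)
        for (t = 0; t < kW; t++) {
            float grad = 0.0f;
            for (x = 0; x < dH; x++)
                for (y = 0; y < dW; y++)
                    grad += d[x][y] * in[x + s][y + t];  /* sum delta * input over all positions */
            kernel[s][t] += eta * grad;                   /* shared-weight update */
        }
}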
IV. The sampling layer S2 in front of the convolution layer C3
Here n is the index of the output map of the S2 layer (n = 1~6), and m is the index of the output map of the C3 layer (m = 1~12).
Therefore, the local gradient δ of the n-th output map is:
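In LaTeX form, under the same cross-correlation assumption as above; δ terms that fall outside the C3 delta map are taken as zero, which is what the "full" boundary mode in the code below implements, and there is no \varphi' factor because the sampling layer has no activation function:

\delta_n^{S2}(x, y) = \sum_{m=1}^{12} \sum_{s} \sum_{t} w_{nm}(s, t)\, \delta_m^{C3}(x - s,\, y - t)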
The code for the error back-propagation process is shown below.
void cnnbp(CNN* cnn, float* outputData) // back-propagation of the network error
{
    int i, j, c, r;

    // Save the output error into the network
    for (i = 0; i < cnn->O5->outputNum; i++)
        cnn->e[i] = cnn->O5->y[i] - outputData[i];

    /* Backward pass, from the output layer toward the input */

    // Output layer O5
    for (i = 0; i < cnn->O5->outputNum; i++)
        cnn->O5->d[i] = cnn->e[i] * sigma_derivation(cnn->O5->y[i]);

    // S4 layer: pass the error back to S4 (this layer has no activation function)
    nSize outSize = {cnn->S4->inputWidth / cnn->S4->mapSize, cnn->S4->inputHeight / cnn->S4->mapSize};
    for (i = 0; i < cnn->S4->outChannels; i++)
        for (r = 0; r < outSize.r; r++)
            for (c = 0; c < outSize.c; c++)
                for (j = 0; j < cnn->O5->outputNum; j++) {
                    int wInt = i * outSize.c * outSize.r + r * outSize.c + c;
                    cnn->S4->d[i][r][c] = cnn->S4->d[i][r][c] + cnn->O5->d[j] * cnn->O5->wData[j][wInt];
                }

    // C3 layer
    // The backward error passed from the S4 layer is just an expansion (upsampling) of the S4 gradient
    int mapdata = cnn->S4->mapSize;
    nSize S4dSize = {cnn->S4->inputWidth / cnn->S4->mapSize, cnn->S4->inputHeight / cnn->S4->mapSize};
    // The pooling is average pooling, so the error gradient is divided evenly among the pooled pixels
    for (i = 0; i < cnn->C3->outChannels; i++) {
        float** C3e = UpSample(cnn->S4->d[i], S4dSize, cnn->S4->mapSize, cnn->S4->mapSize);
        for (r = 0; r < cnn->S4->inputHeight; r++)
            for (c = 0; c < cnn->S4->inputWidth; c++)
                cnn->C3->d[i][r][c] = C3e[r][c] * sigma_derivation(cnn->C3->y[i][r][c]) / (float)(cnn->S4->mapSize * cnn->S4->mapSize);
        for (r = 0; r < cnn->S4->inputHeight; r++)
            free(C3e[r]);
        free(C3e);
    }

    // S2 layer: S2 has no activation function, only the convolution layers have one
    // Error gradient passed from the convolution layer back to the sampling layer; the C3 layer has 6*12 convolution kernels
    outSize.c = cnn->C3->inputWidth;
    outSize.r = cnn->C3->inputHeight;
    nSize inSize = {cnn->S4->inputWidth, cnn->S4->inputHeight};
    nSize mapSize = {cnn->C3->mapSize, cnn->C3->mapSize};
    for (i = 0; i < cnn->S2->outChannels; i++) {
        for (j = 0; j < cnn->C3->outChannels; j++) {
            float** corr = correlation(cnn->C3->mapData[i][j], mapSize, cnn->C3->d[j], inSize, full);
            addmat(cnn->S2->d[i], cnn->S2->d[i], outSize, corr, outSize);
            for (r = 0; r < outSize.r; r++)
                free(corr[r]);
            free(corr);
        }
        /*
        for (r = 0; r < cnn->C3->inputHeight; r++)
            for (c = 0; c < cnn->C3->inputWidth; c++)
                ... // this was originally used for the sampling-layer activation
        */
    }

    // C1 layer (convolution layer)
    mapdata = cnn->S2->mapSize;
    nSize S2dSize = {cnn->S2->inputWidth / cnn->S2->mapSize, cnn->S2->inputHeight / cnn->S2->mapSize};
    // The pooling is average pooling, so the error gradient is divided evenly among the pooled pixels
    for (i = 0; i < cnn->C1->outChannels; i++) {
        float** C1e = UpSample(cnn->S2->d[i], S2dSize, cnn->S2->mapSize, cnn->S2->mapSize);
        for (r = 0; r < cnn->S2->inputHeight; r++)
            for (c = 0; c < cnn->S2->inputWidth; c++)
                cnn->C1->d[i][r][c] = C1e[r][c] * sigma_derivation(cnn->C1->y[i][r][c]) / (float)(cnn->S2->mapSize * cnn->S2->mapSize);
        for (r = 0; r < cnn->S2->inputHeight; r++)
            free(C1e[r]);
        free(C1e);
    }
}
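Note that cnnbp() only fills in the local-gradient arrays d; applying them to the weights happens in a separate gradient-application routine of the full project, which is not shown here. As a minimal, hypothetical sketch of that step for the fully connected output layer O5 alone (the function name, the eta parameter and the flattened-S4 argument are illustrative; only O5->outputNum, O5->wData and O5->d are taken from the code above, and the bias update is omitted):

/* Hypothetical sketch: apply the O5 local gradients computed by cnnbp().
 * wData[i][j] connects the j-th flattened S4 output pixel to output neuron i.
 * Because cnnbp() stores d = (y - target) * phi', i.e. the positive gradient,
 * the update subtracts. */
void applyO5Grads(CNN* cnn, float eta, float* s4Flat, int s4Len)
{
    int i, j;
    for (i = 0; i < cnn->O5->outputNum; i++)
        for (j = 0; j < s4Len; j++)
            cnn->O5->wData[i][j] -= eta * cnn->O5->d[i] * s4Flat[j];  /* w -= eta * delta * input */
}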
Writing a Convolutional Neural Network (CNN) in C, Part 3: The Error Back-Propagation Process of CNN