Understanding the Error of Convolutional Neural Networks (I)


Part 1: Weight updates in the fully connected network

Convolutional neural networks are trained with gradient-based supervised learning methods. In practice, a stochastic version of gradient descent is generally used: the weights are updated once for each training sample, and the error is measured by the squared-error (sum-of-squares) cost function.
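As a minimal sketch of this per-sample update scheme (all names and the linear unit are invented for illustration, not from the original post):

    import numpy as np

    def sgd_step(w, x, t, eta=0.1):
        """One stochastic-gradient step for a single linear unit with squared error.

        w: weight vector, x: one training sample, t: its target label,
        eta: learning rate. Returns the updated weights.
        """
        y = w @ x                 # forward pass (no activation, for brevity)
        grad = (y - t) * x        # dE/dw for E = 0.5 * (t - y)^2
        return w - eta * grad     # update once per training sample

    # usage: iterate over the samples, updating after each one
    w = np.zeros(3)
    for x, t in [(np.array([1., 0., 2.]), 1.0), (np.array([0., 1., 1.]), 0.0)]:
        w = sgd_step(w, x, t)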
Note: This article is written mainly according to the reference material listed at the end, with some explanations of my own added. Its content will be continuously updated and enriched until readers can fully follow the computation process of a convolutional neural network.

1.1 The error of a sample in forward propagation and the output of each layer

The output of a fully connected layer l (l denotes the current layer) is:

$$x^l = f(u^l), \qquad u^l = W^l x^{l-1} + b^l$$

where $f$ is the activation function, $W^l$ the weight matrix, and $b^l$ the bias of the layer.
The error on the whole training set is simply the sum of the errors of the individual training samples, so we first consider backpropagation (BP) for a single sample. The error of the n-th sample is:

$$E^n = \frac{1}{2}\sum_{k=1}^{c}\left(t_k^n - y_k^n\right)^2 = \frac{1}{2}\left\lVert t^n - y^n \right\rVert_2^2$$

where $t_k^n$ is the k-th dimension of the label of the n-th sample and $y_k^n$ is the k-th output of the network for that sample.
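In numpy terms (array values invented for illustration), this per-sample error is simply:

    import numpy as np

    t = np.array([0., 1., 0.])          # label of the n-th sample (c dimensions)
    y = np.array([0.1, 0.8, 0.05])      # network output for the n-th sample
    E_n = 0.5 * np.sum((t - y) ** 2)    # squared-error cost, 0.5 * ||t - y||^2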

1.2 The weight update of a sample in backward propagation

Concretely, the weight update for a given neuron takes the neuron's input and scales it by the neuron's delta (δ). In vector form, for layer l: the derivative of the error with respect to the layer's weight matrix is the product of the layer's input (equal to the output of the previous layer) and the layer's sensitivity (the δ values of all neurons of the layer collected into a vector). The resulting partial derivative multiplied by the negative learning rate η gives the update of the layer's weights:

$$\Delta W^l = -\eta\,\frac{\partial E}{\partial W^l}$$
A) For the sensitivities (the δ terms) of layer l we have:

$$\delta^l = \left(W^{l+1}\right)^T \delta^{l+1} \circ f'\!\left(u^l\right)$$

and for the output layer L:

$$\delta^L = f'\!\left(u^L\right) \circ \left(y^n - t^n\right)$$
B) For the weights of layer l, the partial derivative is:

$$\frac{\partial E}{\partial W^l} = x^{l-1}\left(\delta^l\right)^T$$

Part 2: The error-sensitivity term of a convolutional layer when the next layer is a pooling layer

In a convolutional layer, the feature maps of the previous layer are convolved with learnable kernels and passed through an activation function to produce the output feature maps. Each output map may combine the convolutions of several input maps. The output of the convolutional layer is:

$$x_j^l = f\!\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\right)$$

where $M_j$ is the set of input maps feeding output map j. The * above denotes convolution: kernel $k_{ij}^l$ is convolved over all the associated feature maps of layer l−1.
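A minimal numpy/scipy sketch of this forward pass (names are illustrative; a 'valid' convolution and a tanh activation are assumed):

    import numpy as np
    from scipy.signal import convolve2d

    def conv_forward(x_prev, kernels, b, M_j):
        """Compute output map j of a convolutional layer.

        x_prev:  list of feature maps x_i^{l-1} of the previous layer
        kernels: dict mapping input-map index i to kernel k_ij^l
        b:       scalar bias b_j^l
        M_j:     indices of the input maps that feed output map j
        """
        u = sum(convolve2d(x_prev[i], kernels[i], mode='valid') for i in M_j) + b
        return np.tanh(u)   # f: any activation; tanh assumed here

    # usage sketch: two 5x5 input maps, 3x3 kernels -> one 3x3 output map
    x_prev = [np.random.randn(5, 5), np.random.randn(5, 5)]
    kernels = {0: np.random.randn(3, 3), 1: np.random.randn(3, 3)}
    out = conv_forward(x_prev, kernels, b=0.1, M_j=[0, 1])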
Since the layer after the convolutional layer is a subsampling (pooling) layer, we first need to know which neurons of that next layer are connected to node j of the convolutional layer, and then propagate the error back according to the pooling method used in the forward pass. Because the pooling layer samples the convolutional layer without duplication (there is no overlapping area, unlike the sliding-window operation of convolution), each local receptive field of the convolutional layer corresponds to exactly one input of a pooling-layer neuron.

Assume the convolutional layer under analysis is layer l and the next layer is layer l+1 (the pooling layer), which uses one-shot, non-overlapping sampling. Ignoring the weight from this layer to the next for the moment, the error term of map j of layer l is:

$$\delta_j^l = f'\!\left(u_j^l\right) \circ \text{up}\!\left(\delta_j^{l+1}\right)$$

Taking the multiplicative weight $\beta_j^{l+1}$ from layer l to layer l+1 into account, this becomes:

$$\delta_j^l = \beta_j^{l+1}\left(f'\!\left(u_j^l\right) \circ \text{up}\!\left(\delta_j^{l+1}\right)\right)$$
Here up(x) denotes upsampling x: it spreads the error term of the next layer back over the nodes of this layer that contributed to it.
How up() is computed depends on the pooling method used in the forward pass. Each node of the pooling layer pools several nodes of layer l, so the error sensitivity of each pooling node is distributed back over several nodes of the convolutional layer. Two sampling methods are common: max pooling and mean (average) pooling.

A) If mean pooling was used in the forward pass, the upsampled error term of the next layer is divided by the size of the filter used by the next layer. If the next layer's filter size is k×k, then:

$$\delta_j^l = \frac{1}{k \times k}\, f'\!\left(u_j^l\right) \circ \left(\delta_j^{l+1} \otimes \mathbf{1}_{k \times k}\right)$$

where $\otimes$ is the Kronecker product and $\mathbf{1}_{k \times k}$ is a k×k matrix of ones.
The upsampling operation for mean pooling can be implemented with the function kron() in Matlab, because it is exactly the matrix Kronecker product: C = kron(A, B) multiplies the whole matrix B by each element of A and places each resulting block in the corresponding position of C.
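The same Kronecker-product upsampling in numpy, where np.kron plays the role of Matlab's kron (names are illustrative):

    import numpy as np

    def up_mean(delta_next, k):
        """Upsample the next layer's sensitivity map for k x k mean pooling.

        Every error value is spread evenly over the k*k positions it pooled,
        hence the division by k*k.
        """
        return np.kron(delta_next, np.ones((k, k))) / (k * k)

    delta_next = np.array([[4.0, 8.0],
                           [12.0, 16.0]])
    print(up_mean(delta_next, 2))   # each entry expands into a 2x2 block / 4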
B) If max pooling was used in the forward pass, the position of the maximum within each pooling region must be recorded during forward propagation. During backpropagation we check whether the current node held the maximum: if so, the error value of the next layer is assigned to it directly; otherwise it is assigned 0. In other words, only the neuron that produced the maximum of the original pooling block takes part in backpropagation; the contribution of the other neurons to the weight update is set to 0.
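A sketch of the max-pooling case, assuming the argmax positions were saved in the forward pass as a boolean mask (all names are illustrative):

    import numpy as np

    def up_max(delta_next, max_mask, k):
        """Route each next-layer error value to the position that held the
        maximum of its k x k pooling block; all other positions get 0.

        max_mask: boolean map, True exactly at the forward-pass maxima.
        """
        spread = np.kron(delta_next, np.ones((k, k)))   # copy value over the block
        return spread * max_mask                        # keep it only at the maximum

    # usage: one 2x2 sensitivity map, 2x2 pooling over a 4x4 convolution output
    delta_next = np.array([[1.0, 2.0], [3.0, 4.0]])
    max_mask = np.zeros((4, 4), dtype=bool)
    max_mask[0, 1] = max_mask[1, 2] = max_mask[2, 0] = max_mask[3, 3] = True
    print(up_max(delta_next, max_mask, 2))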
With the error term above, we can now compute the partial derivatives of the loss function with respect to the bias and the kernel weights (the so-called gradient calculation):
A) For the bias, the derivative of the loss function is the sum over all entries of the sensitivity map:

$$\frac{\partial E}{\partial b_j} = \sum_{u,v}\left(\delta_j^l\right)_{uv}$$

B) For the kernel weights, the derivative of the loss function is:

$$\frac{\partial E}{\partial k_{ij}^l} = \sum_{u,v}\left(\delta_j^l\right)_{uv}\left(p_i^{l-1}\right)_{uv}$$

where $p_i^{l-1}$ is the patch of the input map $x_i^{l-1}$ that was multiplied element-wise by $k_{ij}^l$ to produce entry (u, v) of the output map.

Part 3: The error-sensitivity term of a pooling layer when the next layer is a convolutional layer

For the subsampling (pooling) layer, the output value is computed as:

$$x_j^l = f\!\left(\beta_j^l\, \text{down}\!\left(x_j^{l-1}\right) + b_j^l\right)$$

where down(·) is the downsampling of feature map j, $\beta_j^l$ a multiplicative bias, and $b_j^l$ an additive bias.
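down() for k×k mean pooling can be sketched in numpy as block averaging (names are illustrative):

    import numpy as np

    def down(x, k):
        """k x k non-overlapping mean pooling: each k x k block is averaged
        into a single output value, so the map shrinks by a factor of k."""
        h, w = x.shape
        return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

    x = np.arange(16.0).reshape(4, 4)
    print(down(x, 2))   # 4x4 -> 2x2, each entry the mean of a 2x2 block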

As with the convolutional layer above, we need to compute the error term first; from it, the other weights and biases can then be calculated.

Since the layer after the pooling layer is a convolutional layer, each node of the pooling layer may take part in several convolutions. If the current pooling layer is layer l and we want the error term of node j, we first have to find out which neurons of layer l+1 used node j as an input. This requires saving the connection pattern of the convolution from layer l to layer l+1 in the forward pass, because it is needed to route the error backwards. To begin, assume that m neurons of layer l+1 are connected to node j.

The error term of map j of layer l is then obtained by a full convolution of the next layer's sensitivity map with the 180°-rotated kernel (this sums, for each node j, the contributions of all m neurons of layer l+1 that used it):

$$\delta_j^l = f'\!\left(u_j^l\right) \circ \text{conv2}\!\left(\delta_j^{l+1},\ \text{rot180}\!\left(k_j^{l+1}\right),\ \text{'full'}\right)$$
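A sketch of this step in numpy/scipy (a tanh activation is assumed and all names are illustrative; if map j feeds several output maps, sum one such term per map):

    import numpy as np
    from scipy.signal import convolve2d

    def pool_delta(u_j, delta_next, kernel):
        """Error term of pooling-layer map j when the next layer is convolutional.

        u_j:        pre-activation u_j^l of the pooling layer
        delta_next: sensitivity map delta^{l+1} of the next (convolutional) layer
        kernel:     kernel k_j^{l+1} connecting map j to that sensitivity map
        """
        f_prime = 1.0 - np.tanh(u_j) ** 2                     # f'(u^l) for tanh
        # 'full' convolution with the 180-degree-rotated kernel routes each
        # next-layer error back to every pooling node that fed it
        back = convolve2d(delta_next, np.rot90(kernel, 2), mode='full')
        return f_prime * back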

Now we can easily obtain the derivatives with respect to the additive bias b and the multiplicative bias β:

$$\frac{\partial E}{\partial b_j} = \sum_{u,v}\left(\delta_j^l\right)_{uv}$$

$$\frac{\partial E}{\partial \beta_j} = \sum_{u,v}\left(\delta_j^l \circ d_j^l\right)_{uv}, \qquad d_j^l = \text{down}\!\left(x_j^{l-1}\right)$$
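And the two bias gradients in numpy, where d_j = down(x_j^{l-1}) was saved during the forward pass (names are illustrative):

    import numpy as np

    def pool_bias_grads(delta_j, d_j):
        """Gradients for pooling-layer map j.

        delta_j: sensitivity map delta_j^l
        d_j:     down(x_j^{l-1}), the downsampled input saved in the forward pass
        """
        db = delta_j.sum()              # additive bias: sum of all sensitivities
        dbeta = (delta_j * d_j).sum()   # multiplicative bias: weighted by d_j
        return db, dbeta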

The most important step is solving for the error term (also known as the sensitivity); all the other calculations are based on it. Solving for the error term of a node j starts by analysing which nodes of the next layer it is related to, because node j affects the final output through the next-layer neurons connected to it. This again requires preserving the connections between the nodes of each layer and those of the previous layer, so that they are readily available when propagating the error backwards.
The next post describes in detail some of the problems that occur in convolutional neural networks. Blog address: http://blog.csdn.net/u010402786/article/details/51228405

Main references:
http://blog.csdn.net/zouxy09/article/details/9993371

http://www.cnblogs.com/liuwu265/p/4707760.html

http://www.cnblogs.com/loujiayu/p/3545155.html?utm_source=tuicool

http://blog.csdn.net/lu597203933/article/details/46575871
