Error Backpropagation (BP) Algorithm: Derivation and Vectorized Representation

Source: Internet
Author: User

1. Preface

After working through cs231's material on convolutional neural networks I still felt unsatisfied. The main reason was that, although I understood the computational process and basic structure of a convolutional neural network, I still could not understand how it actually learns. So I turned to the more advanced "Notes on Convolutional Neural Networks", and was immediately puzzled by the BP algorithm review in its second chapter: unlike the per-weight update derivations I had learned before, it describes the weight-update rules with only five vectorized formulas, so at first I could not see the internal structure of each vector at all. Part of the reason is probably that my earlier study was not deep enough, and it had also been a long time since I last derived the BP algorithm myself. So I picked the textbook up again, reviewed the derivation step by step, and finally made sense of the vectorized representation in "Notes on Convolutional Neural Networks". This post is a detailed record of that review. If you read it and work through the derivation once yourself, it should become clear where each update formula of the BP algorithm comes from.

2. Symbol definition

Any derivation that does not explain the meaning of its symbols is bullying the reader, so this section describes the network structure and the symbols used in the derivation. The symbol definitions basically follow Zhou Zhihua's "Machine Learning", with a few changes of my own.

Network structure:

Figure 1. Fully connected network structure, from top to bottom, output layer, hidden layer and input layer

Symbol Description:

$\hat{y}_j$: output of the j-th neuron of the output layer (j = 1, 2, ..., l);

$\theta_j$: bias of the j-th neuron of the output layer;

$\beta_j$: input of the j-th neuron of the output layer;

$b_h$: output of the h-th neuron of the hidden layer (h = 1, 2, ..., q);

$\gamma_h$: bias of the h-th neuron of the hidden layer;

$\alpha_h$: input of the h-th neuron of the hidden layer;

$x_i$: input of the i-th neuron of the input layer (i = 1, 2, ..., d);

$w_{hj}$: connection weight between the h-th neuron of the hidden layer and the j-th neuron of the output layer;

$v_{ih}$: connection weight between the i-th neuron of the input layer and the h-th neuron of the hidden layer.

The two orange lines in Figure 1 mark the connections corresponding to the two weights $w_{hj}$ and $v_{ih}$.

3. Problem statement

For a single sample, its input can be written as:

$\mathbf{x} = (x_1, x_2, \ldots, x_d)$

The known expected output for this input is:

$\mathbf{y} = (y_1, y_2, \ldots, y_l)$

and its actual output is:

$\hat{\mathbf{y}} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_l)$

Our goal is to use the above information to update the parameters of the network so that the error decreases.
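To make the notation above concrete, here is a minimal NumPy sketch of such a network together with one training sample. The layer sizes, the random initialization, and the variable names are illustrative choices of mine, not part of the original text.

```python
import numpy as np

# Toy dimensions (illustrative): d inputs, q hidden neurons, l outputs.
d, q, l = 3, 4, 2
rng = np.random.default_rng(0)

# Parameters, following the symbols defined above:
# v[i, h] = v_ih, gamma[h] = gamma_h, w[h, j] = w_hj, theta[j] = theta_j
v = rng.normal(size=(d, q))      # input -> hidden connection weights
gamma = rng.normal(size=q)       # hidden-layer biases
w = rng.normal(size=(q, l))      # hidden -> output connection weights
theta = rng.normal(size=l)       # output-layer biases

# One training sample: input x and expected output y.
x = rng.normal(size=d)
y = rng.uniform(size=l)
```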

4. Detailed deduction

First, the mean square error is used to measure the error:

$E = \frac{1}{2}\sum_{j=1}^{l}\left(\hat{y}_j - y_j\right)^2$    (1)

where the actual outputs are given by:

$\hat{y}_j = f\left(\beta_j + \theta_j\right)$    (2)

Formula (2) says that the output of an output-layer neuron is determined by its input and its bias through the activation function f(·). Its input, in turn, is the sum of the hidden-layer outputs weighted by the corresponding connection weights, namely:

$\beta_j = \sum_{h=1}^{q} w_{hj}\, b_h$    (3)
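As a quick illustration of formulas (1)-(3), the sketch below runs one forward pass through the same toy network as before and evaluates E. The sigmoid activation and the random values are assumptions made purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy setup (illustrative values; any differentiable activation f would do).
d, q, l = 3, 4, 2
rng = np.random.default_rng(0)
v, gamma = rng.normal(size=(d, q)), rng.normal(size=q)
w, theta = rng.normal(size=(q, l)), rng.normal(size=l)
x, y = rng.normal(size=d), rng.uniform(size=l)

# Hidden layer (its formulas appear later as (13) and (14)).
alpha = x @ v                    # hidden-layer inputs alpha_h, shape (q,)
b = sigmoid(alpha + gamma)       # hidden-layer outputs b_h

# Output layer, formulas (3) and (2).
beta = b @ w                     # beta_j = sum_h w_hj * b_h, shape (l,)
y_hat = sigmoid(beta + theta)    # yhat_j = f(beta_j + theta_j)

# Formula (1): mean square error for this sample.
E = 0.5 * np.sum((y_hat - y) ** 2)
print(E)
```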

------------------------- The following derives the update formula for the connection weights between the hidden layer and the output layer -----------------------------

The BP algorithm uses gradient descent to adjust the parameters so that the error moves in the direction of decrease. The update rule for the connection weight $w_{hj}$ can therefore be written as:

$\Delta w_{hj} = -\eta \frac{\partial E}{\partial w_{hj}}$    (4)

$w_{hj} \leftarrow w_{hj} + \Delta w_{hj}$    (5)

In formula (4), $\eta$ is the learning rate. Our task, then, is to evaluate formula (4); adding the result to the old connection weight gives the updated weight. However, the partial derivative in formula (4) cannot be obtained directly and has to be expanded with the chain rule. To do so, we first need to know how the connection weight $w_{hj}$ between the hidden layer and the output layer affects the final mean square error E. Clearly it first affects the input $\beta_j$ of output-layer neuron j, which then affects the output $\hat{y}_j$ of that neuron, which finally affects the mean square error E. Following this description we can build the chain rule:

$\frac{\partial E}{\partial w_{hj}} = \frac{\partial E}{\partial \hat{y}_j} \cdot \frac{\partial \hat{y}_j}{\partial \beta_j} \cdot \frac{\partial \beta_j}{\partial w_{hj}}$    (6)

The three partial derivatives in formula (6) are, from left to right: how the output of output-layer neuron j affects the mean square error; how the input of output-layer neuron j affects its own output; and how the connection weight we want to update affects the input of output-layer neuron j.

The problem then reduces to solving these three partial derivatives:

[1] From formula (1) we can easily obtain the first partial derivative:

$\frac{\partial E}{\partial \hat{y}_j} = \hat{y}_j - y_j$    (7)

[2] From formula (2) we can just as easily obtain the second partial derivative:

$\frac{\partial \hat{y}_j}{\partial \beta_j} = f'\left(\beta_j + \theta_j\right)$    (8)

[3] From formula (3) we obtain the third partial derivative:

$\frac{\partial \beta_j}{\partial w_{hj}} = b_h$    (9)

Putting these together, we get the final update formula for the connection weights between the hidden layer and the output layer:

$\Delta w_{hj} = -\eta \left(\hat{y}_j - y_j\right) f'\left(\beta_j + \theta_j\right) b_h$    (10)
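As a sanity check on formula (10), here is a minimal NumPy sketch that computes $\Delta w_{hj}$ element by element for a toy network. The sigmoid activation, the layer sizes, and the random values are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Toy forward pass (same illustrative setup as before; sigmoid is an assumption).
d, q, l, eta = 3, 4, 2, 0.1
rng = np.random.default_rng(0)
v, gamma = rng.normal(size=(d, q)), rng.normal(size=q)
w, theta = rng.normal(size=(q, l)), rng.normal(size=l)
x, y = rng.normal(size=d), rng.uniform(size=l)
b = sigmoid(x @ v + gamma)          # hidden-layer outputs b_h
beta = b @ w                        # output-layer inputs beta_j, formula (3)
y_hat = sigmoid(beta + theta)       # actual outputs, formula (2)

# Formula (10), element by element:
#   delta_w_hj = -eta * (yhat_j - y_j) * f'(beta_j + theta_j) * b_h
delta_w = np.empty_like(w)
for h in range(q):
    for j in range(l):
        delta_w[h, j] = -eta * (y_hat[j] - y[j]) * dsigmoid(beta[j] + theta[j]) * b[h]

w = w + delta_w                     # formula (5): w_hj <- w_hj + delta_w_hj
```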

---------------------------- The following derives the update formula for the output-layer neuron bias -----------------------------------

The update formula for the output-layer neuron bias can be obtained very "gracefully" with an equivalence argument. We often treat a bias as a connection weight attached to an input that is always 1; going one step further, can't the bias $\theta_j$ equally be viewed as the connection weight between output-layer neuron j and a virtual hidden-layer neuron whose output is always 1? And we have already derived the update formula for the connection weights between hidden-layer and output-layer neurons.

So in fact we only need to replace $b_h$ in formula (10) with 1 to obtain the update formula for the output-layer neuron bias:

$\Delta \theta_j = -\eta \left(\hat{y}_j - y_j\right) f'\left(\beta_j + \theta_j\right)$    (11)
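The equivalence is easy to see numerically. The following tiny sketch applies formula (11) to a single output-layer neuron with made-up values; the sigmoid activation and the numbers are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values for a single output-layer neuron j (made up for this example).
eta = 0.1
y_j = 1.0                              # expected output
beta_j, theta_j = 0.8, 0.2             # input and bias of output neuron j
y_hat_j = sigmoid(beta_j + theta_j)    # actual output, formula (2)

# Formula (11): formula (10) with b_h replaced by 1.
fprime = y_hat_j * (1.0 - y_hat_j)     # sigmoid'(beta_j + theta_j)
delta_theta_j = -eta * (y_hat_j - y_j) * fprime
theta_j = theta_j + delta_theta_j      # bias update
```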

------------------------- Below is the derivation of the update formula for the connection weights between the input layer and the hidden layer -----------------------------

First, analogously to the update formula for the connection weights between the hidden layer and the output layer, we have:

$\Delta v_{ih} = -\eta \frac{\partial E}{\partial v_{ih}}$    (12)

And we also have:

$\alpha_h = \sum_{i=1}^{d} v_{ih}\, x_i$    (13)

$b_h = f\left(\alpha_h + \gamma_h\right)$    (14)

As before, formula (12) needs to be expanded with the chain rule into the following form:

$\frac{\partial E}{\partial v_{ih}} = \frac{\partial E}{\partial b_h} \cdot \frac{\partial b_h}{\partial \alpha_h} \cdot \frac{\partial \alpha_h}{\partial v_{ih}}$    (15)

This time the first partial derivative, $\partial E / \partial b_h$, cannot be obtained directly and needs to be decomposed further. To keep the formulas simple, however, we first work out the other two partial derivatives and set them aside. Specifically:

[1] From formula (14) we obtain the second partial derivative:

$\frac{\partial b_h}{\partial \alpha_h} = f'\left(\alpha_h + \gamma_h\right)$    (16)

[2] From formula (13) we obtain the third partial derivative:

$\frac{\partial \alpha_h}{\partial v_{ih}} = x_i$    (17)

Now we turn to the first partial derivative. Since $b_h$ influences the error through every output-layer neuron, its decomposition by the chain rule contains a sum over j:

$\frac{\partial E}{\partial b_h} = \sum_{j=1}^{l} \frac{\partial E}{\partial \hat{y}_j} \cdot \frac{\partial \hat{y}_j}{\partial \beta_j} \cdot \frac{\partial \beta_j}{\partial b_h}$    (18)

Using formulas (3), (7) and (8), the final result of formula (18) is:

$\frac{\partial E}{\partial b_h} = \sum_{j=1}^{l} \left(\hat{y}_j - y_j\right) f'\left(\beta_j + \theta_j\right) w_{hj}$    (19)

Combining (16), (17) and (19), the final result of formula (15), and with it the update formula (12), is:

$\Delta v_{ih} = -\eta \left[\sum_{j=1}^{l} \left(\hat{y}_j - y_j\right) f'\left(\beta_j + \theta_j\right) w_{hj}\right] f'\left(\alpha_h + \gamma_h\right) x_i$    (20)
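The same kind of element-wise sketch works for formulas (19) and (20). Again, the layer sizes, the sigmoid activation, and the random values below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Toy forward pass (illustrative setup).
d, q, l, eta = 3, 4, 2, 0.1
rng = np.random.default_rng(0)
v, gamma = rng.normal(size=(d, q)), rng.normal(size=q)
w, theta = rng.normal(size=(q, l)), rng.normal(size=l)
x, y = rng.normal(size=d), rng.uniform(size=l)
alpha = x @ v                                   # hidden-layer inputs, formula (13)
b = sigmoid(alpha + gamma)                      # hidden-layer outputs, formula (14)
beta = b @ w                                    # output-layer inputs, formula (3)
y_hat = sigmoid(beta + theta)                   # actual outputs, formula (2)

# Formula (19): dE/db_h = sum_j (yhat_j - y_j) * f'(beta_j + theta_j) * w_hj
dE_db = np.array([
    sum((y_hat[j] - y[j]) * dsigmoid(beta[j] + theta[j]) * w[h, j] for j in range(l))
    for h in range(q)
])

# Formula (20): delta_v_ih = -eta * dE/db_h * f'(alpha_h + gamma_h) * x_i
delta_v = np.empty_like(v)
for i in range(d):
    for h in range(q):
        delta_v[i, h] = -eta * dE_db[h] * dsigmoid(alpha[h] + gamma[h]) * x[i]

v = v + delta_v                                 # update the input-to-hidden weights
```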

---------------------------------------------------------------------------------------------------------

Additionally, the bias of the hidden-layer neurons can be handled with the same equivalence idea as above: simply replace the last factor $x_i$ in formula (20) with 1 to obtain the update formula for the hidden-layer neuron bias,

$\Delta \gamma_h = -\eta \left[\sum_{j=1}^{l} \left(\hat{y}_j - y_j\right) f'\left(\beta_j + \theta_j\right) w_{hj}\right] f'\left(\alpha_h + \gamma_h\right)$

---------------------------------------------------------------------------------------------------------

At this point, the update formulas for all parameters of the BP algorithm have been derived.
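Before moving on to the vectorized form, it is worth checking the derivation numerically. The sketch below compares the analytic gradient of E with respect to $w_{hj}$, read off from formula (10), against a central-difference estimate on a toy network. The sigmoid activation, the layer sizes, and the random values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_error(x, y, v, gamma, w, theta):
    """Formulas (13), (14), (3), (2) and (1): returns the mean square error E."""
    b = sigmoid(x @ v + gamma)
    y_hat = sigmoid(b @ w + theta)
    return 0.5 * np.sum((y_hat - y) ** 2)

# Toy network (sizes and activation are illustrative choices).
d, q, l = 3, 4, 2
rng = np.random.default_rng(0)
v, gamma = rng.normal(size=(d, q)), rng.normal(size=q)
w, theta = rng.normal(size=(q, l)), rng.normal(size=l)
x, y = rng.normal(size=d), rng.uniform(size=l)

# Analytic gradient of E w.r.t. w_hj, read off from formula (10) without the -eta factor.
b = sigmoid(x @ v + gamma)
beta = b @ w
y_hat = sigmoid(beta + theta)
g = (y_hat - y) * y_hat * (1.0 - y_hat)       # (yhat_j - y_j) * sigmoid'(beta_j + theta_j)
grad_w_analytic = np.outer(b, g)              # entry [h, j] = g_j * b_h

# Numerical gradient by central differences, to check the derivation.
eps = 1e-6
grad_w_numeric = np.zeros_like(w)
for h in range(q):
    for j in range(l):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[h, j] += eps
        w_minus[h, j] -= eps
        grad_w_numeric[h, j] = (forward_error(x, y, v, gamma, w_plus, theta)
                                - forward_error(x, y, v, gamma, w_minus, theta)) / (2 * eps)

print(np.max(np.abs(grad_w_analytic - grad_w_numeric)))   # should be on the order of 1e-9 or smaller
```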

5. Vectorized representation

Although the derivation above is easy to follow, the resulting formulas are rather long. They can therefore be re-expressed in a vectorized form.

First, the notation is introduced through the calculation of the output layer, which can be written as the following matrix multiplication:

$\begin{bmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_l \end{bmatrix} = f\left( \begin{bmatrix} w_{11} & \cdots & w_{q1} \\ \vdots & \ddots & \vdots \\ w_{1l} & \cdots & w_{ql} \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_q \end{bmatrix} + \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_l \end{bmatrix} \right)$    (21)

The calculation of layer a (since the first layer is the input layer, a > 1) can then be expressed as:

$\mathbf{x}^{a} = f\left(\mathbf{u}^{a}\right), \qquad \mathbf{u}^{a} = W^{a}\,\mathbf{x}^{a-1} + \mathbf{b}^{a}$    (22)

Each item in formula (22) corresponds one-to-one to the items in formula (21): for the output layer, $\mathbf{x}^{a}$ is the output vector, $W^{a}$ is the weight matrix whose row j, column h entry is $w_{hj}$, $\mathbf{x}^{a-1}$ is the hidden-layer output vector, and $\mathbf{b}^{a}$ is the bias vector.
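The following NumPy sketch of formula (22) runs the forward pass layer by layer for a small fully connected network. The layer widths, the sigmoid activation, and the random initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer widths (illustrative): input, hidden, output.  W^a has one row per neuron of
# layer a and one column per neuron of layer a-1, so row j, column h of the last
# matrix holds w_hj, exactly as in formula (21).
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
Ws = [rng.normal(size=(sizes[a], sizes[a - 1])) for a in range(1, len(sizes))]
bs = [rng.normal(size=sizes[a]) for a in range(1, len(sizes))]

# Formula (22): x^a = f(u^a) with u^a = W^a x^(a-1) + b^a, applied layer by layer.
x = rng.normal(size=sizes[0])      # x^1, the input layer
activations = [x]
for W, bias in zip(Ws, bs):
    u = W @ activations[-1] + bias
    activations.append(sigmoid(u))

y_hat = activations[-1]            # output vector of the top layer, as in formula (21)
```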

Then, from formulas (4) and (12), we can write down a general update formula for the connection weights of any layer:

$\Delta W^{a} = -\eta\, \boldsymbol{\delta}^{a} \left(\mathbf{x}^{a-1}\right)^{\mathrm{T}}$    (23)

and likewise for the biases:

$\Delta \mathbf{b}^{a} = -\eta\, \boldsymbol{\delta}^{a}$    (24)

where the error term $\boldsymbol{\delta}^{a}$ depends on the situation:

[1] If layer a is the output layer:

$\boldsymbol{\delta}^{a} = \left(\hat{\mathbf{y}} - \mathbf{y}\right) \circ f'\left(\mathbf{u}^{a}\right)$    (25)

[2] If layer a is not the output layer:

$\boldsymbol{\delta}^{a} = \left(\left(W^{a+1}\right)^{\mathrm{T}} \boldsymbol{\delta}^{a+1}\right) \circ f'\left(\mathbf{u}^{a}\right)$    (26)

The operator $\circ$ denotes element-wise multiplication of corresponding entries rather than matrix multiplication, and note that formula (26) is recursive: the error term of a layer is computed from the error term of the layer above it.

With this, the vectorized representation is complete.
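To tie the five vectorized formulas (22)-(26) together, here is a compact NumPy sketch that performs one forward pass and one parameter update on a toy network. As before, the sigmoid activation, the layer widths, and the random values are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Toy 3-layer network in the layer-wise notation (sizes and activation are assumptions).
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
Ws = [rng.normal(size=(sizes[a], sizes[a - 1])) for a in range(1, len(sizes))]
bs = [rng.normal(size=sizes[a]) for a in range(1, len(sizes))]
eta = 0.1
x = rng.normal(size=sizes[0])
y = rng.uniform(size=sizes[-1])

# Forward pass, formula (22): keep every u^a and x^a for the backward pass.
activations, inputs = [x], []
for W, bias in zip(Ws, bs):
    u = W @ activations[-1] + bias
    inputs.append(u)
    activations.append(sigmoid(u))

# Backward pass: formula (25) for the output layer, formula (26) recursively below it.
# NumPy's '*' on arrays is exactly the element-wise product denoted by the circle operator.
deltas = [None] * len(Ws)
deltas[-1] = (activations[-1] - y) * dsigmoid(inputs[-1])            # (25)
for a in range(len(Ws) - 2, -1, -1):
    deltas[a] = (Ws[a + 1].T @ deltas[a + 1]) * dsigmoid(inputs[a])  # (26)

# Formulas (23) and (24): parameter updates for every layer.
for a in range(len(Ws)):
    Ws[a] += -eta * np.outer(deltas[a], activations[a])   # (23); activations[a] is x^(a-1) for Ws[a]
    bs[a] += -eta * deltas[a]                              # (24)
```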

