Neural Networks and Deep Learning -- The Error Backpropagation Algorithm


Before explaining the error backpropagation algorithm, let's review how a signal flows through a neural network. When an input vector \(x\) is fed into a perceptron, the weight vector \(w\) is first initialized randomly (you can also think of this as arbitrarily choosing initial values) and combined with the input by a dot product. The model then computes new weights using the weight-update formula, the updated weights interact with the input again, and this is repeated over many iterations until we obtain the final weights.

So the signal propagates forward, while the weight updates propagate backward, is that right?
Yes, your intuition is correct: it is indeed backward propagation.

1. The essence of feedforward

The term backpropagation is often misunderstood as meaning the whole learning algorithm for multilayer neural networks. In fact, backpropagation refers only to the method for computing the gradient, while another algorithm, such as stochastic gradient descent, uses this gradient to perform learning. In addition, backpropagation is often misunderstood as applying only to multilayer neural networks, but in principle it can compute the derivative of any function (for some functions, the correct response is to report that the derivative of the function is undefined).

1.1 Initializing weights and input signals

As before, we will use a visualized example of matrix multiplication to help us understand backpropagation. Assume a three-layer network consisting of an input layer, a hidden layer, and an output layer. A set of signals \(x\) is fed into the network, and the connection weights between the input layer and the hidden layer, \(w_{input-hidden}\), and between the hidden layer and the output layer, \(w_{hidden-output}\), are initialized randomly. For clarity we label only a few weights: the weight between the first input node and the first hidden-layer node is \(w_{1,1}=0.9\), as shown in the network diagram. Similarly, the weight between the second input node and the second hidden node is \(w_{2,2}=0.8\), the weight between the third hidden node and the second output node is \(w_{3,2}=0.2\), and so on. This naming convention was explained earlier, and labeling the weights like this will help us when we analyze backpropagation.


Figure 5.2.1


Input matrix:
\[x=\begin{bmatrix}0.9\\ 0.1\\ 0.8\end{bmatrix}\]
Connection weights between the input layer and the hidden layer:
\[w_{input-hidden}=\begin{bmatrix}0.9 & 0.3 & 0.4 \\ 0.2 & 0.8 & 0.2 \\ 0.1 & 0.5 & 0.6 \end{bmatrix}\]
Connection weights between hidden and output layers:
\[w_{hidden-output}=\begin{bmatrix}0.3 & 0.7 & 0.5 \\ 0.6 & 0.5 & 0.2 \end{bmatrix}\]
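To make the setup concrete, here is a minimal sketch in Python (assuming NumPy; the names `x`, `w_input_hidden`, and `w_hidden_output` are just illustrative) that stores the input vector and the two randomly initialized weight matrices exactly as given above. Each row holds the weights feeding into one node of the next layer, so a plain matrix-vector product gives each node's combined input.

```python
import numpy as np

x = np.array([0.9, 0.1, 0.8])                 # input signal

w_input_hidden = np.array([[0.9, 0.3, 0.4],   # weights feeding hidden node 1
                           [0.2, 0.8, 0.2],   # weights feeding hidden node 2
                           [0.1, 0.5, 0.6]])  # weights feeding hidden node 3

w_hidden_output = np.array([[0.3, 0.7, 0.5],  # weights feeding output node 1
                            [0.6, 0.5, 0.2]]) # weights feeding output node 2
```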

1.2 Input layer to hidden layer

Once the initial values are defined, we compute the combined (moderated) input to the hidden layer, \(x_{hidden}\):
\[x_{hidden} = w_{input-hidden} \cdot x\]
The matrix multiplication here is, as before, carried out by the computer, and the result is:
\[x_{hidden} =\begin{bmatrix}0.9 & 0.3 & 0.4 \\ 0.2 & 0.8 & 0.2 \\ 0.1 & 0.5 & 0.6 \end{bmatrix} \cdot \begin{bmatrix}0.9\\ 0.1\\ 0.8\end{bmatrix}\]
\[x_{hidden} =\begin{bmatrix}1.16\\ 0.42\\ 0.62\end{bmatrix}\]
Don't worry, let's keep going and tidy up the signal flow in the network: \(x_{hidden}\), the output of the first layer and the input to the second layer, has been correctly computed and is now ready to enter the hidden layer.


Figure 5.2.2


As soon as \(x_{hidden}\) enters the hidden layer, we apply the sigmoid activation function to each node of \(x_{hidden}\), and we name the resulting set of output signals \(o_{hidden}\).
\[o_{hidden}=sigmoid(x_{hidden})=sigmoid\left(\begin{bmatrix}1.16\\ 0.42\\ 0.62\end{bmatrix}\right)=\begin{bmatrix}0.761\\ 0.603\\ 0.650\end{bmatrix}\]
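This step can be checked numerically. Below is a small sketch (assuming NumPy; `sigmoid` is a helper defined inline for illustration) that computes the combined input \(x_{hidden}\) and the hidden-layer output \(o_{hidden}\):

```python
import numpy as np

def sigmoid(z):
    # logistic activation 1 / (1 + e^(-z)), applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.9, 0.1, 0.8])
w_input_hidden = np.array([[0.9, 0.3, 0.4],
                           [0.2, 0.8, 0.2],
                           [0.1, 0.5, 0.6]])

x_hidden = w_input_hidden @ x    # -> [1.16, 0.42, 0.62]
o_hidden = sigmoid(x_hidden)     # ≈ [0.761, 0.603, 0.650]
print(x_hidden, o_hidden)
```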

1.3 Hidden layer to output layer

Let's again visualize how these inputs are combined to form the moderated input to the next layer. The signal has now moved forward to the second layer, so the next step is of course to compute the third layer's combined input signal \(x_{output}\) (not yet passed through the sigmoid function). The calculation is no different from before; no matter how many layers our network has, this method applies.


Figure 5.2.3


So, we have:
\[x_{output}=w_{hidden-output} \cdot o_{hidden}=\begin{bmatrix}0.3 & 0.7 & 0.5 \\ 0.6 & 0.5 & 0.2 \end{bmatrix} \cdot \begin{bmatrix}0.761\\ 0.603\\ 0.650\end{bmatrix}=\begin{bmatrix}0.975\\ 0.888\end{bmatrix}\]
Now let's update the diagram to show our progress: starting from the initial input signal, the feedforward signal flows forward layer by layer, and we arrive at the combined input signal of the final layer.


Figure 5.2.4


The final step, of course, is to apply the sigmoid function to obtain the output of the last layer, denoted \(o_{output}\):
\[o_{output}=sigmoid(x_{output})=sigmoid\left(\begin{bmatrix}0.975\\ 0.888\end{bmatrix}\right)=\begin{bmatrix}0.726\\ 0.708\end{bmatrix}\]
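Putting both layers together, a minimal end-to-end feedforward sketch (again assuming NumPy) reproduces the hand calculation above; because the text rounds intermediate values while the code keeps full precision, the results agree to roughly three decimal places.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.9, 0.1, 0.8])
w_input_hidden = np.array([[0.9, 0.3, 0.4],
                           [0.2, 0.8, 0.2],
                           [0.1, 0.5, 0.6]])
w_hidden_output = np.array([[0.3, 0.7, 0.5],
                            [0.6, 0.5, 0.2]])

o_hidden = sigmoid(w_input_hidden @ x)   # ≈ [0.761, 0.603, 0.650]
x_output = w_hidden_output @ o_hidden    # ≈ [0.976, 0.889]; the text's rounded values give [0.975, 0.888]
o_output = sigmoid(x_output)             # ≈ [0.726, 0.709]; the text's rounded values give [0.726, 0.708]
print(o_output)
```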

The flow of the feedforward signal ends here, mission accomplished! We have visualized how the signal flows and is transformed through the feedforward neural network, and the network diagram below shows its final form.


Figure 5.2.5


Undoubtedly, this whole process is what feedforward means: the signal keeps flowing forward until the final output, and no signal from any intermediate layer flows back to an earlier layer of the network.
Next we will compare the network's output values with the target values in the training sample, compute the error, and use this error to adjust the weights in reverse.

2. The essence of backpropagation

In the previous step we obtained the forward-propagation output [0.726, 0.708]. There is a gap between this value and the target value [0.01, 0.99], but that doesn't matter: backpropagating the error will help us update the weights and reduce the error. Let's try it.

2.1 Calculating the total error

Since our experimental network has two outputs, the total error is the sum of the two output errors:
\[e=\sum (target-o_{output})^2=e_{1}+e_{2}=(target_{1}-o_{output1})^2+(target_{2}-o_{output2})^2\]
First error:
\[e_{1}=(target_{1}-o_{output1})^2=(0.01-0.726)^2=0.512656\]
Second error:
\[e_{2}=(target_{2}-o_{output2})^2=(0.99-0.708)^2=0.079524\]
Total Error:
\[e=e_{1}+e_{2}=0.512656+0.079524=0.59218\]
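As a quick check, the total error can be recomputed in a few lines of plain Python from the rounded outputs and targets used above (variable names are illustrative):

```python
targets = [0.01, 0.99]
outputs = [0.726, 0.708]   # rounded feedforward outputs from the text

e1 = (targets[0] - outputs[0]) ** 2   # ≈ 0.512656
e2 = (targets[1] - outputs[1]) ** 2   # ≈ 0.079524
e_total = e1 + e2                     # ≈ 0.59218
print(e1, e2, e_total)
```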

2.2 Weight updates for hidden layers and output layers

Consider the weight \(w_{1,1}\) between the hidden layer and the output layer. If we want to know how much \(w_{1,1}\) affects the total error, we take the partial derivative of the total error with respect to \(w_{1,1}\), which can be expanded using the chain rule:
\[\frac{\partial e}{\partial w_{1,1}}=\frac{\partial e}{\partial o_{output1}} \cdot \frac{\partial o_{output1}}{\partial x_{output1}} \cdot \frac{\partial x_{output1}}{\partial w_{1,1}}\]
Viewing backpropagation together with this derivative expression helps us see more clearly how the error is transmitted backward.


Figure 5.2.6


Let's evaluate each of the sub-formulas in the above derivative equation separately.
1. First, calculate \(\frac{\partial e}{\partial o_{output1}}\):
\[e=(target_{1}-o_{output1})^2+(target_{2}-o_{output2})^2\]
\[\frac{\partial e}{\partial o_{output1}}=-2(target_{1}-o_{output1})+0=-2(0.01-0.726)=1.432\]
2. Next, calculate \(\frac{\partial o_{output1}}{\partial x_{output1}}\):
\[o_{output1}=\frac{1}{1+e^{-x_{output1}}}\]
\[\frac{\partial o_{output1}}{\partial x_{output1}}=o_{output1}(1-o_{output1})=0.726(1-0.726)=0.198924\]
3. Finally, calculate \(\frac{\partial x_{output1}}{\partial w_{1,1}}\):
\[x_{output1}=w_{1,1} \cdot o_{hidden1}+w_{2,1} \cdot o_{hidden2}+w_{3,1} \cdot o_{hidden3}\]
\[\frac{\partial x_{output1}}{\partial w_{1,1}}=o_{hidden1}=0.761\]
So:
\[\frac{\partial e}{\partial w_{1,1}}=\frac{\partial e}{\partial o_{output1}} \cdot \frac{\partial o_{output1}}{\partial x_{output1}} \cdot \frac{\partial x_{output1}}{\partial w_{1,1}}=1.432 \times 0.198924 \times 0.761 = 0.216777826848\]
We take the learning rate \(\eta=0.5\) and apply the update formula:
\[{w_{1,1}}_{new}=w_{1,1}-\eta \frac{\partial e}{\partial w_{1,1}}\]
which gives the updated weight:
\[{w_{1,1}}_{new}=0.3-0.5 \times 0.216777826848=0.191611086576\]
To sum up, \(\frac{\partial e}{\partial w_{1,1}}\) can also be written as a single expression:
\[\frac{\partial e}{\partial w_{1,1}}=-2(target_{1}-o_{output1}) \cdot o_{output1}(1-o_{output1}) \cdot o_{hidden1}\]
Therefore, by substituting the corresponding variables into the equation above, we can update the other weights \(w_{2,1}\), \(w_{3,1}\), \(w_{1,2}\), \(w_{2,2}\), \(w_{3,2}\) in the same way; a numerical check of the \(w_{1,1}\) update follows below.
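Here is a small numerical check in plain Python of the \(w_{1,1}\) update between the hidden layer and the output layer, using the rounded values from the text (the variable names are illustrative):

```python
target1   = 0.01    # target for output node 1
o_output1 = 0.726   # sigmoid output of output node 1
o_hidden1 = 0.761   # sigmoid output of hidden node 1
w11       = 0.3     # current hidden-to-output weight w_{1,1}
eta       = 0.5     # learning rate

de_do   = -2 * (target1 - o_output1)    # ∂e/∂o_output1         = 1.432
do_dx   = o_output1 * (1 - o_output1)   # ∂o_output1/∂x_output1 ≈ 0.198924
dx_dw   = o_hidden1                     # ∂x_output1/∂w_{1,1}   = 0.761
grad    = de_do * do_dx * dx_dw         # ≈ 0.216778
w11_new = w11 - eta * grad              # ≈ 0.191611
print(grad, w11_new)
```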

2.3 Weight updates for input layers and hidden layers

Calculating the weights between the input layer and the hidden layer follows the same method as above, but when differentiating the error with respect to these weights, we must use the total error from both outputs rather than the error at a single output node. We again show this graphically:


Figure 5.2.7


As shown, consider the weight \(w_{1,1}\) between the input layer and the hidden layer. If we want to know how much \(w_{1,1}\) affects the total error, we take the partial derivative of the total error with respect to \(w_{1,1}\), which again expands via the chain rule:
\[\frac{\partial e}{\partial w_{1,1}}=\frac{\partial e}{\partial o_{hidden1}} \cdot \frac{\partial o_{hidden1}}{\partial x_{hidden1}} \cdot \frac{\partial x_{hidden1}}{\partial w_{1,1}}\]
As before, we evaluate each factor of the equation above one at a time.
1. First, calculate \(\frac{\partial e}{\partial o_{hidden1}}\).
The output of hidden node 1 feeds into both output nodes, so it receives error from both outputs:
\[\frac{\partial e}{\partial o_{hidden1}}=\frac{\partial e_{1}}{\partial o_{hidden1}}+\frac{\partial e_{2}}{\partial o_{hidden1}}\]
\[\because \frac{\partial e_{1}}{\partial o_{hidden1}}=\frac{\partial e_{1}}{\partial x_{output1}} \cdot \frac{\partial x_{output1}}{\partial o_{hidden1}}\]
\[\because \frac{\partial e_{1}}{\partial x_{output1}}=\frac{\partial e_{1}}{\partial o_{output1}} \cdot \frac{\partial o_{output1}}{\partial x_{output1}}=1.432 \times 0.198924=0.284859168\]
In the following, \(w'_{j,k}\) denotes the link weights between the hidden layer and the output layer:
\[x_{output1}=w'_{1,1} \cdot o_{hidden1}+w'_{2,1} \cdot o_{hidden2}+w'_{3,1} \cdot o_{hidden3}\]
\[\therefore \frac{\partial x_{output1}}{\partial o_{hidden1}}=w'_{1,1}=0.3\]
\[\therefore \frac{\partial e_{1}}{\partial o_{hidden1}}=\frac{\partial e_{1}}{\partial x_{output1}} \cdot \frac{\partial x_{output1}}{\partial o_{hidden1}}=0.284859168 \times 0.3=0.0854577504\]
Next, calculate \(\frac{\partial e_{2}}{\partial o_{hidden1}}\):
\[\because \frac{\partial e_{2}}{\partial o_{hidden1}}=\frac{\partial e_{2}}{\partial x_{output2}} \cdot \frac{\partial x_{output2}}{\partial o_{hidden1}}\]
\[\because \frac{\partial e_{2}}{\partial x_{output2}}=\frac{\partial e_{2}}{\partial o_{output2}} \cdot \frac{\partial o_{output2}}{\partial x_{output2}}=-2(0.99-0.708) \times 0.708(1-0.708)=-0.116599104\]
\[\because x_{output2}=w'_{1,2} \cdot o_{hidden1}+w'_{2,2} \cdot o_{hidden2}+w'_{3,2} \cdot o_{hidden3}\]
\[\therefore \frac{\partial x_{output2}}{\partial o_{hidden1}}=w'_{1,2}=0.6\]
\[\therefore \frac{\partial e_{2}}{\partial o_{hidden1}}=\frac{\partial e_{2}}{\partial x_{output2}} \cdot \frac{\partial x_{output2}}{\partial o_{hidden1}}=-0.116599104 \times 0.6=-0.0699594624\]
Finally we get:
\[\frac{\partial e}{\partial o_{hidden1}}=\frac{\partial e_{1}}{\partial o_{hidden1}}+\frac{\partial e_{2}}{\partial o_{hidden1}}=0.0854577504-0.0699594624=0.015498288\]
2. Next, calculate \(\frac{\partial o_{hidden1}}{\partial x_{hidden1}}\):
\[\because o_{hidden1}=\frac{1}{1+e^{-x_{hidden1}}}\]
\[\therefore \frac{\partial o_{hidden1}}{\partial x_{hidden1}}=o_{hidden1}(1-o_{hidden1})=0.761(1-0.761)=0.181879\]
3. Finally, calculate \(\frac{\partial x_{hidden1}}{\partial w_{1,1}}\):
\[\because x_{hidden1}=w_{1,1} \cdot x_{1}+w_{2,1} \cdot x_{2}+w_{3,1} \cdot x_{3}\]
\[\therefore \frac{\partial x_{hidden1}}{\partial w_{1,1}}=x_{1}=0.9\]
Putting the three factors together:
\[\frac{\partial e}{\partial w_{1,1}}=\frac{\partial e}{\partial o_{hidden1}} \cdot \frac{\partial o_{hidden1}}{\partial x_{hidden1}} \cdot \frac{\partial x_{hidden1}}{\partial w_{1,1}}=0.015498288 \times 0.181879 \times 0.9 = 0.0025369318108368\]
We again take the learning rate \(\eta=0.5\) and apply the update formula:
\[{w_{1,1}}_{new}=w_{1,1}-\eta \frac{\partial e}{\partial w_{1,1}}\]
which gives the updated weight:
\[{w_{1,1}}_{new}=0.9-0.5 \times 0.0025369318108368=0.8987315340945816\]
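The same kind of check can be done for this input-to-hidden weight \(w_{1,1}\); the sketch below (plain Python, illustrative names) sums the error contributions from both outputs before applying the chain rule:

```python
x1        = 0.9     # first component of the input signal
o_hidden1 = 0.761   # sigmoid output of hidden node 1
w11       = 0.9     # current input-to-hidden weight w_{1,1}
wp_11     = 0.3     # w'_{1,1}: hidden node 1 -> output node 1
wp_12     = 0.6     # w'_{1,2}: hidden node 1 -> output node 2
eta       = 0.5     # learning rate

de1_dx_out1 = -2 * (0.01 - 0.726) * 0.726 * (1 - 0.726)   # ≈ 0.284859
de2_dx_out2 = -2 * (0.99 - 0.708) * 0.708 * (1 - 0.708)   # ≈ -0.116599
de_do_h1 = de1_dx_out1 * wp_11 + de2_dx_out2 * wp_12      # ≈ 0.015498
do_dx_h1 = o_hidden1 * (1 - o_hidden1)                    # ≈ 0.181879
grad     = de_do_h1 * do_dx_h1 * x1                       # ≈ 0.002537
w11_new  = w11 - eta * grad                               # ≈ 0.898732
print(grad, w11_new)
```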
The same method can be used to update the remaining weights. With this, we have completed our introduction to the error backpropagation algorithm. In actual training we keep iterating this procedure until the total error approaches 0, retain the best weights, and training is complete.
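To illustrate what "iterating this procedure" looks like in practice, here is a compact training-loop sketch (assuming NumPy) that repeats the feedforward pass, backpropagates the error \(e=\sum(target-o_{output})^2\), and updates both weight matrices by gradient descent. The names, the fixed number of iterations, and the single training sample are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x      = np.array([0.9, 0.1, 0.8])
target = np.array([0.01, 0.99])
eta    = 0.5

w_ih = np.array([[0.9, 0.3, 0.4],
                 [0.2, 0.8, 0.2],
                 [0.1, 0.5, 0.6]])
w_ho = np.array([[0.3, 0.7, 0.5],
                 [0.6, 0.5, 0.2]])

for step in range(1000):
    # feedforward
    o_hidden = sigmoid(w_ih @ x)
    o_output = sigmoid(w_ho @ o_hidden)

    # backpropagation of the squared error e = sum((target - o_output)^2)
    delta_output = -2 * (target - o_output) * o_output * (1 - o_output)
    delta_hidden = (w_ho.T @ delta_output) * o_hidden * (1 - o_hidden)

    # gradient-descent weight updates (outer products give one gradient per weight)
    w_ho -= eta * np.outer(delta_output, o_hidden)
    w_ih -= eta * np.outer(delta_hidden, x)

total_error = np.sum((target - sigmoid(w_ho @ sigmoid(w_ih @ x))) ** 2)
print(total_error)   # should be close to 0 after training
```

After enough iterations on this single sample, the outputs approach the targets [0.01, 0.99] and the total error approaches 0, which is exactly the stopping condition described above.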


