Machine Learning: A Step-by-Step Approach to Understanding Backpropagation
2016-09-13, Yong Yuan's blog. Original: http://yongyuan.name/blog/back-propagtion.html
While reading about the backpropagation algorithm, I came across the blog post A Step by Step Backpropagation Example. In it, the author walks through the backpropagation process with a simple example, very clearly, so I have translated it here together with my own understanding, in the hope that it helps friends who want to understand backpropagation.

Background
Backpropagation is used constantly when training neural networks, yet there are few concrete examples on the Internet that explain how it actually works. In this article I will try to explain the backpropagation process with a concrete example, so that readers can check their own calculations against it and confirm that their understanding is correct.
You can find a Python implementation of backpropagation on my Github.

Overview
In this post we use a neural network with 2 input units, 2 hidden neurons, and 2 output neurons. In addition, the hidden layer and the output layer each include a bias. The basic network structure is as follows:
To make the explanation below concrete, we set initial weights, biases, inputs, and target outputs for the network:
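The figure with the concrete numbers is not reproduced here, so as an assumption I take the initial values used in the referenced A Step by Step Backpropagation Example; all the worked numbers below follow from them:

inputs: i1 = 0.05, i2 = 0.10
input-to-hidden weights: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, with hidden bias b1 = 0.35
hidden-to-output weights: w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55, with output bias b2 = 0.60
targets: target_o1 = 0.01, target_o2 = 0.99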
The goal of backpropagation is to optimize the weights so that the neural network learns to map arbitrary inputs to outputs correctly.
In this post we use only a single, simple training example: the inputs are 0.05 and 0.10, and we want the network to output 0.01 and 0.99 (that is, the input pair is (0.05, 0.10) and the target pair is (0.01, 0.99)).

Forward Propagation
Let us first see what the network predicts for the inputs 0.05 and 0.10, given the initial weights and biases above, by feeding them forward through the network.
We first compute the total net input to each hidden-layer neuron, squash it with the logistic activation function, and then repeat the same process for the neurons of the output layer.
The total net input is also referred to simply as the net input by some sources.
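As a minimal Python sketch of this forward pass (the function and variable names here are mine, not from the original post; it assumes the logistic activation and the 2-2-2 layout with shared per-layer biases described above):

```python
import math

def sigmoid(x):
    # logistic activation function
    return 1.0 / (1.0 + math.exp(-x))

def forward(i1, i2, w, b1, b2):
    # w = (w1, ..., w8); the hidden layer shares bias b1, the output layer shares bias b2
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    # input layer -> hidden layer: total net input, then activation
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    # hidden layer -> output layer: the same process again
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
    return out_h1, out_h2, out_o1, out_o2
```

For example, forward(0.05, 0.10, (0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55), 0.35, 0.60) reproduces the numbers computed by hand below, assuming the initial values listed earlier.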
Here is how the total net input to h1 is computed:
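With the assumed initial values, this is (a reconstruction of the formula shown as an image in the original post):

net_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1 \cdot 1 = 0.15 \cdot 0.05 + 0.20 \cdot 0.10 + 0.35 \cdot 1 = 0.3775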
(Translator's note: by analogy with a CNN, this step corresponds to the convolution, which produces the feature response map.)
We then pass this net input through the activation function to get the output of h1:
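Passing it through the logistic function gives, under the same assumptions:

out_{h1} = \frac{1}{1 + e^{-net_{h1}}} = \frac{1}{1 + e^{-0.3775}} = 0.593269992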
(Translator's note: by analogy with a CNN, this step corresponds to passing the feature response map through the activation function.)
Carrying out the same process for h2, we get:
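Under the same assumed values:

net_{h2} = w_3 \cdot i_1 + w_4 \cdot i_2 + b_1 \cdot 1 = 0.25 \cdot 0.05 + 0.30 \cdot 0.10 + 0.35 \cdot 1 = 0.3925
out_{h2} = \frac{1}{1 + e^{-0.3925}} = 0.596884378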
For the output-layer neurons, the outputs of the hidden layer serve as their inputs (translator's note: in a CNN, the feature maps also go through pooling before becoming the input to the next layer; why pooling is needed is a question I will leave open here). Repeating the same process as above, we get for o1:
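With the assumed weights w5, w6 and bias b2, this gives (note that the resulting 0.75136507 matches the value quoted later in the text):

net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1 = 0.40 \cdot 0.593269992 + 0.45 \cdot 0.596884378 + 0.60 \cdot 1 = 1.105905967
out_{o1} = \frac{1}{1 + e^{-1.105905967}} = 0.75136507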
Similarly, repeating the same process, we get the output of o2:
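Under the same assumptions:

net_{o2} = w_7 \cdot out_{h1} + w_8 \cdot out_{h2} + b_2 \cdot 1 = 0.50 \cdot 0.593269992 + 0.55 \cdot 0.596884378 + 0.60 \cdot 1 = 1.224921404
out_{o2} = \frac{1}{1 + e^{-1.224921404}} = 0.772928465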
Calculating the Total Error

Now, for each output neuron, we compute its error with the squared error function and sum them to obtain the total error:
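In symbols, summing the squared error over the output neurons:

E_{total} = \sum \frac{1}{2} (target - output)^2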
Here output is the network's prediction and target is the ground truth. The factor 1/2 is included so that the 2 cancels conveniently when we differentiate; it does not affect the parameters obtained when minimizing the error.
For the first output neuron o1, the true value is 0.01 and the network outputs 0.75136507, so the error of the first output neuron is:
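Plugging in the two values just quoted:

E_{o1} = \frac{1}{2} (target_{o1} - out_{o1})^2 = \frac{1}{2} (0.01 - 0.75136507)^2 = 0.274811083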
Repeating the process above, we get the error of the second output neuron o2:
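Using the forward-pass value out_{o2} = 0.772928465 computed above under the assumed weights:

E_{o2} = \frac{1}{2} (0.99 - 0.772928465)^2 = 0.023560026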
The total error of the whole neural network is therefore the sum of the two:
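Still under the same assumptions:

E_{total} = E_{o1} + E_{o2} = 0.274811083 + 0.023560026 = 0.298371109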
Backpropagation

The goal of backpropagation is to update every weight in the network so that the actual output moves closer to the ground truth, thereby minimizing the error of the network as a whole.

Output Layer
We first look at w5: we want to know how much a change in w5 affects the total error, that is, ∂E_total/∂w5.
Applying the chain rule, we get:
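Written out, the chain rule decomposition is:

\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}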
To describe this application of the chain rule more intuitively, we can visualize it:
We now compute each factor in the chain rule above. First, how much does the total error change with respect to the neuron's output?
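Only E_{o1} depends on out_{o1}, so differentiating the squared error gives:

\frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1}) = -(0.01 - 0.75136507) = 0.74136507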
Next, the partial derivative of the logistic function is the output multiplied by one minus the output, namely:
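Applying this to o1, with out_{o1} = 0.75136507 from the forward pass:

\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1} (1 - out_{o1}) = 0.75136507 \cdot (1 - 0.75136507) = 0.186815602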
Finally, how much does the total net input to o1 change with respect to w5?
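Since net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2, only the first term involves w_5, so (with the assumed out_{h1}):

\frac{\partial net_{o1}}{\partial w_5} = out_{h1} = 0.593269992

Multiplying the three factors together gives the gradient for w_5:

\frac{\partial E_{total}}{\partial w_5} = 0.74136507 \cdot 0.186815602 \cdot 0.593269992 = 0.082167041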
You will also see this expressed in the form of the delta rule:
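In that notation the output node's error term δ and the gradient are written as (a reconstruction of the delta-rule form):

\delta_{o1} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} = -(target_{o1} - out_{o1}) \cdot out_{o1} (1 - out_{o1})
\frac{\partial E_{total}}{\partial w_5} = \delta_{o1} \cdot out_{h1}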
We can put ∂E_total/∂out