This article is reproduced from https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
Background
Backpropagation is a common method for training a neural network. There is no shortage of papers online that attempt to explain how backpropagation works, but few that include an example with actual numbers. This post is my attempt to explain how it works with a concrete example that folks can compare their own calculations to in order to ensure they understand backpropagation correctly.
If this kind of thing interests you, you should sign up for my newsletter where I post about AI-related projects that I'm working on.
Backpropagation in Python
You can play around with a Python script that I wrote that implements the backpropagation algorithm in this GitHub repo.
Backpropagation Visualization
For an interactive visualization showing a neural network as it learns, check out my Neural Network visualization.
Additional Resources
If you find this tutorial useful and want to continue learning about neural networks and their applications, I highly recommend checking out Adrian Rosebrock's excellent tutorial on Getting Started with Deep Learning and Python.
Overview
For this tutorial, we're going to use a neural network with two inputs, two hidden neurons, and two output neurons. Additionally, the hidden and output neurons will each include a bias.
Here's the basic structure:
In order to have some numbers to work with, here are the initial weights, the biases, and training inputs/outputs:

w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55, b1 = 0.35, b2 = 0.60
The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.
For the rest of this tutorial we're going to work with a single training set: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.
The Forward Pass
To begin, let's see what the neural network currently predicts given the weights and biases above and inputs of 0.05 and 0.10. To do this we'll feed those inputs forward through the network.
We figure out the total net input to each hidden layer neuron, squash the total net input using an activation function (here we use the logistic function), and then repeat the process with the output layer neurons.
Total net input is also referred to as just net input by some sources.
Here's how we calculate the total net input for h1:

net_h1 = w1 * i1 + w2 * i2 + b1 * 1 = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775
We then squash it using the logistic function to get the output of h1:

out_h1 = 1 / (1 + e^(-net_h1)) = 1 / (1 + e^(-0.3775)) = 0.593269992
Carrying out the same process for h2 we get:

out_h2 = 0.596884378
We repeat this process for the output layer neurons, using the output from the hidden layer neurons as inputs.
Here's the output for o1:

net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1 = 0.4 * 0.593269992 + 0.45 * 0.596884378 + 0.6 * 1 = 1.105905967

out_o1 = 1 / (1 + e^(-1.105905967)) = 0.75136507
And carrying out the same process for o2 we get:

out_o2 = 0.772928465
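If it helps to see these numbers fall out of real code, here is a minimal Python sketch of the forward pass. The sigmoid helper and variable names are my own; the weights, biases, and inputs are the ones given above.

```python
import math

def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs, weights, and biases from the article
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Hidden layer: weighted sum plus bias, then squash
net_h1 = w1 * i1 + w2 * i2 + b1
out_h1 = sigmoid(net_h1)   # ≈ 0.593269992
net_h2 = w3 * i1 + w4 * i2 + b1
out_h2 = sigmoid(net_h2)   # ≈ 0.596884378

# Output layer: hidden outputs become the inputs
net_o1 = w5 * out_h1 + w6 * out_h2 + b2
out_o1 = sigmoid(net_o1)   # ≈ 0.75136507
net_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o2 = sigmoid(net_o2)   # ≈ 0.772928465
```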
Calculating the Total Error
We can now calculate the error for each output neuron using the squared error function and sum them to get the total error:

E_total = Σ 1/2 (target - output)^2
Some sources refer to the target as the ideal and the output as the actual. The 1/2 is included so that the exponent is cancelled when we differentiate later on. The result is eventually multiplied by a learning rate anyway so it doesn't matter that we introduce a constant here [1].
For example, the target output for o1 is 0.01 but the neural network output 0.75136507, therefore its error is:

E_o1 = 1/2 (target_o1 - out_o1)^2 = 1/2 (0.01 - 0.75136507)^2 = 0.274811083
Repeating this process for o2 (remembering that the target is 0.99) we get:

E_o2 = 0.023560026
The total error for the neural network is the sum of these errors:

E_total = E_o1 + E_o2 = 0.274811083 + 0.023560026 = 0.298371109
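As a quick check, the error calculation can be reproduced directly. This sketch hardcodes the forward-pass outputs computed above rather than recomputing them:

```python
# Outputs from the forward pass and the training targets
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99

# Squared error for each output neuron, then the total
E_o1 = 0.5 * (target_o1 - out_o1) ** 2   # ≈ 0.274811083
E_o2 = 0.5 * (target_o2 - out_o2) ** 2   # ≈ 0.023560026
E_total = E_o1 + E_o2                    # ≈ 0.298371109
```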
The Backwards Pass
Our goal with backpropagation is to update each of the weights in the network so that they cause the actual output to be closer to the target output, thereby minimizing the error for each output neuron and the network as a whole.
Output Layer
Consider w5. We want to know how much a change in w5 affects the total error, aka ∂E_total/∂w5.
∂E_total/∂w5 is read as "the partial derivative of E_total with respect to w5". You can also say "the gradient with respect to w5".
By applying the chain rule we know that:

∂E_total/∂w5 = ∂E_total/∂out_o1 * ∂out_o1/∂net_o1 * ∂net_o1/∂w5
Visually, here's what we're doing:
We need to figure out each piece in this equation.
First, how much does the total error change with respect to the output?

∂E_total/∂out_o1 = -(target_o1 - out_o1) = -(0.01 - 0.75136507) = 0.74136507
-(target - out) is sometimes expressed as (out - target). When we take the partial derivative of the total error with respect to out_o1, the quantity 1/2 (target_o2 - out_o2)^2 becomes zero because out_o1 does not affect it, which means we're taking the derivative of a constant, which is zero.
Next, how much does the output of o1 change with respect to its total net input?
The partial derivative of the logistic function is the output multiplied by 1 minus the output:

∂out_o1/∂net_o1 = out_o1 (1 - out_o1) = 0.75136507 (1 - 0.75136507) = 0.186815602
Finally, how much does the total net input of o1 change with respect to w5?

∂net_o1/∂w5 = out_h1 = 0.593269992
Putting it all together:

∂E_total/∂w5 = 0.74136507 * 0.186815602 * 0.593269992 = 0.082167041
You'll often see this calculation combined in the form of the delta rule:

∂E_total/∂w5 = -(target_o1 - out_o1) * out_o1 (1 - out_o1) * out_h1
Alternatively, we have ∂E_total/∂out_o1 and ∂out_o1/∂net_o1, which can be written as ∂E_total/∂net_o1, aka δ_o1 (the Greek letter delta), aka the node delta. We can use this to rewrite the calculation above:

δ_o1 = ∂E_total/∂out_o1 * ∂out_o1/∂net_o1 = 0.74136507 * 0.186815602 = 0.138498562
Therefore:

∂E_total/∂w5 = δ_o1 * out_h1 = 0.138498562 * 0.593269992 = 0.082167041
Some sources extract the negative sign from δ so it would be written as:

∂E_total/∂w5 = -δ_o1 * out_h1
/* The gradient of each weight equals the output of the previous-layer node connected to it (that is, out_h1) multiplied by the delta of the node it feeds (that is, δ_o1) */
To decrease the error, we then subtract this value from the current weight (optionally multiplied by some learning rate, eta, which we'll set to 0.5):

w5+ = w5 - η * ∂E_total/∂w5 = 0.4 - 0.5 * 0.082167041 = 0.35891648
Some sources use α (alpha) to represent the learning rate, others use η (eta), and others even use ε (epsilon).
We can repeat this process to get the new weights w6, w7, and w8:

w6+ = 0.408666186

w7+ = 0.511301270

w8+ = 0.561370121
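The output-layer updates above can be sketched in a few lines of Python. The node deltas fold the first two chain-rule factors together; variable names like delta_o1 are my own, and the forward-pass outputs are hardcoded from earlier:

```python
# Values from the forward pass
out_h1, out_h2 = 0.593269992, 0.596884378
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
eta = 0.5  # learning rate

# Node deltas for the output layer: dE/dnet = -(target - out) * out * (1 - out)
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)   # ≈ 0.138498562
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)   # ≈ -0.038098236

# Gradient of each weight = delta of the node it feeds * output of the node behind it
w5_new = w5 - eta * delta_o1 * out_h1   # ≈ 0.35891648
w6_new = w6 - eta * delta_o1 * out_h2   # ≈ 0.408666186
w7_new = w7 - eta * delta_o2 * out_h1   # ≈ 0.511301270
w8_new = w8 - eta * delta_o2 * out_h2   # ≈ 0.561370121
```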
We perform the actual updates in the neural network after we have the new weights leading into the hidden layer neurons (i.e., we use the original weights, not the updated weights, when we continue the backpropagation algorithm below).
Hidden Layer
Next, we'll continue the backwards pass by calculating new values for w1, w2, w3, and w4.
Big picture, here's what we need to figure out:

∂E_total/∂w1 = ∂E_total/∂out_h1 * ∂out_h1/∂net_h1 * ∂net_h1/∂w1
Visually:
We're going to use a similar process as we did for the output layer, but slightly different to account for the fact that the output of each hidden layer neuron contributes to the output (and therefore error) of multiple output neurons. We know that out_h1 affects both out_o1 and out_o2, therefore ∂E_total/∂out_h1 needs to take into consideration its effect on both output neurons:
Starting with ∂E_o1/∂out_h1:

∂E_o1/∂out_h1 = ∂E_o1/∂net_o1 * ∂net_o1/∂out_h1
We can calculate ∂E_o1/∂net_o1 using values we calculated earlier:

∂E_o1/∂net_o1 = ∂E_o1/∂out_o1 * ∂out_o1/∂net_o1 = 0.74136507 * 0.186815602 = 0.138498562
And ∂net_o1/∂out_h1 is equal to w5:

∂net_o1/∂out_h1 = w5 = 0.40
Plugging them in:

∂E_o1/∂out_h1 = 0.138498562 * 0.40 = 0.055399425
Following the same process for ∂E_o2/∂out_h1, we get:

∂E_o2/∂out_h1 = -0.019049119
Therefore:

∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1 = 0.055399425 + (-0.019049119) = 0.036350306
Now that we have ∂E_total/∂out_h1, we need to figure out ∂out_h1/∂net_h1 and then ∂net_h1/∂w for each weight:

∂out_h1/∂net_h1 = out_h1 (1 - out_h1) = 0.593269992 (1 - 0.593269992) = 0.241300709
We calculate the partial derivative of the total net input to h1 with respect to w1 the same as we did for the output neuron:

∂net_h1/∂w1 = i1 = 0.05
Putting it all together:

∂E_total/∂w1 = ∂E_total/∂out_h1 * ∂out_h1/∂net_h1 * ∂net_h1/∂w1 = 0.036350306 * 0.241300709 * 0.05 = 0.000438568
You might also see this written as:

∂E_total/∂w1 = (Σ_o δ_o * w_ho) * out_h1 (1 - out_h1) * i1 = δ_h1 * i1
/* The gradient of each weight equals the output of the previous node connected to it (i.e., i1) multiplied by the delta of the node it feeds (i.e., δ_h1, the node delta for the hidden layer) */
We can now update w1:

w1+ = w1 - η * ∂E_total/∂w1 = 0.15 - 0.5 * 0.000438568 = 0.149780716
Repeating this for w2, w3, and w4:

w2+ = 0.19956143

w3+ = 0.24975114

w4+ = 0.29950229
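The hidden-layer update can be sketched the same way. Note how the two output-layer deltas are propagated back through the original (not yet updated) weights w5 and w7; the variable names are my own and the numeric inputs are the forward-pass values from earlier:

```python
# Values carried over from the forward pass
i1 = 0.05
out_h1 = 0.593269992
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99
w1, w5, w7 = 0.15, 0.40, 0.50  # original weights, pre-update
eta = 0.5

# Output-layer node deltas (same as before)
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)   # ≈ 0.138498562
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)   # ≈ -0.038098236

# h1 affects both output neurons, so both deltas flow back through w5 and w7
dE_dout_h1 = delta_o1 * w5 + delta_o2 * w7      # ≈ 0.036350306
dout_dnet_h1 = out_h1 * (1 - out_h1)            # ≈ 0.241300709
dE_dw1 = dE_dout_h1 * dout_dnet_h1 * i1         # ≈ 0.000438568

w1_new = w1 - eta * dE_dw1                      # ≈ 0.149780716
```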
Finally, we've updated all of our weights! When we fed forward the 0.05 and 0.1 inputs originally, the error on the network was 0.298371109. After this first round of backpropagation, the total error is now down to 0.291027924. It might not seem like much, but after repeating this process 10,000 times, for example, the error plummets to 0.0000351085. At this point, when we feed forward 0.05 and 0.1, the two output neurons generate 0.015912196 (vs 0.01 target) and 0.984065734 (vs 0.99 target).
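Putting everything together, the whole procedure can be repeated in a loop. This is a compact sketch of the article's algorithm (biases held fixed, deltas computed before any weight is updated, as in the walkthrough above); exact final digits may differ slightly from the article depending on rounding, but the error should collapse to roughly 3.5e-5 after 10,000 rounds:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10          # inputs
t1, t2 = 0.01, 0.99          # targets
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]  # w1..w8
b1, b2 = 0.35, 0.60          # biases (left fixed here)
eta = 0.5                    # learning rate

for _ in range(10000):
    # Forward pass
    out_h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
    out_h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
    out_o1 = sigmoid(w[4] * out_h1 + w[5] * out_h2 + b2)
    out_o2 = sigmoid(w[6] * out_h1 + w[7] * out_h2 + b2)

    # Node deltas (computed with the original weights, before any update)
    d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)
    d_h1 = (d_o1 * w[4] + d_o2 * w[6]) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w[5] + d_o2 * w[7]) * out_h2 * (1 - out_h2)

    # Weight updates: gradient = node delta * output of the node behind it
    w[4] -= eta * d_o1 * out_h1
    w[5] -= eta * d_o1 * out_h2
    w[6] -= eta * d_o2 * out_h1
    w[7] -= eta * d_o2 * out_h2
    w[0] -= eta * d_h1 * i1
    w[1] -= eta * d_h1 * i2
    w[2] -= eta * d_h2 * i1
    w[3] -= eta * d_h2 * i2

# Final forward pass to measure the remaining error
out_h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
out_h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
out_o1 = sigmoid(w[4] * out_h1 + w[5] * out_h2 + b2)
out_o2 = sigmoid(w[6] * out_h1 + w[7] * out_h2 + b2)
E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
```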
To summarize:
1. The gradient of each weight equals the delta of the next-layer node it connects to, multiplied by the output of the previous-layer node it connects to. The important conclusion, stated three times!
2. The new weight = the original weight - η * (the gradient of the total error with respect to that weight).
3. Reference blog: http://blog.csdn.net/zhongkejingwang/article/details/44514073
BP Algorithm Demo