How the BackPropagation algorithm works
The goal of backpropagation is to compute the partial derivatives ∂C/∂w and ∂C/∂b of the cost function C with respect to the weights w and the biases b.
At its heart, backpropagation is an expression for the partial derivative ∂C/∂w of the cost function C with respect to any weight w (or bias b) in the network. The expression tells us how quickly the cost changes when we change the weights and biases.
A fast matrix-based approach to computing the output of a neural network
Notation: for each layer l define a weight matrix w^l, a bias vector b^l, and an activation vector a^l. The activations of layer l and layer l-1 are then related by the equation:

a^l = σ(w^l a^{l-1} + b^l)

This equation contains an intermediate quantity z^l:

z^l = w^l a^{l-1} + b^l

z^l is called the weighted input to the neurons in layer l.
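The following is a minimal NumPy sketch of this layer-by-layer computation. The function names (sigmoid, feedforward) and the representation of the network as lists of weight matrices and bias vectors are illustrative assumptions, not something given in the original notes.

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(weights, biases, x):
    """Compute the network output layer by layer.

    weights[i] and biases[i] are the weight matrix and bias column vector
    of layer i+1; x is the input activation column vector.
    Returns the lists of weighted inputs z^l and activations a^l.
    """
    a = x
    zs, activations = [], [a]
    for w, b in zip(weights, biases):
        z = w @ a + b          # z^l = w^l a^{l-1} + b^l
        a = sigmoid(z)         # a^l = sigma(z^l)
        zs.append(z)
        activations.append(a)
    return zs, activations
```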
Two assumptions about the cost function
1. The cost function can be written as an average C = (1/n) Σ_x C_x of cost functions C_x for individual training samples x.
This assumption is needed because backpropagation actually computes the partial derivatives ∂C_x/∂w and ∂C_x/∂b for a single training sample, and then recovers ∂C/∂w and ∂C/∂b by averaging over all training samples.
With this assumption in place, we can treat the per-sample cost C_x as if it were C and drop the subscript.
2. The cost can be written as a function of the outputs of the neural network.
For a single training sample x, the quadratic cost function can be written:

C = (1/2) ||y - a^L||^2 = (1/2) Σ_j (y_j - a^L_j)^2

Here x and y are fixed; they are not changed by the weights and biases and are not something the network learns, so it is reasonable to regard C as a function of the output activations a^L alone.
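As a small illustration of both assumptions, here is a hypothetical sketch of the quadratic cost for one sample and of the averaged cost over a training set; the function names are chosen only for this example.

```python
import numpy as np

def quadratic_cost(a_L, y):
    """Per-sample cost C_x = 1/2 * ||y - a^L||^2 (assumption 2:
    a function of the network output a^L only)."""
    return 0.5 * np.sum((y - a_L) ** 2)

def total_cost(outputs, targets):
    """Assumption 1: C is the average of C_x over all n training samples."""
    n = len(outputs)
    return sum(quadratic_cost(a, y) for a, y in zip(outputs, targets)) / n
```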
Four fundamental equations of backpropagation
Notation:
δ^l_j: the error of the jth neuron in the lth layer, defined as δ^l_j = ∂C/∂z^l_j.
1. An equation for the error in the output layer:

δ^L_j = (∂C/∂a^L_j) σ'(z^L_j), or in matrix form δ^L = ∇_a C ⊙ σ'(z^L)   (BP1)

σ'(z^L_j) requires only a little extra computation. ∂C/∂a^L_j depends on the form of the cost function; for example, with the quadratic cost we have ∂C/∂a^L_j = a^L_j - y_j, so δ^L = (a^L - y) ⊙ σ'(z^L).
2. An equation for the error δ^l in terms of the error δ^{l+1} in the next layer:

δ^l = ((w^{l+1})^T δ^{l+1}) ⊙ σ'(z^l)   (BP2)
3. The rate of change of the cost with respect to any bias in the network:

∂C/∂b^l_j = δ^l_j   (BP3)
4. The rate of change of the cost with respect to any weight in the network:

∂C/∂w^l_{jk} = a^{l-1}_k δ^l_j   (BP4)
Taken together, the four equations of backpropagation are:

(BP1) δ^L = ∇_a C ⊙ σ'(z^L)
(BP2) δ^l = ((w^{l+1})^T δ^{l+1}) ⊙ σ'(z^l)
(BP3) ∂C/∂b^l_j = δ^l_j
(BP4) ∂C/∂w^l_{jk} = a^{l-1}_k δ^l_j
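To make the four equations concrete, here is a sketch of how they might be turned into code for one training sample, assuming the sigmoid network and quadratic cost used above; names such as backprop and sigmoid_prime are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative sigma'(z) of the sigmoid."""
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(weights, biases, x, y):
    """Return (nabla_w, nabla_b), the gradients of C_x for one sample."""
    # Forward pass: store z^l and a^l for every layer.
    a = x
    zs, activations = [], [a]
    for w, b in zip(weights, biases):
        z = w @ a + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)

    nabla_w = [np.zeros_like(w) for w in weights]
    nabla_b = [np.zeros_like(b) for b in biases]

    # BP1: output error, using dC/da^L = (a^L - y) for the quadratic cost.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta                              # BP3
    nabla_w[-1] = delta @ activations[-2].T          # BP4

    # BP2: propagate the error backwards through the hidden layers.
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta                          # BP3
        nabla_w[-l] = delta @ activations[-l - 1].T  # BP4
    return nabla_w, nabla_b
```

Note how BP2 reuses the error already computed for layer l+1, so the whole gradient comes out of a single backward pass through the network.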
The backpropagation algorithm
Backpropagation is combined with a gradient descent learning step computed over a mini-batch of m training samples: the gradients ∂C_x/∂w and ∂C_x/∂b are found by backpropagation for each sample in the mini-batch, and the weights and biases are then updated with their average, w → w - (η/m) Σ_x ∂C_x/∂w and b → b - (η/m) Σ_x ∂C_x/∂b.
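A sketch of that update step, assuming the backprop helper from the previous sketch; update_mini_batch and eta (the learning rate η) are names chosen for this example.

```python
import numpy as np

def update_mini_batch(weights, biases, mini_batch, eta):
    """One gradient descent step over a mini-batch of m samples.

    mini_batch is a list of (x, y) pairs; eta is the learning rate.
    Uses backprop() to get the per-sample gradients, then applies
    w -> w - (eta/m) * sum of dC_x/dw (and likewise for b).
    """
    m = len(mini_batch)
    sum_w = [np.zeros_like(w) for w in weights]
    sum_b = [np.zeros_like(b) for b in biases]
    for x, y in mini_batch:
        nabla_w, nabla_b = backprop(weights, biases, x, y)
        sum_w = [sw + nw for sw, nw in zip(sum_w, nabla_w)]
        sum_b = [sb + nb for sb, nb in zip(sum_b, nabla_b)]
    weights = [w - (eta / m) * sw for w, sw in zip(weights, sum_w)]
    biases = [b - (eta / m) * sb for b, sb in zip(biases, sum_b)]
    return weights, biases
```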
Source: Neural Networks and Deep Learning (2.1), the backpropagation algorithm.