Source: Michael Nielsen's "Neural Networks and Deep Learning". Click "Read the original" at the end to view the original English text.
Translator for this section: HIT SCIR undergraduate Wang Yuxuan
Disclaimer: To reprint, please contact [email protected]; reproduction without authorization is prohibited.
Using neural networks to recognize handwritten digits
How the backpropagation algorithm works
Warm up: a fast matrix-based approach to computing the output from a neural network
The two assumptions we need about the cost function
The Hadamard product
The four fundamental equations behind backpropagation
Proof of the four fundamental equations (optional)
The backpropagation algorithm
The code for backpropagation
In what sense is backpropagation a fast algorithm?
Backpropagation: the big picture
Improving the way neural networks learn
A visual proof that neural nets can compute any function
Why are deep neural networks hard to train?
Deep learning
The backpropagation equations give us a way of computing the gradient of the cost function. Let's write the algorithm out explicitly; a short code sketch follows the list:
Input $x$: Set the corresponding activation $a^1$ for the input layer.
Feedforward: For each $l = 2, 3, \ldots, L$ compute $z^l = w^l a^{l-1} + b^l$ and $a^l = \sigma(z^l)$.
Output error $\delta^L$: Compute the vector $\delta^L = \nabla_a C \odot \sigma'(z^L)$.
Backpropagate the error: For each $l = L-1, L-2, \ldots, 2$ compute $\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)$.
Output: The gradient of the cost function is given by $\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j$ and $\frac{\partial C}{\partial b^l_j} = \delta^l_j$.
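As a concrete illustration, here is a minimal numpy sketch of this algorithm for a single training example, assuming sigmoid activations and the quadratic cost $C = \frac{1}{2}\|a^L - y\|^2$. The function name `backprop` and the data layout (`weights` and `biases` as lists of per-layer arrays, activations as column vectors) are choices made for this sketch, not the book's `network.py` implementation, which is presented in the next section.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

def backprop(x, y, weights, biases):
    """Return (nabla_b, nabla_w), the per-layer gradients of the
    quadratic cost for one training example (x, y). weights and
    biases are lists of numpy arrays, ordered from the first
    hidden layer to the output layer."""
    # Feedforward: store every weighted input z^l and activation a^l.
    activation = x
    activations = [x]   # a^1, a^2, ..., a^L
    zs = []             # z^2, ..., z^L
    for w, b in zip(weights, biases):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # Output error: delta^L = (a^L - y) * sigma'(z^L), since
    # grad_a C = a^L - y for the quadratic cost.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].T)
    # Backpropagate: delta^l = ((w^{l+1})^T delta^{l+1}) * sigma'(z^l),
    # then read off the gradients for each layer's weights and biases.
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)
    return nabla_b, nabla_w
```

Note how the code mirrors the algorithm: the feedforward pass stores every $z^l$ and $a^l$, and the backward pass reuses them to compute each $\delta^l$ in turn.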
Examining the algorithm, you can see why it is called backpropagation: we compute the error vectors backwards, starting from the final layer. It may seem strange that we move through the network backward. But if we recall the proof of the backpropagation equations, the backward movement is a consequence of the fact that the cost is a function of the outputs of the network. To understand how the cost varies with earlier weights and biases, we must repeatedly apply the chain rule, working backward through the layers to obtain usable expressions.
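To make "repeatedly applying the chain rule" concrete, here is the standard expansion for a weight in layer $L-1$, added here for illustration using the notation of the four fundamental equations:

$$
\frac{\partial C}{\partial w^{L-1}_{jk}}
= \sum_i \frac{\partial C}{\partial a^L_i}\,\sigma'(z^L_i)\,w^L_{ij}\,\sigma'(z^{L-1}_j)\,a^{L-2}_k
= \Big(\big((w^L)^T \delta^L\big)_j\,\sigma'(z^{L-1}_j)\Big)\,a^{L-2}_k
= \delta^{L-1}_j\,a^{L-2}_k .
$$

Each layer we step backward through contributes one more factor to the chain, and this is exactly what the backward pass accumulates in $\delta^l$.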
Exercises
Backpropagation with a single modified neuron
Suppose we modify a single neuron in a feedforward network so that the output of the neuron is given by $f(\sum_j w_j x_j + b)$, where $f$ is some function other than the sigmoid. How should we modify the backpropagation algorithm in this case?
Backpropagation with linear neurons
Suppose we replace the usual nonlinear $\sigma$ function with $\sigma(z) = z$ throughout the network. Rewrite the backpropagation algorithm for this case.
As I said above, the backpropagation algorithm computes the gradient of the cost function for a single training example. In practice, backpropagation is usually combined with a learning algorithm such as stochastic gradient descent, in which we need to compute the gradient over a batch of training examples. Given a mini-batch of $m$ training examples, the following algorithm applies a gradient descent learning step based on those examples; a code sketch follows the list:
Input a set of training examples.
For each training example $x$: Set the corresponding input activation $a^{x,1}$ and perform the following steps:
Feedforward: For each $l = 2, 3, \ldots, L$ compute $z^{x,l} = w^l a^{x,l-1} + b^l$ and $a^{x,l} = \sigma(z^{x,l})$.
Output error $\delta^{x,L}$: Compute the vector $\delta^{x,L} = \nabla_a C_x \odot \sigma'(z^{x,L})$.
Backpropagate the error: For each $l = L-1, L-2, \ldots, 2$ compute $\delta^{x,l} = ((w^{l+1})^T \delta^{x,l+1}) \odot \sigma'(z^{x,l})$.
Gradient descent: For each $l = L, L-1, \ldots, 2$ update the weights according to the rule $w^l \rightarrow w^l - \frac{\eta}{m}\sum_x \delta^{x,l}(a^{x,l-1})^T$ and the biases according to the rule $b^l \rightarrow b^l - \frac{\eta}{m}\sum_x \delta^{x,l}$.
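Here is a minimal sketch of this mini-batch update in the same style as the earlier `backprop` sketch; again, `update_mini_batch`, `eta`, and the data layout are illustrative names of my own rather than the book's code:

```python
def update_mini_batch(mini_batch, weights, biases, eta):
    """Take one gradient-descent step based on mini_batch, a list of
    (x, y) training pairs, with learning rate eta."""
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    # Accumulate sum_x delta^{x,l} and sum_x delta^{x,l} (a^{x,l-1})^T
    # by running backprop on each example in turn.
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = backprop(x, y, weights, biases)
        nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # Apply the update rules from the gradient descent step above,
    # scaling the summed gradients by eta / m.
    m = len(mini_batch)
    weights = [w - (eta / m) * nw for w, nw in zip(weights, nabla_w)]
    biases = [b - (eta / m) * nb for b, nb in zip(biases, nabla_b)]
    return weights, biases
```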
Of course, to implement stochastic gradient descent in practice you also need an outer loop generating the mini-batches of training data, and an outer loop stepping through multiple epochs of training. For brevity these have been omitted; a sketch of what they might look like follows.
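For completeness, here is one way those outer loops might look, assuming the hypothetical `update_mini_batch` above and `training_data` as a list of `(x, y)` pairs:

```python
import random

def sgd(training_data, epochs, mini_batch_size, eta, weights, biases):
    """The two outer loops omitted above: one stepping through epochs,
    and one slicing the shuffled data into mini-batches."""
    for epoch in range(epochs):
        random.shuffle(training_data)
        mini_batches = [training_data[k:k + mini_batch_size]
                        for k in range(0, len(training_data), mini_batch_size)]
        for mini_batch in mini_batches:
            weights, biases = update_mini_batch(mini_batch, weights,
                                                biases, eta)
    return weights, biases
```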
In the next section we will introduce "The code for backpropagation", so stay tuned!
"Hit Scir" public number
Editorial office: Guo Jiang, Li Jiaqi, Xu June, Li Zhongyang, Hulin Lin
Editor of the issue: Li Zhongyang
Neural Networks and Deep Learning series, article 14: Proof of the four fundamental equations