In a deep network, different layers learn at very different speeds. For example, the later layers may be learning well while the earlier layers stall during training and learn almost nothing; in the opposite case, the earlier layers learn well while the later layers stall.
This happens because gradient-descent-based learning algorithms carry an inherent instability, which tends to stop learning in either the earlier or the later layers.
The vanishing gradient problem
In some deep neural networks, the gradient tends to shrink as it is propagated backward through the hidden layers, so earlier hidden layers learn more slowly than later ones. This is the vanishing gradient problem.
In the opposite case, the gradient grows as it is propagated backward, so earlier hidden layers learn much faster than later ones. This is called the exploding gradient problem.
In other words, gradients in deep neural networks are unstable: in the earlier layers they tend either to vanish or to explode.
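A minimal NumPy sketch can make this concrete. It is not taken from the book: the layer sizes, the 1/sqrt(n) weight scale, and the all-ones error signal at the output are illustrative assumptions. It runs one backward pass through a deep sigmoid network and prints the gradient norm at each layer; with this setup the norms typically shrink sharply toward the input layer.

```python
import numpy as np

# Sketch (illustrative assumptions): per-layer gradient norms in a deep sigmoid net.
rng = np.random.default_rng(0)
layer_sizes = [30] * 10  # 10 layers of 30 units -> 9 weight matrices
weights = [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass, keeping the weighted inputs z for the backward pass.
activations = [rng.normal(size=(layer_sizes[0], 1))]
zs = []
for W in weights:
    z = W @ activations[-1]
    zs.append(z)
    activations.append(sigmoid(z))

# Backward pass with an arbitrary all-ones error signal at the output layer.
delta = np.ones_like(activations[-1]) * sigmoid(zs[-1]) * (1.0 - sigmoid(zs[-1]))
grad_norms = [np.linalg.norm(delta)]
for W, z in zip(reversed(weights[1:]), reversed(zs[:-1])):
    delta = (W.T @ delta) * sigmoid(z) * (1.0 - sigmoid(z))
    grad_norms.append(np.linalg.norm(delta))

# Print from the layer nearest the input to the output layer:
# the earlier layers typically have much smaller gradient norms.
for i, norm in enumerate(reversed(grad_norms), start=1):
    print(f"layer {i} (1 = nearest the input): gradient norm = {norm:.3e}")
```

The exact numbers depend on the initialization, but the qualitative trend is robust for sigmoid activations with small weights.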
Causes of the unstable gradient problem
The gradient in an earlier layer is the product of terms contributed by all the later layers. When there are many layers, this product is intrinsically unstable.
In practice, the gradients of the earlier layers in a sigmoid network are usually found to vanish exponentially with depth.
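As a sketch of why this happens, following the kind of single-neuron-per-layer chain used in the book (the four-layer depth here is only for illustration), consider the gradient with respect to the bias of the first hidden neuron:

\[
\frac{\partial C}{\partial b_1}
  = \sigma'(z_1)\,w_2\,\sigma'(z_2)\,w_3\,\sigma'(z_3)\,w_4\,\sigma'(z_4)\,
    \frac{\partial C}{\partial a_4}.
\]

Since \(\sigma'(z)\le 1/4\), and weights drawn from a standard Gaussian usually satisfy \(|w_j|<1\), each factor \(|w_j\,\sigma'(z_j)|\) is below 1/4, so the product shrinks exponentially with the number of layers: the gradient vanishes. If instead the weights are large enough that \(|w_j\,\sigma'(z_j)|>1\), the same product grows exponentially and the gradient explodes.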
Neural Networks and Deep Learning (5.1): Why are deep neural networks hard to train?