A Collection of Small Deep Learning Tricks

Source: Internet
Author: User
Solutions to gradient vanishing / gradient explosion

The fundamental cause of vanishing and exploding gradients lies in the backpropagation (BP) algorithm itself.

For the sigmoid activation, the derivative is σ'(x) = σ(x)(1 - σ(x)) <= 1/4, so each sigmoid layer scales the backpropagated error by a factor of at most 1/4.

In general, the update steps for W and b are proportional to the learning rate times the backpropagated error. For the lower layers, that error is a product of each layer's weights and activation derivatives, so the update steps of the early layers shrink (gradient vanishing) or blow up (gradient explosion) roughly exponentially with depth. A deep network then performs no better than a shallow one: the later layers learn normally while the early layers learn almost nothing. In a sigmoid network the gradient vanishes exponentially with depth.
There are several common strategies for this.
Use a different learning rate for each layer
Replace the activation function
Using the ReLU activation function simplifies the computation and alleviates the vanishing-gradient problem. Because some neurons output 0, the network becomes sparse, which reduces interdependence among parameters and helps mitigate overfitting.
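The gradient-scale difference is easy to see directly: the sigmoid derivative never exceeds 1/4, so a chain of sigmoid layers shrinks the backpropagated error exponentially, while the ReLU derivative is exactly 1 for positive inputs. A minimal sketch (plain Python, not tied to any framework):

```python
import math

def sigmoid_grad(x):
    # derivative of the sigmoid: s(x) * (1 - s(x)), maximal at x = 0 where it is 0.25
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0

# Best case for sigmoid (x = 0 in every layer) across a 10-layer chain:
sigmoid_chain = sigmoid_grad(0.0) ** 10  # 0.25**10, about 1e-6: the gradient vanishes
relu_chain = relu_grad(1.0) ** 10        # 1.0**10 = 1.0: the gradient is preserved
```

Even in sigmoid's best case the error signal reaching the first layer is a million times smaller after only ten layers.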
Use Batch Normalization
Data should be normalized before training the network. A neural network essentially learns the distribution of its data: if the training and test data follow different distributions, generalization suffers, and if the distribution varies from batch to batch, training slows down because the network must adapt to each batch.
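Batch normalization applies this same idea inside the network: each mini-batch is standardized per feature and then rescaled by a learned gain gamma and shift beta. A minimal numpy sketch (shapes and names are illustrative):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # standardize each feature over the batch to mean 0, variance 1,
    # then apply the learned scale (gamma) and shift (beta)
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(64, 4))  # a batch of 64 samples, 4 features
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
# with gamma = 1 and beta = 0, every feature now has mean ~0 and variance ~1
```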
Batch normalization also mitigates gradient vanishing by keeping the scale of the weight changes more consistent across layers, and it speeds up training. It is applied to each layer's pre-activations, i.e. before the non-linear activation function.

How to deal with local optima

Simulated annealing
Add momentum

Why training does not converge, and how to fix it

Too little data
Learning rate too large
This may cause training to diverge from the very start, with very large weights in every layer, or the loss may suddenly explode mid-run (often seen when the hidden layers use ReLU activations and the last layer uses Softmax for classification).
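The effect of an oversized learning rate can be seen even on a one-dimensional quadratic loss L(w) = w**2: gradient descent converges when the step factor stays below the stability threshold and diverges beyond it (a toy sketch, not tied to any framework):

```python
def gradient_descent(lr, steps=20, w=1.0):
    # minimize the toy loss L(w) = w**2, whose gradient is 2*w
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

small = gradient_descent(lr=0.1)  # |w| shrinks toward the minimum at 0
large = gradient_descent(lr=1.1)  # each step overshoots: |w| grows, the loss explodes
```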
Bad network structure
Try a different optimization algorithm
I once ran into this in testing: Adam failed to converge, while plain SGD converged. The exact reason is unknown.
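Plain SGD with momentum is only a few lines, which makes it an easy fallback to try when an adaptive optimizer misbehaves. A generic sketch (not any particular library's API):

```python
def sgd_momentum_step(w, grad, v, lr=0.05, momentum=0.9):
    # classic momentum update: v <- momentum*v - lr*grad, then w <- w + v
    v = momentum * v - lr * grad
    return w + v, v

# minimize the toy loss L(w) = (w - 3)**2, whose gradient is 2*(w - 3)
w, v = 0.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, 2.0 * (w - 3.0), v)
# w ends up close to the minimum at 3
```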
Normalize the inputs
That is, standardize the input to mean 0 and variance 1, use batch normalization inside the network, and so on.
Change the weight-initialization scheme
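Two widely used variance-scaled schemes are He initialization for ReLU layers and Xavier/Glorot initialization for tanh/sigmoid layers; both keep the activation variance roughly constant across layers. A numpy sketch (function names are illustrative):

```python
import numpy as np

def he_init(fan_in, fan_out, rng=None):
    # He initialization (for ReLU): normal with std = sqrt(2 / fan_in)
    rng = np.random.default_rng(0) if rng is None else rng
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def xavier_init(fan_in, fan_out, rng=None):
    # Xavier/Glorot initialization (for tanh/sigmoid):
    # uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))
    rng = np.random.default_rng(0) if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = he_init(512, 256)  # empirical std close to sqrt(2/512) ~ 0.0625
```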

How to prevent overfitting

Increase the amount of data in the training set
For images, this can be done by translating, flipping, and adding noise.
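The flips, shifts, and noise mentioned above are each one line with numpy. A sketch on a raw array (real pipelines usually use a library's augmentation utilities):

```python
import numpy as np

def augment(img, rng=None):
    # return three simple variants of an H x W image array:
    # a horizontal flip, a 2-pixel translation, and additive Gaussian noise
    rng = np.random.default_rng(0) if rng is None else rng
    flipped = img[:, ::-1]
    shifted = np.roll(img, shift=2, axis=1)
    noisy = img + rng.normal(0.0, 0.05, size=img.shape)
    return flipped, shifted, noisy

img = np.linspace(0.0, 1.0, 16).reshape(4, 4)  # a stand-in "image"
flipped, shifted, noisy = augment(img)
```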
Use the ReLU activation function
Dropout
In each training iteration, a random subset of nodes is selected for training and weight updates, while the remaining weights stay unchanged. At test time, the output is obtained from the averaged ("mean") network.
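A minimal sketch of inverted dropout, the common variant of this idea: the surviving activations are scaled up at training time so that no averaging is needed at test time.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    # inverted dropout: zero a fraction p of activations during training and
    # scale the survivors by 1/(1-p), so the test-time layer is just the identity
    if not training:
        return x
    rng = np.random.default_rng(0) if rng is None else rng
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

x = np.ones((2, 8))
y = dropout(x, p=0.5)  # entries are either 0.0 (dropped) or 2.0 (kept and rescaled)
```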
Regularization
This also simplifies the network, e.g. by adding an L2-norm penalty on the weights.
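The L2 (weight decay) term penalizes large weights, and the gradient of the penalty simply shrinks each weight a little on every update. A sketch:

```python
import numpy as np

def l2_penalty(weights, lam=1e-3):
    # L2 regularization term: lam * sum of squared weights, added to the data loss
    return lam * sum(np.sum(w ** 2) for w in weights)

def l2_grad(w, lam=1e-3):
    # the penalty's gradient for one weight matrix: 2 * lam * w (weight decay)
    return 2.0 * lam * w

W = [np.array([[1.0, -2.0], [0.5, 0.0]])]
reg_loss = l2_penalty(W)  # 1e-3 * (1 + 4 + 0.25) = 0.00525
```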
Terminate training early (early stopping)

How to improve results when the model may be underfitting

Increase the number of features
In one case the original input was only the coordinate position; adding a color feature afterwards improved the fit.
Reduce the regularization term; study the parameter-initialization scheme

Optimization methods: http://m.blog.csdn.net/shwan_ma/article/details/76257967

http://www.cnblogs.com/wuxiangli/p/7258510.html
