Original link: http://blog.csdn.net/u012759136/article/details/52302426
This article only gives a visual introduction to, and a simple comparison of, some common optimization methods; for the details and formulas of each method you will have to consult the original papers.
Extended reading: in addition to the SGD variants mentioned above, researchers have also proposed other methods:
1. Nesterov accelerated gradient (NAG): extends the momentum method by evaluating the gradient at the estimated future position along the inertial direction instead of at the current position; this "look-ahead" design lets the algorithm anticipate the terrain ahead (a minimal sketch follows this list).
2. Adadelta and RMSprop: these two methods are very similar; both are improvements on Adagrad that counter its rapidly shrinking learning rate.
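The look-ahead idea fits in a few lines of NumPy. This is a minimal illustrative sketch (function and variable names are ours, not from the article), assuming the common formulation in which the velocity is updated with the gradient taken at the anticipated position:

```python
import numpy as np

def nag_step(w, v, grad_fn, lr=0.01, momentum=0.9):
    """One Nesterov accelerated gradient step (illustrative sketch).

    Instead of using the gradient at the current weights w, NAG evaluates
    it at the look-ahead position w + momentum * v, i.e. where the inertia
    is about to carry the parameters anyway.
    """
    lookahead = w + momentum * v      # anticipated future position
    g = grad_fn(lookahead)            # gradient at the look-ahead point
    v = momentum * v - lr * g         # velocity updated with that gradient
    return w + v, v                   # move along the updated velocity

# usage on a toy quadratic f(w) = 0.5 * ||w||^2, whose gradient is simply w
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(300):
    w, v = nag_step(w, v, grad_fn=lambda x: x, lr=0.1)
print(w)  # close to the minimum at [0, 0]
```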
Adam corrects the first- and second-moment estimates of the gradient so that they approximate unbiased estimates of the expectations. These moment estimates require no extra memory beyond the gradients themselves, adapt dynamically as the gradients change, and impose a dynamic, clearly bounded constraint on the effective step size. Characteristics (a minimal update sketch follows this list):
Combines Adagrad's ability to handle sparse gradients with RMSprop's ability to handle non-stationary objectives
Low memory requirements
Computes individual adaptive learning rates for different parameters
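A minimal NumPy sketch of one Adam step, including the bias correction described above (the names and the toy usage are illustrative, not from the article):

```python
import numpy as np

def adam_step(w, m, v, g, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step (sketch of the bias-corrected update).

    m and v are exponential moving averages of the gradient and of the
    squared gradient; dividing by (1 - beta**t) corrects their bias toward
    zero during the first iterations, giving approximately unbiased
    estimates of the expected moments.
    """
    m = beta1 * m + (1 - beta1) * g              # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g          # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # bounded per-parameter step
    return w, m, v

# usage on a toy quadratic f(w) = 0.5 * ||w||^2, whose gradient is simply w
w, m, v = np.array([5.0, -3.0]), np.zeros(2), np.zeros(2)
for t in range(1, 501):
    w, m, v = adam_step(w, m, v, g=w, t=t, lr=0.1)
print(w)  # w moves toward the minimum at [0, 0]
```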
This paper discusses the Adam algorithm and related methods. We analyze the theoretical convergence of Adam and provide a convergence bound, showing that its rate is optimal within the online convex optimization framework. Empirical results also demonstrate that in practice Adam is comparable to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
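The infinity-norm idea behind AdaMax also fits in a few lines: Adam's squared-gradient average is replaced by an exponentially weighted maximum of past gradient magnitudes. A minimal sketch under that reading (names are ours; the small epsilon is added only for numerical safety and is not part of the original formulation):

```python
import numpy as np

def adamax_step(w, m, u, g, t, lr=0.002, beta1=0.9, beta2=0.999):
    """One AdaMax step (sketch): Adam's second-moment average is replaced
    by an exponentially weighted infinity norm of past gradients."""
    m = beta1 * m + (1 - beta1) * g                  # first moment, as in Adam
    u = np.maximum(beta2 * u, np.abs(g))             # infinity-norm based scale
    step = (lr / (1 - beta1 ** t)) * m / (u + 1e-8)  # 1e-8 for safety only
    return w - step, m, u
```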
Adam: adaptive moment estimation (optimized on an RMSprop basis); Adamax: a variant of Adam (optimized on an Adam basis). How do we choose among these optimization algorithms?
With so many optimization algorithms available, how do we choose? Experienced practitioners offer some advice [2][3]: if you only have a small amount of input data, choose an adaptive learning-rate method. That way you do not have to tune the learning rate by hand, which matters because your data set is small and neural-network training is already somewhat time-consuming; in this case an adaptive method with its default settings is the convenient choice (a minimal Keras example follows).
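For instance, compiling a model with Adam's defaults needs no learning-rate tuning at all. The model below is a hypothetical toy example, not from the article; only the optimizer choice matters:

```python
from keras.models import Sequential
from keras.layers import Dense

# hypothetical toy binary classifier; the architecture is a placeholder,
# the point is that the adaptive optimizer is used with its default settings
model = Sequential([
    Dense(32, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',              # adaptive learning rate, no tuning
              loss='binary_crossentropy',
              metrics=['accuracy'])
```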
Optimization Algorithm
To solve the optimization problem there are many algorithms (the most common being gradient descent), and these algorithms can also be used to train neural networks. Every deep learning library ships a large collection of optimizers that adapt the learning rate so that the network reaches a good optimum in as few training iterations as possible while also helping to prevent overfitting. Keras provides the following optimizers [1]: SGD: stochastic gradient descent; SGD+Momentum: momentum-based SGD (optimized on an SGD basis). A sketch of how these optimizers are instantiated in Keras is given below.
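A hedged sketch of instantiating these optimizers via the keras.optimizers module; the argument names (lr, momentum, nesterov) follow the Keras versions current when this article was written (newer releases use learning_rate), and the values shown are common defaults, not recommendations from the article:

```python
from keras import optimizers

# classic SGD with momentum and the Nesterov look-ahead
sgd = optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True)

# adaptive-learning-rate methods mentioned in the article
adagrad  = optimizers.Adagrad(lr=0.01)
adadelta = optimizers.Adadelta()
rmsprop  = optimizers.RMSprop(lr=0.001)
adam     = optimizers.Adam(lr=0.001)
adamax   = optimizers.Adamax(lr=0.002)
nadam    = optimizers.Nadam(lr=0.002)

# any of these objects (or just the optimizer's string name) can be
# passed to model.compile(optimizer=..., loss=...)
```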
DL and Keras related:
[1] Activation function guidance in deep learning
[2] Discussion of the over-fitting problem in deep networks
[3] How to improve deep learning performance
[4] Summary and comparison of the most complete optimization methods for deep learning (SGD, Adagrad, Adadelta, Adam, Adamax, Nadam)
[5] Grid search hyper-parameter tuning in Keras/Python deep learning (with source code)
[6] Yos