Choosing an Optimization Algorithm for Your Neural Network

Source: Internet
Author: User
Tags: keras
Optimization Algorithm

Many algorithms exist for solving optimization problems (the most common is gradient descent), and these same algorithms can be used to train neural networks. Every deep learning library ships a collection of optimizers that adjust the learning rate so the network reaches a good solution in as few training iterations as possible, while also helping to prevent overfitting.
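As a concrete illustration (not from the original article), plain gradient descent repeatedly moves the weights against the gradient of the loss, w ← w − lr·∇L(w). The NumPy sketch below applies that rule to an illustrative least-squares problem; the data, learning rate, and iteration count are made-up assumptions.

import numpy as np

# Illustrative least-squares problem: recover true_w from y = X @ true_w.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)   # initial weights
lr = 0.1          # learning rate (step size), chosen by hand
for step in range(200):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    w -= lr * grad                           # plain gradient-descent update

print(np.round(w, 3))  # approaches true_w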
Keras provides the following optimizers [1]:

  • SGD: stochastic gradient descent
  • SGD+momentum: momentum-based SGD (an improvement on SGD)
  • SGD+Nesterov+momentum: momentum-based SGD with a two-step, look-ahead update (an improvement on SGD+momentum)
  • Adagrad: adaptively assigns a different learning rate to each parameter
  • Adadelta: addresses the decaying-learning-rate problem of Adagrad (an improvement on Adagrad)
  • RMSprop: often the best optimizer for recurrent neural networks (RNNs) (an improvement on Adadelta)
  • Adam: computes an adaptive learning rate for each weight (an improvement on RMSprop)
  • Adamax: a variant of Adam (an improvement on Adam)
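For concreteness, here is a minimal sketch of how these optimizers are passed to a model through the Keras API documented in [1]. The tiny model, the hyperparameter values, and the argument name lr (called learning_rate in newer Keras releases) are illustrative assumptions, not part of the original article.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD, RMSprop, Adam

# Illustrative model; the architecture and layer sizes are arbitrary.
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dense(10, activation='softmax'))

# Plain SGD, SGD with momentum, and SGD with Nesterov momentum
sgd = SGD(lr=0.01)
sgd_momentum = SGD(lr=0.01, momentum=0.9)
sgd_nesterov = SGD(lr=0.01, momentum=0.9, nesterov=True)

# Adaptive learning-rate optimizers
rmsprop = RMSprop(lr=0.001)
adam = Adam(lr=0.001)

# Pick one optimizer when compiling the model
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])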

How to Choose an Optimization Algorithm

There are so many optimization algorithms; how do we choose one? Experienced practitioners offer the following advice [2][3]:

  • If your input data is sparse, use one of the adaptive learning-rate methods. You then don't have to tune the learning rate yourself and can concentrate on the network's classification accuracy.
  • RMSprop, Adadelta and Adam are very similar and perform well in the same situations; Adam's bias correction makes it slightly better than RMSprop.
  • A well-tuned SGD+momentum is better than Adagrad/Adadelta.
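One practical way to follow this advice is to try a default Adam against a hand-tuned SGD+momentum on the same model and compare validation accuracy. The sketch below assumes a hypothetical build_model() helper (returning a fresh, uncompiled Keras model) and data arrays x_train, y_train, x_val, y_val that you would supply yourself; note that in older Keras 1.x the epochs argument is named nb_epoch.

from keras.optimizers import SGD, Adam

def compare_optimizers(build_model, x_train, y_train, x_val, y_val):
    # build_model and the data arrays are hypothetical placeholders.
    candidates = {
        'adam_default': Adam(),  # adaptive learning rates, little tuning needed
        'sgd_momentum': SGD(lr=0.01, momentum=0.9, nesterov=True),  # needs a well-chosen lr
    }
    for name, optimizer in candidates.items():
        model = build_model()
        model.compile(optimizer=optimizer,
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
        model.fit(x_train, y_train, epochs=10, batch_size=32,
                  validation_data=(x_val, y_val), verbose=0)
        loss, acc = model.evaluate(x_val, y_val, verbose=0)
        print('%s: validation accuracy %.4f' % (name, acc))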

Conclusion: As of this writing (April 2016), if you are not sure which optimization algorithm to choose for your neural network, simply choose Adam ("Insofar, Adam might be the best overall choice." [2]).

References
[1] Keras optimizers, http://keras.io/optimizers/
[2] An overview of gradient descent optimization algorithms, http://sebastianruder.com/optimizing-gradient-descent/
[3] ConvNetJS trainer comparison on the MNIST dataset, http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html
[4] http://blog.csdn.net/luo123n/article/details/48239963
