This article covers the Adam method in the Deep Learning optimization series. The main reference is the Deep Learning book.
Complete list of articles in the optimization series:
Optimization methods for Deep Learning (overview)
Deep Learning optimization methods: SGD
Deep Learning optimization methods: Momentum
Deep Learning optimization methods: Nesterov (Nesterov momentum)
Deep Learning optimization methods: AdaGrad
Deep Learning optimization methods: RMSProp
Deep Learning optimization methods: Adam
Conclusions first:
1. The Adam algorithm can be viewed as a combination of Momentum and RMSProp with some modifications (the update equations are sketched after this list).
2. Momentum is incorporated directly as an estimate of the first-order moment of the gradient (an exponentially weighted moving average).
3. Adam is generally considered fairly robust to the choice of hyperparameters.
4. The suggested learning rate is 0.001.
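For reference, here is a minimal sketch of the Adam update rule in standard notation (not taken from the original article): $g_t$ is the gradient at step $t$, $\beta_1$ and $\beta_2$ are the exponential decay rates of the two moment estimates, $\epsilon$ is a small constant for numerical stability, and $\eta$ is the learning rate (e.g. the suggested 0.001).

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$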
Now look at the algorithm itself: it is essentially the combination of Momentum and RMSProp, followed by a correction of the bias in the moment estimates.
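As a concrete illustration (not the article's original code), here is a minimal NumPy sketch of one Adam step. The function name adam_update and the toy objective are hypothetical; the default hyperparameters (lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8) follow the commonly cited defaults.

```python
import numpy as np

def adam_update(theta, grad, m, v, t,
                lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: Momentum-style first moment + RMSProp-style
    second moment, each corrected for initialization bias."""
    # Exponentially weighted first moment (the Momentum part)
    m = beta1 * m + (1 - beta1) * grad
    # Exponentially weighted second moment (the RMSProp part)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: counteracts the zero initialization of m and v
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage sketch: minimize f(x) = x^2 starting from x = 5
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta          # gradient of x^2
    theta, m, v = adam_update(theta, grad, m, v, t)
print(theta)                  # close to 0 after enough steps
```

Note how the bias correction matters mostly in the first steps, when m and v are still close to their zero initialization; without it the early updates would be too small.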