Optimization algorithms for deep learning models and their TensorFlow implementation

Model optimization matters for both traditional machine learning and deep learning, and deep learning in particular tends to present harder challenges during training. The popular and widely used optimization algorithms today include stochastic gradient descent (SGD), SGD with momentum, RMSProp, RMSProp with momentum, AdaDelta, Adam, and so on. Several of them are described below; most of the material is drawn from the "Deep Learning" book and the official TensorFlow documentation. If there are any mistakes, please correct me.

Stochastic gradient descent

Stochastic gradient descent (SGD) is probably the most widely used optimization algorithm. The algorithm proceeds as follows:

Input: learning rate ε, initial parameters θ
while the stopping criterion is not met:
    sample a minibatch of m examples {x^(1), ..., x^(m)} from the training set, where x^(i) corresponds to target y^(i)
    compute the gradient estimate: ĝ ← (1/m) ∇_θ Σ_i L(f(x^(i); θ), y^(i))
    apply the update: θ ← θ − ε ĝ
end while
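
As a minimal sketch of this loop in plain Python/NumPy (the linear model, squared loss, and fixed step count here are illustrative assumptions, not part of the original text):

import numpy as np

# Toy data: a hypothetical linear regression problem.
rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
true_w = rng.randn(5)
y = X @ true_w + 0.1 * rng.randn(1000)

theta = np.zeros(5)   # initial parameters
epsilon = 0.01        # learning rate
m = 32                # minibatch size

for step in range(2000):                                 # stand-in for the stopping criterion
    idx = rng.choice(len(X), size=m, replace=False)      # sample a minibatch of m examples
    xb, yb = X[idx], y[idx]
    grad = (2.0 / m) * xb.T @ (xb @ theta - yb)          # (1/m) * gradient of the squared loss
    theta = theta - epsilon * grad                       # theta <- theta - epsilon * g_hat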

There is no dedicated SGD class in TensorFlow. To implement SGD, use the plain gradient descent optimizer tf.train.GradientDescentOptimizer and feed a randomly selected minibatch of data to the model each time you run the session.
The __init__() parameters of the tf.train.GradientDescentOptimizer class are as follows:

__init__ (
    learning_rate,
    use_locking=False,
    name='GradientDescent'
)

learning_rate specifies the learning rate; use_locking specifies whether to lock the update operation, which mainly guards against concurrent read/write issues.
As both the algorithm and the __init__() signature show, the learning rate in SGD is fixed. A fixed learning rate does not always serve the model well; late in training it can cause the learning curve to oscillate sharply. It is therefore common to gradually reduce the learning rate over time. TensorFlow provides a variety of decay functions for this purpose; take tf.train.exponential_decay() as an example:
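
For example, a minimal sketch of SGD with this optimizer (TensorFlow 1.x style; the toy data, placeholders, and linear model are illustrative assumptions):

import numpy as np
import tensorflow as tf

# Toy regression data (illustrative only).
X_train = np.random.randn(1000, 5).astype(np.float32)
y_train = X_train.sum(axis=1, keepdims=True)

x = tf.placeholder(tf.float32, [None, 5])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([5, 1]))
b = tf.Variable(tf.zeros([1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))

# Plain gradient descent; the "stochastic" part comes from feeding random minibatches.
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        idx = np.random.choice(len(X_train), size=32, replace=False)  # random minibatch
        sess.run(train_op, feed_dict={x: X_train[idx], y: y_train[idx]})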

exponential_decay (
    learning_rate,
    global_step,
    decay_steps,
    decay_rate,
    staircase=False,
    name=None
)

The decayed_learning_rate returned by this function is computed from the input learning_rate as:

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

Here global_step is the current training step and decay_steps is the decay interval. If staircase is True, global_step / decay_steps is an integer division, so the returned learning rate decays in a staircase pattern; if staircase is False, the division is continuous and the learning rate decays smoothly. The returned decayed_learning_rate can be passed directly to an optimizer.
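
A minimal sketch of wiring the decayed rate into an optimizer (TensorFlow 1.x style; it assumes import tensorflow as tf and a loss tensor defined elsewhere, and the specific constants are only illustrative):

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    learning_rate=0.1,        # initial learning rate
    global_step=global_step,
    decay_steps=1000,         # decay interval
    decay_rate=0.96,
    staircase=True)
# Passing global_step to minimize() increments it on every update,
# so the decay schedule advances automatically.
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)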

Adagrad algorithm

AdaGrad is an adaptive learning rate algorithm that adapts the learning rate of every model parameter independently. For each parameter it accumulates the sum of its historical squared gradients and uses the inverse square root of this accumulator to scale the parameter update, so parameters with large past gradients receive smaller effective learning rates. The specific update rule is given below.
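
Following the notation of the "Deep Learning" book (r is the squared-gradient accumulator, ε the global learning rate, δ a small constant such as 1e-7 for numerical stability, ĝ the minibatch gradient estimate, and ⊙ element-wise multiplication):

    r ← r + ĝ ⊙ ĝ                      (accumulate squared gradients)
    Δθ ← −(ε / (δ + √r)) ⊙ ĝ           (element-wise scaled update)
    θ ← θ + Δθ                         (apply update)

TensorFlow provides this algorithm as tf.train.AdagradOptimizer; its __init__() parameters (TensorFlow 1.x) are:

__init__ (
    learning_rate,
    initial_accumulator_value=0.1,
    use_locking=False,
    name='Adagrad'
)

where initial_accumulator_value is the starting value of the per-parameter accumulator.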
