Gradient Descent in Practice II: Learning Rate
The learning rate α in gradient descent is difficult to choose, so here are some practical tips. First, how to tell that gradient descent is working properly: it is generally useful to plot the cost function J(θ) against the number of iterations. If J(θ) keeps decreasing as the number of iterations increases, gradient descent is working well; once the curve is essentially flat, the algorithm has converged, and the parameters at that point can be used. An automatic convergence test is also possible: choose a small threshold ε (e.g., 10⁻³) and declare convergence when J(θ) decreases by less than ε in a single iteration.
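As a minimal sketch (not from the original notes), the following Python code runs batch gradient descent for linear regression, records J(θ) at every iteration so it can be plotted against the iteration count, and applies the ε-based convergence test described above. The function name and default values are illustrative assumptions.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, epsilon=1e-3, max_iters=1000):
    """Batch gradient descent for linear regression, tracking J(theta)."""
    m, n = X.shape
    theta = np.zeros(n)
    costs = []
    for _ in range(max_iters):
        error = X @ theta - y                         # predictions minus targets
        costs.append(float(error @ error) / (2 * m))  # J(theta) this iteration
        if len(costs) > 1 and costs[-2] - costs[-1] < epsilon:
            break                                # decrease below epsilon: converged
        theta -= alpha * (X.T @ error) / m       # simultaneous update of all theta_j
    return theta, costs

# Example: plot the learning curve to inspect convergence.
# import matplotlib.pyplot as plt
# theta, costs = gradient_descent(X, y, alpha=0.1)
# plt.plot(costs); plt.xlabel("iterations"); plt.ylabel("J(theta)"); plt.show()
```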
Next, how to judge and fix abnormal gradient descent: if J(θ) increases as the number of iterations grows, gradient descent is not working, and the usual fix is to reduce the learning rate α. When α is too large, each step is too big and overshoots the minimum, so J(θ) can rise instead of fall; it may also oscillate up and down repeatedly. In both cases the solution is to reduce the value of α.
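To make this diagnostic concrete, here is a small sketch reusing the gradient_descent function above: if the recorded costs ever rise between consecutive iterations, α is shrunk and the run is repeated. The shrink factor of 3 and the function name are illustrative assumptions, not prescriptions from the notes.

```python
def shrink_alpha_until_decreasing(X, y, alpha=1.0, shrink=3.0, max_tries=10):
    """If J(theta) ever rises between iterations, alpha is too large: shrink it."""
    for _ in range(max_tries):
        _, costs = gradient_descent(X, y, alpha=alpha, max_iters=50)
        if all(c2 <= c1 for c1, c2 in zip(costs, costs[1:])):
            return alpha              # J(theta) decreased on every iteration
        alpha /= shrink               # cost rose or oscillated: reduce alpha
    return alpha
```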
Choosing the value of α: from the above we know that if α is too large, J(θ) may fail to decrease and can even rise, so α must be reduced; but if α is too small, convergence becomes very slow. It is therefore important to choose α well. A practical selection method is to try a range of values roughly three times apart, such as ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ..., plot J(θ) against iterations for each, and pick the largest α that still makes J(θ) decrease rapidly on every iteration.
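The sketch below automates a rough version of this search (the candidate list and function name are assumptions for illustration): run a short burst of gradient descent with each candidate α and keep the one whose cost decreases on every iteration while ending lowest. In practice one would also plot the curves and judge by eye.

```python
# Candidate learning rates roughly 3x apart, as suggested above.
candidate_alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

def pick_alpha(X, y, alphas=candidate_alphas, iters=100):
    """Keep the alpha whose run decreases monotonically and ends lowest."""
    best_alpha, best_final_cost = None, float("inf")
    for a in alphas:
        _, costs = gradient_descent(X, y, alpha=a, max_iters=iters)
        decreasing = all(c2 <= c1 for c1, c2 in zip(costs, costs[1:]))
        if decreasing and costs[-1] < best_final_cost:
            best_alpha, best_final_cost = a, costs[-1]
    return best_alpha
```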