Learning Rate
In the gradient descent algorithm, the number of iterations required to converge varies from model to model. Since it cannot be predicted in advance, we can plot the cost function against the number of iterations to observe when the algorithm tends to converge.
Of course, there are ways to detect convergence automatically: for example, we can compare the decrease in the cost function between iterations against a predetermined threshold, such as 0.001, and declare convergence once the decrease falls below it. In general, though, it is more intuitive to inspect the plot described above.
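The automatic convergence test can be sketched in Python as follows (a minimal illustration for linear regression; the function and variable names are our own, not from the course code): stop once the drop in the cost J between consecutive iterations falls below a threshold such as 0.001.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, max_iters=1000, tol=1e-3):
    """Batch gradient descent for linear regression.

    Stops when the cost decreases by less than `tol` between
    iterations (the automatic convergence test described above).
    Returns the parameters and the per-iteration cost history,
    which can be plotted against the iteration count.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = []
    prev_cost = np.inf
    for _ in range(max_iters):
        error = X @ theta - y             # predictions minus targets
        cost = (error @ error) / (2 * m)  # squared-error cost J(theta)
        history.append(cost)
        if prev_cost - cost < tol:        # change in J fell below threshold
            break
        prev_cost = cost
        theta -= (alpha / m) * (X.T @ error)  # gradient step
    return theta, history
```

Plotting `history` against the iteration index gives exactly the chart described above, so the threshold test and the visual check can be used side by side.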
Each iteration of gradient descent is affected by the learning rate. If the learning rate is too small, the number of iterations required to converge becomes very large; if it is too large, an iteration may fail to reduce the cost function and may overshoot the local minimum, so the algorithm fails to converge.
As a result, when choosing a learning rate it is common to try candidate values spaced by roughly a factor of three: ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
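This search over candidate learning rates can be sketched like so (an illustration on made-up linear data; the `run` helper is our own, not course code): run a fixed number of iterations for each candidate and compare the final costs.

```python
import numpy as np

def run(X, y, alpha, iters=100):
    """Run a fixed number of gradient-descent iterations for linear
    regression and return the final cost, so that different learning
    rates can be compared on equal footing."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))  # gradient step
    error = X @ theta - y
    return (error @ error) / (2 * m)  # final cost J(theta)

# Made-up data following y = 1 + 3x, with an intercept column in X.
X = np.c_[np.ones(5), np.arange(5.0)]
y = 1.0 + 3.0 * np.arange(5.0)

# Candidate learning rates spaced by roughly a factor of three.
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:
    print(f"alpha = {alpha:<6} final cost = {run(X, y, alpha):.6f}")
```

On data like this, too small a value barely moves the cost after 100 iterations, while beyond some critical value the cost starts to grow instead of shrink; the best candidate sits in between.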
Coursera Machine Learning Study notes (10)