Gradient descent is an effective iterative method for minimizing the cost function in a regression problem, and it remains practical even for training sets with large data volumes. In that setting it scales better than the non-iterative normal-equation method, whose cost grows rapidly with the number of features.
When using it for multi-variable regression, there are two issues to note; otherwise convergence may be slow, or the algorithm may fail to converge at all.
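As a reference for the two points below, the basic batch update can be sketched as follows (a minimal NumPy sketch; the function name, cost convention, and toy data are illustrative, not from the original):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for multi-variable linear regression.

    X: (m, n) feature matrix; y: (m,) targets; alpha: learning rate.
    Minimizes the usual squared-error cost (1/2m) * ||X@theta - y||^2.
    """
    m = X.shape[0]
    Xb = np.c_[np.ones(m), X]        # prepend a column of ones for the intercept
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        grad = Xb.T @ (Xb @ theta - y) / m   # gradient of the cost
        theta -= alpha * grad                 # step against the gradient
    return theta
```

For example, fitting the line y = 1 + 2x on a few points recovers parameters close to (1, 2).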
1. Feature Scaling
When there are many features, use the mean and range of each feature to normalize it to roughly the range [-0.5, 0.5].
That is, f_normed = (f - f_average) / (f_max - f_min)
In this way, the contours of the cost function become closer to circles, which accelerates convergence.
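The normalization formula above can be sketched as (a minimal NumPy sketch; the function name is an illustrative assumption):

```python
import numpy as np

def mean_normalize(X):
    """Scale each column of X to roughly [-0.5, 0.5] via (x - mean) / (max - min)."""
    mu = X.mean(axis=0)                      # per-feature mean
    rng = X.max(axis=0) - X.min(axis=0)      # per-feature range
    return (X - mu) / rng, mu, rng           # keep mu, rng to rescale new inputs
```

Note that the same mu and rng must be reused when normalizing any new example before prediction.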
2. Choosing the Learning Rate
The learning rate α must be selected. A useful diagnostic is to plot the cost-function value against the number of iterations for several candidate values of α: a value that is too small leads to slow convergence, while a value that is too large can make the cost increase and diverge.
A practical scheme for choosing α is to try values spaced roughly threefold apart, covering each decade and its midpoint. For example: ..., 0.01, 0.03, 0.1, 0.3, 1, 3, 10, ...
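The sweep above can be sketched by recording the cost at every iteration for each candidate rate (a minimal NumPy sketch; the helper names, the cost convention, and the toy data are illustrative assumptions):

```python
import numpy as np

def cost(Xb, y, theta):
    """Squared-error cost (1/2m) * ||Xb@theta - y||^2."""
    r = Xb @ theta - y
    return r @ r / (2 * len(y))

def run(X, y, alpha, n_iters=100):
    """Run batch gradient descent, returning the per-iteration cost history
    so that curves for different alphas can be compared (e.g. plotted)."""
    m = X.shape[0]
    Xb = np.c_[np.ones(m), X]        # prepend intercept column
    theta = np.zeros(Xb.shape[1])
    history = []
    for _ in range(n_iters):
        theta -= alpha * Xb.T @ (Xb @ theta - y) / m
        history.append(cost(Xb, y, theta))
    return history

# Toy data (purely illustrative): y = 1 + 2x on four points.
X = np.arange(4.0).reshape(-1, 1)
y = np.array([1.0, 3.0, 5.0, 7.0])
for a in (0.01, 0.03, 0.1, 0.3):     # candidate rates spaced roughly 3x apart
    hist = run(X, y, a)
    print(f"alpha={a}: final cost {hist[-1]:.6f}")
```

A well-chosen α shows a steadily decreasing curve; a too-large α shows the cost growing from one iteration to the next.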
Notes on using gradient descent to find the minimum of the cost function in multi-variable linear regression