Gradient descent
Gradient descent is an algorithm for finding the minimum of a function; here we will use it to minimize the cost function.
The idea of gradient descent is to start with a randomly chosen combination of parameters, compute the cost function, and then look for the next combination of parameters that lowers the cost the most.
We repeat this process until we reach a local minimum. Because we have not tried every possible combination of parameters, we cannot be sure that the local minimum we find is the global minimum (global minimum); starting from a different combination of parameters may lead us to a different local minimum.
The gradient descent algorithm repeats the following update until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

where the update is applied simultaneously for $j = 0$ and $j = 1$.
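To make the update rule concrete, here is a minimal sketch in Python on a hypothetical one-parameter cost J(θ) = (θ − 3)², whose derivative is 2(θ − 3); the cost, learning rate, and stopping threshold are all illustrative assumptions, not part of the course material.

```python
# A minimal sketch of the gradient descent update rule, assuming a
# hypothetical one-parameter cost J(theta) = (theta - 3)**2 with
# derivative dJ/dtheta = 2 * (theta - 3).

def dJ(theta):
    """Derivative of the example cost J(theta) = (theta - 3)**2."""
    return 2 * (theta - 3)

alpha = 0.1   # learning rate (illustrative value)
theta = 0.0   # arbitrary starting parameter

# "Repeat until convergence": stop once the update becomes negligible.
for _ in range(1000):
    step = alpha * dJ(theta)
    if abs(step) < 1e-8:
        break
    theta -= step   # theta := theta - alpha * dJ/dtheta

print(theta)  # converges to 3, the minimizer of J
```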
It is worth noting that α (alpha) is the learning rate (learning rate), which determines how large a step we take downhill in the direction of steepest descent of the cost function.
Note: even with a fixed learning rate, gradient descent can converge to a local minimum. As we approach a local minimum, the derivative term becomes smaller, so gradient descent automatically takes smaller steps; there is no need to decrease the learning rate over time.
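A small numeric illustration of this point, reusing the same hypothetical cost J(θ) = (θ − 3)²: the learning rate stays fixed, yet the printed step sizes shrink on their own because the derivative shrinks near the minimum.

```python
# A numeric illustration: with a fixed learning rate, the step
# alpha * dJ/dtheta shrinks automatically as theta approaches the
# minimum of the hypothetical cost J(theta) = (theta - 3)**2.

alpha, theta = 0.3, 0.0
for i in range(5):
    step = alpha * 2 * (theta - 3)   # alpha times dJ/dtheta
    theta -= step
    print(f"iteration {i}: |step| = {abs(step):.4f}, theta = {theta:.4f}")

# Printed step sizes decrease every iteration (1.8000, 0.7200,
# 0.2880, 0.1152, 0.0461) even though alpha is never changed.
```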
In batch gradient descent, every step uses all of the training examples, and we update all parameters simultaneously by subtracting the learning rate times the partial derivative of the cost function.
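Below is a sketch of one full batch gradient descent loop, assuming single-variable linear regression h(x) = θ₀ + θ₁x with the squared-error cost and a made-up dataset; note the temporary variables that enforce the simultaneous update.

```python
# A sketch of batch gradient descent for single-variable linear
# regression h(x) = theta0 + theta1 * x with squared-error cost
# J = (1 / (2 * m)) * sum((h(x_i) - y_i) ** 2). The dataset, learning
# rate, and iteration count are made-up values for illustration.

xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical inputs
ys = [2.0, 4.0, 6.0, 8.0]   # hypothetical targets (y = 2x exactly)
m = len(xs)

theta0, theta1, alpha = 0.0, 0.0, 0.05

for _ in range(2000):
    # "Batch": every step sums over all m training examples.
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m                                # dJ/dtheta0
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m     # dJ/dtheta1
    # Simultaneous update: compute both new values before assigning,
    # so neither gradient is evaluated with a half-updated parameter.
    temp0 = theta0 - alpha * grad0
    temp1 = theta1 - alpha * grad1
    theta0, theta1 = temp0, temp1

print(theta0, theta1)   # approaches (0, 2) for this dataset
```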