In SLAM there are many applications of gradient descent from machine learning, but concepts such as "the negative gradient direction is the direction in which the function value decreases fastest" are often stated without explanation. My understanding is written up below.
Glossary:
Gradient:
A gradient is a vector with the property that the directional derivative of the function at a point is maximized along the gradient's direction; in other words, the function changes fastest along the gradient direction at that point.
Let $f(x, y)$ have continuous first-order partial derivatives in a plane region $D$, let $P_0(x_0, y_0)$ be a point of $D$, and let $\boldsymbol{l} = (\cos\alpha, \cos\beta)$ be a unit vector. If the following limit exists,

$$\lim_{t \to 0^{+}} \frac{f(x_0 + t\cos\alpha,\ y_0 + t\cos\beta) - f(x_0, y_0)}{t},$$

this derivative is written $\left.\dfrac{\partial f}{\partial \boldsymbol{l}}\right|_{(x_0, y_0)}$.

The limit value is the derivative of $f$ along the direction $\boldsymbol{l}$, so we can compute the directional derivative along any direction.
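As a quick sanity check of this limit definition, here is a minimal sketch (using a hypothetical example function $f(x, y) = x^2 + 3y^2$, chosen only for illustration and not taken from the original text) that approximates the directional derivative by a finite difference:

```python
import math

def f(x, y):
    # hypothetical example function, chosen only for illustration
    return x**2 + 3 * y**2

def directional_derivative(f, x0, y0, alpha, t=1e-6):
    # finite-difference approximation of the limit that defines the
    # directional derivative along the unit vector (cos(alpha), sin(alpha))
    ca, cb = math.cos(alpha), math.sin(alpha)
    return (f(x0 + t * ca, y0 + t * cb) - f(x0, y0)) / t

# directional derivative of f at (1, 1) along the 45-degree direction;
# the analytic value is (f_x + f_y) / sqrt(2) = (2 + 6) / sqrt(2) ≈ 5.657
print(directional_derivative(f, 1.0, 1.0, math.pi / 4))
```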
A simplified calculation is as follows. When $f$ is differentiable at $(x_0, y_0)$,

$$\left.\frac{\partial f}{\partial \boldsymbol{l}}\right|_{(x_0, y_0)} = f_x(x_0, y_0)\cos\alpha + f_y(x_0, y_0)\cos\beta.$$

Set

$$\boldsymbol{A} = \big(f_x(x_0, y_0),\ f_y(x_0, y_0)\big), \qquad \boldsymbol{I} = (\cos\alpha,\ \cos\beta).$$

Then we can get:

$$\left.\frac{\partial f}{\partial \boldsymbol{l}}\right|_{(x_0, y_0)} = \boldsymbol{A} \cdot \boldsymbol{I} = |\boldsymbol{A}|\,|\boldsymbol{I}|\cos\theta = |\boldsymbol{A}|\cos\theta$$

(where $\theta$ is the angle between the vector $\boldsymbol{A}$ and the vector $\boldsymbol{I}$).

The directional derivative therefore attains its maximum when $\theta$ is 0 degrees, that is, when $\boldsymbol{I}$ (the direction we keep varying while looking for the fastest change of the function) is parallel to $\boldsymbol{A}$ (which is fixed once the point is fixed). The maximum value of the directional derivative is $|\boldsymbol{A}|$, so the function value increases fastest along this direction and decreases fastest in the reverse direction.
The direction in which the function value increases fastest is the same as the direction of the vector $\boldsymbol{A}$, so we now name the vector $\boldsymbol{A}$ the gradient (once the point is fixed, the gradient direction is fixed). That is why the gradient direction is the direction of the largest rate of change of the function!
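To make this concrete, the following sketch (same hypothetical $f(x, y) = x^2 + 3y^2$ as above) scans many unit directions and confirms that the one with the largest directional derivative points along the analytic gradient $\boldsymbol{A} = (f_x, f_y)$:

```python
import math

def grad_f(x, y):
    # analytic gradient A = (f_x, f_y) = (2x, 6y) of the example function
    return (2 * x, 6 * y)

x0, y0 = 1.0, 1.0
gx, gy = grad_f(x0, y0)

# the directional derivative along the unit vector (cos(a), sin(a)) is A . I;
# scan 3600 directions and keep the one that maximizes it
best_angle = max(
    (k * 2 * math.pi / 3600 for k in range(3600)),
    key=lambda a: gx * math.cos(a) + gy * math.sin(a),
)

print(math.degrees(best_angle))          # ≈ 71.6 degrees
print(math.degrees(math.atan2(gy, gx)))  # angle of the gradient: atan2(6, 2) ≈ 71.57 degrees
```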
From the gradient, we can then find the direction of fastest decrease.

The negative gradient direction is the direction in which the function value decreases fastest locally.
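Putting it together, a minimal gradient-descent sketch (again with the hypothetical example function, and with an arbitrary starting point and step size chosen only for the demo) shows that repeatedly stepping in the negative gradient direction drives the function value down:

```python
def f(x, y):
    # same hypothetical example function as above
    return x**2 + 3 * y**2

def grad_f(x, y):
    return (2 * x, 6 * y)

x, y = 5.0, 5.0   # arbitrary starting point (an assumption for the demo)
lr = 0.1          # step size (learning rate), also an assumption

for _ in range(50):
    gx, gy = grad_f(x, y)
    # move in the negative gradient direction: the locally fastest descent
    x, y = x - lr * gx, y - lr * gy

print(x, y, f(x, y))  # approaches the minimizer (0, 0) with f ≈ 0
```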