A short talk on derivatives, gradients, and extrema
[Please credit the source when reprinting] http://www.cnblogs.com/jerrylead
I remember that in high school math we were often asked to find the tangent line to a curve. Faced with a function such as $y = x^2$, we would differentiate without a second thought to get $y' = 2x$, which is the slope of the tangent line, and from that obtain the tangent line at a given point.
At university we learned how to find tangents and normal vectors of surfaces: taking partial derivatives gives the normal vector, and plugging it into a formula gives the tangent plane.
A classic example is as follows:
(From a "Geometric Applications" PPT found on the web)
where the vector n consists of the partial derivatives of $F(x, y, z)$.
However, the two methods seem unrelated. Differentiating $y = x^2$ gives a tangent, whereas taking partial derivatives in the surface case gives a normal vector. Both are derivatives, so why such a big difference? And why is the equation of the tangent plane tied to the normal vector?
Of course, these questions can be answered by rigorous mathematical derivation. Here we want to make sense of them from a more intuitive point of view.
First, the normal vector (gradient) is the result of taking the partial derivative of $F(X)$ (where $X = \{x_0, x_1, x_2, \ldots, x_n\}$ is an n-dimensional vector) with respect to each component. It represents the rate of change of $F(X)$ in each direction, and the whole normal vector is the superposition of those rates of change across all directions. For the one-dimensional $F(x) = x^2$, the derivative with respect to $x$ is $2x$, meaning that $F$ changes at rate $2x$ in the $x$ direction: at $x = 2$ the rate of change is 4, larger than the rate of 2 at $x = 1$. The normal vector can only point along the $x$ direction, because $F(x)$ is one-dimensional. $F(X)$ here is called the implicit function. For example, the familiar $y = f(x)$ can be written in implicit form as $F(x, y) = f(x) - y$, so $F(x, y)$ is in fact two-dimensional. As for why the derivative is a rate of change, that follows from the definition of the derivative (how much change in $y$ a tiny change $dx$ causes).
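To make "the derivative is a rate of change" concrete, here is a minimal Python sketch. It estimates how much $f(x) = x^2$ changes per unit of a tiny step in $x$. The helper name `rate_of_change` and the step size are illustrative choices, not something from the original post.

```python
def f(x):
    """The one-dimensional example from the text: f(x) = x^2."""
    return x * x


def rate_of_change(func, x, dx=1e-6):
    """Central-difference estimate of dy/dx for a tiny step dx (assumed helper)."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)


slope_at_1 = rate_of_change(f, 1.0)  # close to 2 * 1 = 2
slope_at_2 = rate_of_change(f, 2.0)  # close to 2 * 2 = 4
print(slope_at_1, slope_at_2)
```

The estimate at $x = 2$ comes out roughly twice the one at $x = 1$, matching the rates 4 and 2 mentioned above.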
So the normal vector of the implicit function $F(X)$ is the vector of partial derivatives of $F(X)$ with respect to each component. Then why does differentiating $y = x^2$ give a tangent rather than a normal vector? The key is not to confuse the implicit function $F(X)$ with $y = f(x)$. The implicit function is a function: its value varies with the value of $X$. By contrast, $y = f(x)$ is only a constraint between $x$ and $y$; in an x-y coordinate system, that constraint can be drawn as a graph (a line, a curve, and so on). For example, $y = x^2$ represents a parabola and can be plotted in the x-y plane. In implicit form this becomes $F(x, y) = x^2 - y$. Only when $F(x, y)$ equals a given value (for example, 0) is it a parabola; otherwise it is just a function. If we write $z$ for $F(x, y)$, then $F(x, y)$ is actually a surface, and the dimension has gone up by 1. The partial derivatives of $F(x, y)$ therefore describe the rate of change of its value $z$.
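The difference between the function $F(x, y) = x^2 - y$ and the curve $y = x^2$ can be seen in a few lines of Python. This is only an illustrative sketch; the sample points are arbitrary choices.

```python
def F(x, y):
    """F(x, y) = x^2 - y, the function form of the relation y = x^2."""
    return x * x - y


# Points that satisfy y = x^2, and points that do not.
on_parabola = [(x, x * x) for x in (-2.0, 0.0, 1.5)]
off_parabola = [(1.0, 3.0), (2.0, 0.0)]

values_on = [F(x, y) for x, y in on_parabola]    # every value is 0
values_off = [F(x, y) for x, y in off_parabola]  # nonzero values
print(values_on, values_off)
```

$F$ has a value everywhere in the plane; the parabola is just the set of points where that value happens to be 0.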
The total differential
$dF = \frac{\partial F}{\partial x}\,dx + \frac{\partial F}{\partial y}\,dy$
shows how much the value of $F(x, y)$ changes within a small neighborhood of $(x, y)$. That change is a linear combination of the tiny change $dx$ in the $x$ direction and the tiny change $dy$ in the $y$ direction, and the coefficients are the partial derivatives. Replacing $dx$ and $dy$ with the unit vectors $i$ and $j$ gives the normal vector. The gradient thus reflects both the rate and the direction of change of $F(X)$ at a given point.
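As a quick numerical check, the sketch below compares the actual change in $F(x, y) = x^2 - y$ with the value predicted by its total differential, $dF = 2x\,dx - dy$. The starting point and step sizes are arbitrary assumptions chosen for illustration.

```python
def F(x, y):
    """The parabola example in function form: F(x, y) = x^2 - y."""
    return x * x - y


def dF(x, y, dx, dy):
    """Total differential of F: (dF/dx) * dx + (dF/dy) * dy = 2x * dx - dy."""
    return 2 * x * dx + (-1) * dy


x, y = 1.0, 0.5
dx, dy = 1e-3, -2e-3

actual = F(x + dx, y + dy) - F(x, y)  # true change in the value of F
predicted = dF(x, y, dx, dy)          # linear estimate from the partial derivatives
print(actual, predicted)
```

The two numbers differ only by a term of order $dx^2$, which is why the linear combination is a good description for tiny steps.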
That was a bit of a detour. In short, for an implicit function $F(X)$, we want to know the direction and magnitude of the change in $F(X)$ near a given $X$. How do we describe that? The rate and direction of change differ along each direction $(x_0, x_1, x_2, \ldots, x_n)$ of $X$ (for example, quadratic in $x_0$ but linear in $x_1$, depending on the specific expression), and we want to know how they combine. Using the total differential formula (such as the one above), we see that the combination coefficients are the partial derivatives, the combined result is the rate of change, and its direction is the linear combination of the directions $i, j, k, \ldots$ corresponding to $x_0, x_1, x_2, \ldots$.
Back to the question of why "differentiating $y = x^2$ gives the tangent": this is actually a final conclusion, one that is derived. The first step is to write the implicit function $F(x, y) = x^2 - y$ (here $x$ and $y$ are real numbers, whereas $X$ above is a vector).
Then the partial derivative of $F$ with respect to $x$ is $2x$.
The partial derivative of $F$ with respect to $y$ is $-1$.
That is, the gradient is $(2x, -1)$.
Since the tangent and the normal vector are perpendicular, their inner product is 0.
Let the tangent direction vector be $(m, n)$; then $2x \cdot m + (-1) \cdot n = 0$, that is, $n/m = 2x$.
So the slope of the tangent is $2x$.
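The perpendicularity argument can be checked directly. The sketch below takes the gradient of $F(x, y) = x^2 - y$, namely $(2x, -1)$, and a tangent direction with slope $2x$, and confirms their inner product is 0. The function names are illustrative only.

```python
def gradient(x):
    """Gradient (normal vector) of F(x, y) = x^2 - y; it does not depend on y."""
    return (2 * x, -1.0)


def tangent_direction(x):
    """Direction (m, n) with m = 1 and n = 2x, so the slope n / m is 2x."""
    return (1.0, 2 * x)


def dot(u, v):
    """Inner product of two 2-D vectors."""
    return u[0] * v[0] + u[1] * v[1]


products = [dot(gradient(x), tangent_direction(x)) for x in (-1.5, 0.0, 2.0)]
print(products)  # every entry is 0
```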
Back to the tangent plane of the surface in the blue figure above. Once the normal vector at a point is found, the tangent plane at that point must satisfy two conditions: first, it must pass through the point of tangency; second, it must reflect the direction of change at that point (not the direction of change of the value $F(X)$ there, but the direction in which the point itself moves). However, the movement of the point ultimately shows up as a change in the value of $F(X)$ at that point; that is, the change of the tangent plane reflects the normal vector, and the partial derivatives reflect the change in the value of $F(X)$. So the partial derivatives of the tangent plane are the same as those of $F(X)$. As the blue figure shows, the tangent plane is built from the partial derivatives of $F(X)$.
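As a worked illustration, here is the standard tangent plane formula applied to a sphere. The sphere and the point $(1, 2, 3)$ are assumed examples chosen for this sketch, not the surface from the figure.

```latex
% Surface F(x, y, z) = 0 with normal vector n = (F_x, F_y, F_z) at (x_0, y_0, z_0):
F_x\,(x - x_0) + F_y\,(y - y_0) + F_z\,(z - z_0) = 0

% Assumed example: F(x, y, z) = x^2 + y^2 + z^2 - 14 at the point (1, 2, 3)
n = (2x, 2y, 2z)\big|_{(1,2,3)} = (2, 4, 6)

2(x - 1) + 4(y - 2) + 6(z - 3) = 0
\quad\Longleftrightarrow\quad
x + 2y + 3z = 14
```

The plane passes through $(1, 2, 3)$, since $1 + 4 + 9 = 14$, and its coefficients are exactly the partial derivatives of $F$ at that point.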
With the total differential formula above, we can better understand extrema, and why it is often said that the derivative is 0 where a function attains an extremum. Take the one-dimensional case $y = x^2$ and look for its minimum. Differentiating both sides gives $dy = 2x\,dx$. At $x = 0$ the derivative $2x$ is 0 and the extremum is attained. Otherwise, if $x$ is positive, moving left ($dx < 0$) makes $y$ smaller, and if $x$ is negative, moving right ($dx > 0$) makes $y$ smaller. So the adjustment ends at $x = 0$. For the two-dimensional case,
$dF = \frac{\partial F}{\partial x}\,dx + \frac{\partial F}{\partial y}\,dy$
the computed value may be positive or negative, but note that $dx$ can be negative and so can $dy$. As long as one of the partial derivatives is nonzero, choosing the signs of $dx$ and $dy$ (that is, deciding how to move $x$ and $y$) can make the value larger or smaller. Only when all partial derivatives are 0 is $dF$ equal to 0 no matter how $dx$ and $dy$ are chosen, and that is where an extremum can be attained.
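The sign argument can be tried out numerically. The sketch below uses $F(x, y) = x^2 + y^2$, which is an assumed example and not one from the original post. Away from the origin, a suitable choice of $(dx, dy)$ makes $dF$ negative or positive; at the origin, where both partial derivatives vanish, $dF$ is 0 for every step.

```python
def grad(x, y):
    """Partial derivatives of the assumed example F(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)


def dF(point, step):
    """Total differential: (dF/dx) * dx + (dF/dy) * dy at the given point."""
    gx, gy = grad(*point)
    dx, dy = step
    return gx * dx + gy * dy


eps = 1e-3
p = (1.0, -2.0)
gx, gy = grad(*p)

downhill = dF(p, (-eps * gx, -eps * gy))  # step against the gradient: dF < 0
uphill = dF(p, (eps * gx, eps * gy))      # step along the gradient: dF > 0

# At the origin both partial derivatives are 0, so dF is 0 for any step.
steps = [(eps, 0.0), (0.0, -eps), (-eps, eps)]
at_origin = [dF((0.0, 0.0), s) for s in steps]
print(downhill, uphill, at_origin)
```

Stepping against the gradient is only one convenient way to pick the signs of $dx$ and $dy$; any step whose inner product with the gradient is negative would lower the value as well.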
The above is only an informal understanding meant to build intuition, so it will contain some gaps.