To find the parameters of logistic regression with Newton's iterative method, we first need to study the extremum problem for multivariate functions.
Recall how we find the extrema of a function of one variable. First take the derivative: at an extremum the derivative must be zero,

$$f'(x) = 0,$$

but a point where the derivative equals zero does not necessarily give an extremum (for example, $f(x) = x^3$ at $x = 0$), so a further test is needed. Take the second derivative: if at the stationary point

$$f''(x) > 0,$$

then the function attains a local minimum there; if $f''(x) < 0$, a local maximum. The role of the second derivative here is to judge the local convexity or concavity of the function.
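A quick numerical sketch of this second-derivative test; the functions $x^2$ and $x^3$ are my own illustrative choices:

```python
# Second-derivative test for a single-variable function, checked numerically.
# f(x) = x**2 has a stationary point at x = 0 with f''(0) > 0 -> local minimum.
# g(x) = x**3 also has g'(0) = 0, but g''(0) = 0: the test is inconclusive,
# and in fact x = 0 is not an extremum of x**3.

def derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

f = lambda x: x ** 2
g = lambda x: x ** 3

print(derivative(f, 0.0))         # ~0: stationary point
print(second_derivative(f, 0.0))  # ~2 > 0: local minimum
print(derivative(g, 0.0))         # ~0: stationary point
print(second_derivative(g, 0.0))  # ~0: test says nothing
```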
The method for finding extrema of multivariate functions is similar; the only difference is that convexity is judged with a matrix, called the Hessian matrix.
If a real-valued multivariate function $f(x_1, \dots, x_n)$ is twice differentiable on its domain, then to find its extrema we first set all the partial derivatives to zero, i.e. we get a system of equations of the form

$$\frac{\partial f}{\partial x_i} = 0, \quad i = 1, \dots, n.$$

Solving this system yields the stationary points, each an $n$-dimensional vector $(x_1, \dots, x_n)$. But having found a stationary point, we are not done: there are 3 types of stationary points — local maxima, local minima, and non-extremum (saddle) points. So the next task is to judge which of these 3 a given stationary point is. This is where the Hessian matrix comes in: it is used to judge the local convexity of the multivariate function.
The Hessian matrix is the square matrix of second-order partial derivatives of a multivariate function. It describes the local curvature of the function and is commonly used in Newton's iterative method for solving optimization problems. For the function $f$ above, if all of its second partial derivatives exist, the Hessian matrix is

$$H(f) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \, \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \, \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}.$$

If the second partial derivatives of $f$ are continuous on the domain, then the Hessian matrix is symmetric there, because for continuous second derivatives the order of differentiation does not matter (Schwarz's theorem), i.e.

$$\frac{\partial^2 f}{\partial x_i \, \partial x_j} = \frac{\partial^2 f}{\partial x_j \, \partial x_i}.$$
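The symmetry of mixed partials is easy to verify numerically; the function $f(x, y) = x^2 y + y^3$ below is an arbitrary example of my choosing:

```python
# Numerical check that the order of differentiation does not matter
# (Schwarz's theorem) for the example function f(x, y) = x**2 * y + y**3.

def f(x, y):
    return x ** 2 * y + y ** 3

def d_dx(g, x, y, h=1e-4):
    """Central difference in x."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, x, y, h=1e-4):
    """Central difference in y."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

# d/dy (df/dx) and d/dx (df/dy) at the point (1, 2): both approximate 2x = 2.
f_xy = d_dy(lambda x, y: d_dx(f, x, y), 1.0, 2.0)
f_yx = d_dx(lambda x, y: d_dy(f, x, y), 1.0, 2.0)
print(f_xy, f_yx)  # both close to 2
```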
With the Hessian matrix $H$ evaluated at a stationary point, we can distinguish the 3 cases above. The conclusion is as follows:
(1) If $H$ is a positive definite matrix, the stationary point is a local minimum.
(2) If $H$ is a negative definite matrix, the stationary point is a local maximum.
(3) If $H$ is an indefinite matrix, the stationary point is not an extremum.
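Since the Hessian also drives the Newton iteration mentioned above, here is a minimal sketch of a Newton minimizer, assuming the gradient and Hessian are available analytically; the quadratic test function is my own choice:

```python
import numpy as np

# Minimal sketch of Newton's iterative method for minimizing a multivariate
# function, assuming its gradient and Hessian are known analytically.
# Example function (my own choice): f(x, y) = (x - 1)**2 + 2*(y + 3)**2,
# whose minimum is at (1, -3).

def grad(p):
    x, y = p
    return np.array([2.0 * (x - 1.0), 4.0 * (y + 3.0)])

def hessian(p):
    # Constant Hessian for this quadratic example; positive definite.
    return np.array([[2.0, 0.0], [0.0, 4.0]])

def newton(p0, n_iter=10):
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        # Newton step: solve H * step = -grad rather than inverting H.
        p = p + np.linalg.solve(hessian(p), -grad(p))
    return p

print(newton([10.0, 10.0]))  # converges to (1, -3)
```

For a quadratic with positive definite Hessian a single Newton step already lands on the minimizer; for general functions the iteration is repeated until the gradient is small.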
Next we need to learn how to determine whether a matrix is positive definite, negative definite, or indefinite. One of the most commonly used methods is the leading principal minor test (Sylvester's criterion): a real symmetric matrix is positive definite if and only if every leading principal minor is greater than 0. Because this method involves computing determinants, it is rather laborious. There is also a method based on eigenvalues, stated for real quadratic forms: a real quadratic form (equivalently, its real symmetric matrix) is positive definite if and only if all eigenvalues of the matrix are greater than 0; it is negative definite if and only if all eigenvalues are less than 0; and if the eigenvalues have mixed signs, the matrix is indefinite.
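Both criteria are easy to try with NumPy; the matrix $H$ below is the Hessian of the example function $f = x^2 - xy + y^2$, my own illustrative choice:

```python
import numpy as np

# Two equivalent checks of positive definiteness for a real symmetric matrix:
# (1) all leading principal minors > 0 (Sylvester's criterion),
# (2) all eigenvalues > 0.

def leading_minors(A):
    """Determinants of the k-by-k upper-left submatrices, k = 1..n."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

def classify(A):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    w = np.linalg.eigvalsh(A)
    if np.all(w > 0):
        return "positive definite"   # stationary point is a local minimum
    if np.all(w < 0):
        return "negative definite"   # stationary point is a local maximum
    if np.any(w > 0) and np.any(w < 0):
        return "indefinite"          # stationary point is not an extremum
    return "semidefinite"            # test is inconclusive

H = np.array([[2.0, -1.0], [-1.0, 2.0]])  # Hessian of f = x^2 - x*y + y^2
print(leading_minors(H))  # approximately [2.0, 3.0] -> all positive
print(classify(H))        # positive definite
```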
Lagrange Multiplier method
The Lagrange multiplier method is used to find conditional extrema. There are two kinds of extremum problems. In the first, we seek the extremum of a function over a given region, with no other requirement on the independent variables; this is called an unconditional extremum. In the second, the independent variables must satisfy some additional constraints; this is called a conditional extremum. For example, given an ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1,$$

find the maximum volume of a rectangular box inscribed in the ellipsoid. This is a conditional extremum problem: subject to the condition above, with the box corner $(x, y, z)$ in the first octant, the volume $V = 8xyz$ is the maximum value to be found.
Of course, this problem can sometimes be handled by using the condition to eliminate a variable and then treating it as an unconditional extremum problem. But the elimination is sometimes very difficult, or even impossible; that is when the Lagrange multiplier method is needed. It is described below.

The conditional extremum of $f(x, y, z)$ subject to the constraint $\varphi(x, y, z) = 0$ can be transformed into the unconditional extremum problem of the function

$$L(x, y, z, \lambda) = f(x, y, z) + \lambda \, \varphi(x, y, z).$$

If $(x, y, z, \lambda)$ is a stationary point of $L$, then $(x, y, z)$ is a suspect (candidate) point for the conditional extremum.
Back to the problem above: by the Lagrange multiplier method, it is transformed into finding the stationary points of

$$L(x, y, z, \lambda) = 8xyz + \lambda \left( \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} - 1 \right).$$

Taking the partial derivatives and setting them to zero gives

$$\frac{\partial L}{\partial x} = 8yz + \frac{2\lambda x}{a^2} = 0, \quad \frac{\partial L}{\partial y} = 8xz + \frac{2\lambda y}{b^2} = 0, \quad \frac{\partial L}{\partial z} = 8xy + \frac{2\lambda z}{c^2} = 0, \quad \frac{\partial L}{\partial \lambda} = \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} - 1 = 0.$$

Multiplying the first three equations by $x$, $y$, $z$ respectively shows that $\frac{x^2}{a^2} = \frac{y^2}{b^2} = \frac{z^2}{c^2}$; substituting into the fourth equation gives $x = \frac{a}{\sqrt{3}}$, $y = \frac{b}{\sqrt{3}}$, $z = \frac{c}{\sqrt{3}}$, and the maximum volume is

$$V = 8xyz = \frac{8abc}{3\sqrt{3}}.$$
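A quick numerical sanity check of this result; the semi-axes $a = 3$, $b = 2$, $c = 1$ are arbitrary example values:

```python
import math

# Sanity check of the inscribed-box result. Writing the first-octant corner as
# x = a*u, y = b*v, z = c*w with u^2 + v^2 + w^2 = 1 reduces the problem to
# maximizing 8*a*b*c*u*v*w on the unit sphere, which we scan with a grid.

a, b, c = 3.0, 2.0, 1.0
best = 0.0
n = 400
for i in range(1, n):
    theta = math.pi / 2 * i / n      # polar angle, first octant
    for j in range(1, n):
        phi = math.pi / 2 * j / n    # azimuthal angle, first octant
        u = math.sin(theta) * math.cos(phi)
        v = math.sin(theta) * math.sin(phi)
        w = math.cos(theta)
        best = max(best, 8 * a * b * c * u * v * w)

analytic = 8 * a * b * c / (3 * math.sqrt(3))
print(best, analytic)  # grid maximum approaches 8abc/(3*sqrt(3))
```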
The Lagrange multiplier method also applies to conditional extremum problems of a general multivariate function under several additional constraints, with one multiplier per constraint. For example:

Problem: find the points on the curve of intersection of a paraboloid of revolution and a plane that are nearest to and farthest from the origin.

Analysis: writing the paraboloid as $g_1(x, y, z) = 0$ and the plane as $g_2(x, y, z) = 0$, we extremize the squared distance $d^2 = x^2 + y^2 + z^2$ by forming

$$L = x^2 + y^2 + z^2 + \lambda g_1 + \mu g_2$$

and setting all the partial derivatives to zero. Solving the resulting system yields two suspect points. Comparing the values of $d^2$ at the two points shows which is the point nearest the origin and which is the farthest.
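The concrete surfaces did not survive in the text, so as an assumed instance take the common textbook pair $z = x^2 + y^2$ and $x + y + z = 1$; for that instance the two suspect points have squared distances $9 \mp 5\sqrt{3}$, which the sweep below confirms numerically:

```python
import math

# Assumed concrete instance: paraboloid z = x^2 + y^2 and plane x + y + z = 1.
# Eliminating z gives the circle (x + 1/2)^2 + (y + 1/2)^2 = 3/2 in the
# xy-plane; sweeping it finds the nearest/farthest point from the origin.

r = math.sqrt(1.5)
d2_min, d2_max = float("inf"), 0.0
steps = 100000
for i in range(steps):
    t = 2 * math.pi * i / steps
    x = -0.5 + r * math.cos(t)
    y = -0.5 + r * math.sin(t)
    z = 1 - x - y                    # the point lies on both surfaces
    d2 = x * x + y * y + z * z
    d2_min = min(d2_min, d2)
    d2_max = max(d2_max, d2)

print(d2_min, 9 - 5 * math.sqrt(3))  # squared distance of the nearest point
print(d2_max, 9 + 5 * math.sqrt(3))  # squared distance of the farthest point
```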
Problem: find the discrete distribution with maximum entropy.

Analysis: the entropy of a discrete distribution $p = (p_1, \dots, p_n)$ is expressed as follows:

$$H(p) = -\sum_{i=1}^{n} p_i \ln p_i,$$

and the constraint is

$$\sum_{i=1}^{n} p_i = 1.$$

The maximum value of $H$ under this constraint is required. According to the Lagrange multiplier method, form

$$L(p, \lambda) = -\sum_{i=1}^{n} p_i \ln p_i + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right).$$

Taking the partial derivative with respect to each $p_i$ and setting it to zero gives

$$\frac{\partial L}{\partial p_i} = -\ln p_i - 1 + \lambda = 0,$$

so $p_i = e^{\lambda - 1}$ for every $i$. This means all the $p_i$ are equal, and with the constraint $\sum_i p_i = 1$ the final solution is

$$p_i = \frac{1}{n}.$$

Therefore the maximum entropy, $H = \ln n$, is attained by the uniform distribution.
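This conclusion can be sanity-checked against random distributions; $n = 5$ below is an arbitrary example:

```python
import math
import random

# Check that the uniform distribution maximizes the entropy
# H(p) = -sum p_i * ln p_i subject to sum p_i = 1, with maximum ln(n).

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 5
uniform = [1.0 / n] * n
random.seed(0)
for _ in range(1000):
    # random distribution on n outcomes
    raw = [random.random() for _ in range(n)]
    s = sum(raw)
    p = [x / s for x in raw]
    assert entropy(p) <= entropy(uniform) + 1e-12

print(entropy(uniform), math.log(n))  # both equal ln 5
```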
Solving extremum problems of multivariate functions