When solving constrained optimization problems, the Lagrange multiplier method and the KKT conditions are two very important tools. For an optimization problem with equality constraints, the Lagrange multiplier method can be used to find the optimal value. If there are inequality constraints as well, the KKT conditions can be used instead. Of course, the points found by these two methods satisfy only necessary conditions; only when the problem is convex are these conditions also sufficient. The KKT conditions are a generalization of the Lagrange multiplier method. When I first studied them, I only knew how to apply the two methods mechanically, but I did not understand why the Lagrange and KKT conditions work, or why the optimal value should be found this way.
In this article, we first state what the Lagrange multiplier method and the KKT conditions are, and then explain why finding the optimum this way makes sense.
1. The Lagrange multiplier method and the KKT conditions
We usually need to solve the following types of optimization problems:
(i) An unconstrained optimization problem, which can be written as:
min f(x)
(ii) An optimization problem with equality constraints, which can be written as:
min f(x),
s.t. h_i(x) = 0, i = 1, ..., n
(iii) An optimization problem with inequality constraints, which can be written as:
min f(x),
s.t. g_i(x) <= 0, i = 1, ..., n
h_j(x) = 0, j = 1, ..., m
For problems of type (i), the standard approach is Fermat's theorem: take the derivative of f(x), set it to zero, and the solutions are the candidate optimal values, which are then checked one by one. If f is a convex function, a stationary point is guaranteed to be the optimal solution.
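To make the type-(i) recipe concrete, here is a minimal sketch in Python; the objective f and the use of sympy are toy assumptions of mine, not anything from the methods themselves:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = (x - 2)**2 + 1                # a toy convex objective

# Fermat's theorem: candidates are the roots of f'(x) = 0.
candidates = sp.solve(sp.diff(f, x), x)
print(candidates)                         # [2]
print([f.subs(x, c) for c in candidates]) # [1] -- f is convex, so x = 2 is the minimum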
For problems of type (ii), the usual approach is the Lagrange multiplier method: each equality constraint h_i(x) is multiplied by a coefficient and added to f(x) to form a single expression, called the Lagrangian; the coefficients are called Lagrange multipliers. Taking the derivative of the Lagrangian with respect to each variable and setting it to zero yields a system of equations whose solutions are candidate values, which are then checked to obtain the optimal value.
The KKT conditions are commonly used for problems of type (iii). Similarly, all the equality constraints, the inequality constraints, and f(x) are combined into one expression, also called the Lagrangian, with coefficients again called Lagrange multipliers. A certain set of conditions is then necessary for a point to be optimal; this set is called the KKT conditions.
(a) The Lagrange multiplier method
For equality constraints, we can combine the constraints and the objective into a single expression L(a, x) = f(x) + a * h(x) using Lagrange coefficients. Here we treat a and h(x) as vectors: a is a row vector and h(x) is a column vector (CSDN makes it hard to typeset mathematical formulas, so plain text will have to do...).
We can then find candidate optima by taking the partial derivative of L(a, x) with respect to each variable, setting all of them to zero, and solving the resulting system of equations simultaneously. This procedure is taught in advanced calculus, but nobody ever explained to me why it works; I will sketch my own understanding of the reason later.
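To see the recipe in action, here is a small sketch on a toy problem of my own choosing (again assuming sympy): minimize x^2 + y^2 subject to x + y - 1 = 0, by forming the Lagrangian and setting all its partial derivatives to zero.

```python
import sympy as sp

x, y, a = sp.symbols('x y a', real=True)
f = x**2 + y**2
h = x + y - 1                      # equality constraint h(x, y) = 0
L = f + a * h                      # the Lagrangian L(a, x, y)

# Set every partial derivative of L to zero and solve the system.
eqs = [sp.diff(L, v) for v in (x, y, a)]
print(sp.solve(eqs, (x, y, a)))    # {x: 1/2, y: 1/2, a: -1}
```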
(b) The KKT conditions
How do we find the optimal value when there are inequality constraints? The common tool is the set of KKT conditions. As before, all the inequality constraints, equality constraints, and the objective function are written as a single expression L(a, b, x) = f(x) + a * g(x) + b * h(x). The KKT conditions state that the optimal value must satisfy the following:
1. The derivative of L(a, b, x) with respect to x is zero;
2. h(x) = 0;
3. a * g(x) = 0, with a >= 0;
Solving these equations gives the candidate optimal values. The third condition is very interesting: since g(x) <= 0, satisfying a * g(x) = 0 requires a = 0 or g(x) = 0. This complementary slackness is the source of many important SVM properties, such as the concept of support vectors.
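Here is a hedged sketch of applying these three conditions by hand, again with sympy on a toy problem I made up: min (x - 3)^2 subject to x - 1 <= 0. Note how the a = 0 branch gets discarded because it violates g(x) <= 0, while the g(x) = 0 branch survives.

```python
import sympy as sp

x, a = sp.symbols('x a', real=True)
f = (x - 3)**2
g = x - 1                                  # g(x) <= 0
L = f + a * g

stationarity = sp.Eq(sp.diff(L, x), 0)     # condition 1: dL/dx = 0
slackness = sp.Eq(a * g, 0)                # condition 3: a * g(x) = 0

# Solve the two equations, then keep solutions with a >= 0 and g(x) <= 0.
for sol in sp.solve([stationarity, slackness], (x, a), dict=True):
    if sol[a] >= 0 and g.subs(x, sol[x]) <= 0:
        print(sol)                         # {x: 1, a: 4}
```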
2. Why do the Lagrange multiplier method and the KKT conditions give the optimal value?
Why do these conditions produce the optimum? Let's start with the Lagrange multiplier method. Suppose our objective is z = f(x), where x is a vector. As z takes different values, the surface of f can be projected onto the x-space as a family of contour lines; for example, with an objective f(x, y), where x and y are scalars, the dashed lines in the usual picture are the contours. Now suppose the constraint is g(x) = 0, which traces out a curve in the plane (or on a surface). If g(x) = 0 merely crosses a contour line, the crossing point lies in the feasible region and satisfies both the equality constraint and the objective, but it is definitely not optimal: crossing means the constraint curve also meets other contour lines just inside or outside this one, on which the objective value is larger or smaller. The optimum can only be attained where the constraint curve is tangent to a contour line, that is, where the contour and the constraint curve share the same normal direction at that point. Therefore the optimal point must satisfy:

gradient of f(x) = a * gradient of g(x)

where a is a constant expressing that the two sides point along the same line. This is exactly the equation you get by setting the derivatives of L(a, x) to zero. (I am not sure this verbal description is clear; if you happen to be physically close to me, contact me directly and I can explain it in person. Note: the figure is from Wikipedia.)
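A quick numeric sanity check of this tangency picture, reusing the toy equality-constrained problem from the earlier sketch: at the optimum (1/2, 1/2), the gradients of f and g are indeed parallel.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 + y**2
g = x + y - 1

grad_f = sp.Matrix([sp.diff(f, v) for v in (x, y)])
grad_g = sp.Matrix([sp.diff(g, v) for v in (x, y)])

at_opt = {x: sp.Rational(1, 2), y: sp.Rational(1, 2)}
print(grad_f.subs(at_opt))   # Matrix([[1], [1]])
print(grad_g.subs(at_opt))   # Matrix([[1], [1]]) -- parallel, as the tangency argument predicts
```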
The KKT conditions are necessary conditions for the optimum of a problem that satisfies strong duality. They can be understood as follows. We want min f(x), and we form L(a, b, x) = f(x) + a * g(x) + b * h(x) with a >= 0. We can then write f(x) as max_{a,b} L(a, b, x). Why? Because h(x) = 0 and g(x) <= 0 at any feasible x: when we maximize L(a, b, x) over a and b, the term b * h(x) contributes nothing, and a * g(x) <= 0, so L(a, b, x) attains its maximum exactly when a * g(x) = 0. Thus, whenever the constraints are satisfied, max_{a,b} L(a, b, x) equals f(x), and our objective can be written as
min_x max_{a,b} L(a, b, x). Now consider the dual expression max_{a,b} min_x L(a, b, x). Because our problem satisfies strong duality (strong duality means the optimal value of the dual problem equals the optimal value of the primal problem), at the optimum x0 we have f(x0) = max_{a,b} min_x L(a, b, x) = min_x max_{a,b} L(a, b, x) = f(x0). Let's look closely at what happens in the two middle expressions:
f(x0) = max_{a,b} min_x L(a, b, x) = max_{a,b} min_x [f(x) + a * g(x) + b * h(x)] = max_{a,b} [f(x0) + a * g(x0) + b * h(x0)] = f(x0)
The key step in the middle of this chain says that min_x [f(x) + a * g(x) + b * h(x)] is attained at x0. By Fermat's theorem, the derivative of f(x) + a * g(x) + b * h(x) with respect to x must be zero there, that is:
gradient of f(x) + a * gradient of g(x) + b * gradient of h(x) = 0
This is exactly the first KKT condition: the derivative of L(a, b, x) with respect to x is zero.
We also showed above that a * g(x) = 0 must hold, which is the third KKT condition, and of course the given constraint h(x) = 0 (the second condition) must be satisfied as well. So the optimum of any optimization problem satisfying strong duality must satisfy the KKT conditions, i.e., the three conditions described above. In this sense the KKT conditions can be seen as a generalization of the Lagrange multiplier method.
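To close the loop, here is a small sketch, under the same toy assumptions as before, that mechanically checks the KKT conditions at the point found earlier for min (x - 3)^2 subject to x - 1 <= 0; the h(x) = 0 condition is vacuous here because this toy problem has no equality constraint.

```python
import sympy as sp

x, a = sp.symbols('x a', real=True)
f, g = (x - 3)**2, x - 1
L = f + a * g
x0, a0 = 1, 4                     # the candidate found in the earlier sketch

checks = {
    "1. dL/dx = 0":  sp.diff(L, x).subs({x: x0, a: a0}) == 0,
    "3. a*g(x) = 0": (a0 * g.subs(x, x0)) == 0,
    "a >= 0":        a0 >= 0,
    "g(x) <= 0":     g.subs(x, x0) <= 0,
}
print(checks)                     # every entry should be True
```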