2 Lagrangian duality (Lagrange duality)
Setting aside the two optimization problems above for a moment, let us look at an extremum problem with equality constraints, such as the following:

    min_w f(w)
    s.t. h_i(w) = 0, i = 1, ..., l
The objective function is f(w), and below it are the equality constraints. The usual approach is to introduce Lagrange multipliers; here we use β_i to denote the multipliers, giving the Lagrangian

    L(w, β) = f(w) + Σ_{i=1}^{l} β_i h_i(w)

where l is the number of equality constraints.
We then set the partial derivatives of L with respect to w and β to 0 and solve for w and β. As for why introducing Lagrange multipliers finds the extremum: the feasible directions of change dw are restricted by the equality constraints, so dw must be perpendicular to the gradients ∇h_i(w). At an extremum, f(w) cannot change along any feasible direction, so ∇f(w) must be a linear combination of the constraint gradients ∇h_i(w); that is, ∇f(w*) + Σ_i β_i ∇h_i(w*) = 0 for some β, which is exactly what setting ∂L/∂w = 0 expresses. (See "Optimization and the KKT conditions.")
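The recipe above can be sketched on a toy problem of my own choosing (min w1² + w2² subject to w1 + w2 = 1), using sympy to set all partial derivatives of the Lagrangian to zero and solve:

```python
# Minimal sketch of the Lagrange-multiplier recipe, on a hand-made toy
# problem: minimize f(w) = w1^2 + w2^2 subject to h(w) = w1 + w2 - 1 = 0.
import sympy as sp

w1, w2, beta = sp.symbols('w1 w2 beta')

f = w1**2 + w2**2        # objective f(w)
h = w1 + w2 - 1          # equality constraint h(w) = 0
L = f + beta * h         # Lagrangian L(w, beta)

# Set every partial derivative of L to zero and solve for w and beta.
stationary = sp.solve([sp.diff(L, v) for v in (w1, w2, beta)],
                      (w1, w2, beta), dict=True)
print(stationary)  # single solution: w1 = w2 = 1/2, beta = -1
```

At the solution ∇f = (1, 1) is indeed parallel to ∇h = (1, 1), with β = -1, illustrating the gradient condition above.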
Next we discuss how to solve an extremum problem that also has inequality constraints. The problem is as follows:

    min_w f(w)
    s.t. g_i(w) ≤ 0, i = 1, ..., k
         h_i(w) = 0, i = 1, ..., l
We define the generalized Lagrangian

    L(w, α, β) = f(w) + Σ_{i=1}^{k} α_i g_i(w) + Σ_{i=1}^{l} β_i h_i(w)

where the α_i and β_i are the Lagrange multipliers, with α_i ≥ 0. If we tried to minimize this formula directly, a problem would arise: since we are minimizing, whenever some g_i(w) is strictly negative rather than 0 we could set α_i to a large positive value and drive the result toward negative infinity. To rule this out, we define the following function:

    θ_P(w) = max_{α, β : α_i ≥ 0} L(w, α, β)
Here the subscript P stands for "primal". If some g_i(w) > 0 or some h_i(w) ≠ 0, then we can always adjust α_i and β_i to make the maximum positive infinity; only when g and h satisfy all the constraints does θ_P(w) equal f(w). The subtlety of this function lies in the requirement α_i ≥ 0 combined with taking a maximum. So we can write

    θ_P(w) = { f(w), if w satisfies the primal constraints
             { +∞,   otherwise.

Thus the min_w f(w) we originally wanted can be converted into solving

    min_w θ_P(w) = min_w max_{α, β : α_i ≥ 0} L(w, α, β).
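The behaviour of θ_P can be checked numerically on a toy problem of my own (min w² subject to w ≥ 1, i.e. g(w) = 1 - w ≤ 0), maximizing over a finite grid of α values to stand in for α → ∞:

```python
# Numeric illustration of theta_P: equals f(w) on the feasible set,
# blows up off it. Toy problem (my own example): min w^2 s.t. g(w) = 1 - w <= 0.
import numpy as np

def f(w):
    return w**2

def g(w):
    return 1.0 - w                      # feasible iff g(w) <= 0

def theta_P(w, alphas=np.linspace(0.0, 1000.0, 10001)):
    # max over a grid of alpha >= 0 of L(w, alpha) = f(w) + alpha * g(w)
    return np.max(f(w) + alphas * g(w))

print(theta_P(2.0))   # feasible point: maximum is at alpha = 0, giving f(2) = 4
print(theta_P(0.5))   # infeasible point: grows without bound as alpha increases
```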
We use p* to denote the optimal value of this problem: p* = min_w θ_P(w). Solving it directly is awkward: we would first face the two sets of parameters α and β, with α subject to the inequality constraints α_i ≥ 0, and only then take the minimum over w. This process is not easy to carry out, so what can we do instead?
Let us first consider a different problem. Define

    θ_D(α, β) = min_w L(w, α, β)

The subscript D stands for "duality". This reverses the order: first take the Lagrangian's minimum over w, treating α and β as fixed, and afterwards take the maximum:

    max_{α, β : α_i ≥ 0} θ_D(α, β) = max_{α, β : α_i ≥ 0} min_w L(w, α, β)
This problem is the dual of the original (primal) problem; relative to the primal, only the order of min and max has changed. In general, swapping the order gives max_x min_y f(x, y) ≤ min_y max_x f(x, y), but here the two turn out to be equal. We use d* to denote the optimal value of the dual problem:

    d* = max_{α, β : α_i ≥ 0} θ_D(α, β) ≤ p*.
The following explains the conditions under which the two are equal. Suppose f and the g_i are convex and the h_i are affine (affine means linear plus a constant: h_i(w) = a_i^T w + b_i). Suppose also that the constraints g_i are strictly feasible: there exists some w such that g_i(w) < 0 for all i. Under these assumptions, there must exist a w* solving the primal problem and α*, β* solving the dual problem, with p* = d* = L(w*, α*, β*). Moreover, w*, α*, β* satisfy the Karush-Kuhn-Tucker (KKT) conditions:

    (1) ∂L/∂w_i (w*, α*, β*) = 0,  i = 1, ..., n
    (2) ∂L/∂β_i (w*, α*, β*) = 0,  i = 1, ..., l
    (3) g_i(w*) ≤ 0,               i = 1, ..., k
    (4) α_i* ≥ 0,                  i = 1, ..., k
    (5) α_i* g_i(w*) = 0,          i = 1, ..., k
Conversely, if some w*, α*, β* satisfy the KKT conditions, then they are a solution to both the primal and the dual problem. Look again at condition (5), which is called the KKT dual complementarity condition. It implies that if α_i* > 0, then g_i(w*) = 0. That is, when g_i(w*) = 0, w* lies on the boundary of the feasible region determined by that constraint, and the constraint is active; for constraints satisfied strictly (points in their interior, g_i(w*) < 0), the constraint is inactive and α_i* = 0. This KKT dual complementarity condition will later be used to explain support vectors and the convergence test of SMO.
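The five conditions can be verified by hand on a toy instance of my own choosing (min x² subject to x ≥ 1, written as g(x) = 1 - x ≤ 0), whose optimum x* = 1 sits on the constraint boundary with multiplier α* = 2:

```python
# Hand-checked toy instance of the KKT conditions (my own example):
# min x^2 s.t. g(x) = 1 - x <= 0, with optimum x* = 1 and alpha* = 2.
def g(x):
    return 1.0 - x                        # inequality constraint, g(x) <= 0

def dL_dx(x, alpha):
    return 2.0 * x - alpha                # d/dx of L(x, alpha) = x^2 + alpha*g(x)

x_star, alpha_star = 1.0, 2.0
assert dL_dx(x_star, alpha_star) == 0.0   # (1) stationarity
assert g(x_star) <= 0.0                   # (3) primal feasibility
assert alpha_star >= 0.0                  # (4) dual feasibility
assert alpha_star * g(x_star) == 0.0      # (5) complementarity: active constraint

# With the looser constraint x >= -1 (g(x) = -1 - x) the optimum x* = 0 lies
# strictly inside the feasible region, so the constraint is inactive and
# complementarity forces alpha* = 0.
g2 = lambda x: -1.0 - x
x2, alpha2 = 0.0, 0.0
assert g2(x2) < 0.0 and alpha2 * g2(x2) == 0.0
print("KKT conditions hold in both cases")
```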
My thinking on this part is still somewhat muddled; the constrained-extremum material in a text on nonlinear programming is worth studying before coming back to it. The general idea of KKT is this: the extremum is attained on the boundary of the feasible region, that is, where inequality constraints hold with equality (g_i = 0) or at equality constraints, and the gradient of the objective at the optimum is a linear combination of the gradients of these constraints, each term being either an active inequality constraint (one equal to 0) or an equality constraint. Constraints that the optimal point satisfies strictly (the point lies in their interior) do not act, so their coefficients are 0.
3 Optimal margin classifier (optimal margin classifier)
Returning to the optimization problem for the SVM:

    min_{w,b} (1/2) ||w||^2
    s.t. y^(i) (w^T x^(i) + b) ≥ 1, i = 1, ..., m
We rewrite the constraints as

    g_i(w) = -y^(i) (w^T x^(i) + b) + 1 ≤ 0.

From the KKT conditions, only the points whose functional margin is exactly 1 (the points nearest the hyperplane) make g_i(w) = 0, so only the coefficients α_i in front of those constraints can be positive; for the other points, which are not on the margin lines (y^(i) (w^T x^(i) + b) > 1), the extremum does not lie on their boundary, so the coefficients in front of them are α_i = 0. Note that each constraint corresponds to one training sample.
Look at the following figure:
The solid line is the maximum-margin hyperplane. Suppose the ×'s are positive examples and the circles are negative examples. The points on the dashed lines are exactly the points with functional margin 1, so the coefficients in front of their constraints satisfy α_i > 0, while all the other points have α_i = 0. These three points are called support vectors. The Lagrangian is constructed as follows:

    L(w, b, α) = (1/2) ||w||^2 - Σ_{i=1}^{m} α_i [y^(i) (w^T x^(i) + b) - 1]
Notice that there is no equality constraint in the original problem, only inequality constraints.
Let us follow the steps for solving the dual problem one by one.
First we solve the minimization min_{w,b} L(w, b, α): for fixed α, the minimum depends only on w and b. Taking the partial derivatives with respect to w and b and setting them to zero:

    ∇_w L(w, b, α) = w - Σ_{i=1}^{m} α_i y^(i) x^(i) = 0

gives

    w = Σ_{i=1}^{m} α_i y^(i) x^(i)

and

    ∂L/∂b = Σ_{i=1}^{m} α_i y^(i) = 0.
Substituting the expression for w back into the Lagrangian yields the minimum of the function over w and b (the objective is convex, so the stationary point is the minimum).
After substituting, the simplification proceeds as follows:

    L(w, b, α) = (1/2) w^T w - Σ_i α_i y^(i) w^T x^(i) - b Σ_i α_i y^(i) + Σ_i α_i

With w = Σ_i α_i y^(i) x^(i), the first term becomes (1/2) Σ_{i,j} y^(i) y^(j) α_i α_j (x^(i))^T x^(j) and the second becomes the same double sum with coefficient -1, so we finally get

    L(w, b, α) = Σ_{i=1}^{m} α_i - (1/2) Σ_{i,j=1}^{m} y^(i) y^(j) α_i α_j (x^(i))^T x^(j) - b Σ_{i=1}^{m} α_i y^(i).

Since the last term is 0 (because Σ_i α_i y^(i) = 0), this simplifies to

    L(w, b, α) = Σ_{i=1}^{m} α_i - (1/2) Σ_{i,j=1}^{m} y^(i) y^(j) α_i α_j (x^(i))^T x^(j).
Here we write the inner product of two vectors as ⟨x^(i), x^(j)⟩ = (x^(i))^T x^(j).
At this point the Lagrangian contains only the variables α_i; once we find them, we can obtain w and b. Next comes the maximization step, which is the dual problem:

    max_α W(α) = Σ_{i=1}^{m} α_i - (1/2) Σ_{i,j=1}^{m} y^(i) y^(j) α_i α_j ⟨x^(i), x^(j)⟩
    s.t. α_i ≥ 0, i = 1, ..., m
         Σ_{i=1}^{m} α_i y^(i) = 0
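The dual objective W(α) transcribes directly into numpy. Below is a sketch on a hand-made 1-D dataset of my own (x = -1 labeled -1 and x = +1 labeled +1), for which α = (0.5, 0.5) satisfies both dual constraints:

```python
# Direct numpy transcription of the dual objective W(alpha), evaluated on
# a tiny hand-made 1-D dataset (my own example; not from the article).
import numpy as np

X = np.array([[-1.0], [1.0]])      # training inputs x^(i)
y = np.array([-1.0, 1.0])          # labels y^(i)

def W(alpha):
    # W(alpha) = sum_i alpha_i - 1/2 sum_{i,j} y_i y_j alpha_i alpha_j <x_i, x_j>
    K = X @ X.T                    # Gram matrix of inner products <x^(i), x^(j)>
    v = y * alpha
    return alpha.sum() - 0.5 * v @ K @ v

alpha = np.array([0.5, 0.5])       # satisfies alpha_i >= 0 and sum_i alpha_i y_i = 0
print(W(alpha))                    # -> 0.5
print(W(np.array([0.4, 0.4])))     # smaller: alpha = (0.5, 0.5) is the maximizer here
```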
The dual and primal problems here satisfy the conditions stated earlier: the objective function is convex and the constraints are linear (hence convex), there is no equality constraint h, and there exists a w such that g_i(w) < 0 for all i (strict feasibility). Therefore the primal problem must have a solution w* and the dual problem a solution α*, with p* = d*, so solving the dual yields the solution of the primal.
Once we find the α_i*, we can find

    w* = Σ_{i=1}^{m} α_i* y^(i) x^(i)

(which is also the solution of the primal problem). Then, from

    b* = - ( max_{i: y^(i) = -1} w*^T x^(i) + min_{i: y^(i) = 1} w*^T x^(i) ) / 2

we can find b*. This choice of intercept places the hyperplane midway, so that the functional margin of the positive example nearest the hyperplane equals the functional margin of the nearest negative example.
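These two formulas can be exercised on a hand-made 1-D dataset of my own (x = -1 labeled -1 and x = +1 labeled +1, whose dual solution is α = (0.5, 0.5)):

```python
# Recovering w* and b* from a known dual solution, on a toy 1-D dataset
# (my own example): x = -1 labeled -1, x = +1 labeled +1, alpha = (0.5, 0.5).
import numpy as np

X = np.array([[-1.0], [1.0]])
y = np.array([-1.0, 1.0])
alpha = np.array([0.5, 0.5])

# w* = sum_i alpha_i y_i x_i
w = (alpha * y) @ X

# b* = -( max_{i: y=-1} w.x_i + min_{i: y=+1} w.x_i ) / 2
scores = X @ w
b = -(scores[y == -1].max() + scores[y == 1].min()) / 2.0

print("w* =", w, " b* =", b)               # w* = [1.], b* = 0 for this dataset
assert np.allclose(y * (scores + b), 1.0)  # both points sit at functional margin 1
```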
How to solve the dual problem above is left to the SMO algorithm, which the next article will clarify.
Consider one more point here. The solution above gives

    w = Σ_{i=1}^{m} α_i y^(i) x^(i).

The starting point of our whole treatment was that, given the solution w and b, we classify a new point x by computing the prediction

    w^T x + b = ( Σ_{i=1}^{m} α_i y^(i) x^(i) )^T x + b = Σ_{i=1}^{m} α_i y^(i) ⟨x^(i), x⟩ + b.
That is to say, to classify a new sample we first perform a linear computation with w and b, then check whether the result is greater than 0 or less than 0 to decide whether it is a positive or negative example. With the expansion above, we do not even need to compute w explicitly; we only take inner products between the new sample and the training samples. One might object: isn't computing inner products with all of the training samples too time-consuming? In fact, from the KKT conditions we get that α_i > 0 only for the support vectors, and α_i = 0 in all other cases. Therefore we only need the inner products between the new sample and the support vectors to carry out the computation. This way of writing the prediction lays excellent groundwork for the kernel function (kernel) to be discussed later. This article stops here; that is enough for now.
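The support-vector-only prediction can be sketched as follows, again on a hand-made 1-D dataset of my own (x = -1 labeled -1 and x = +1 labeled +1, dual solution α = (0.5, 0.5), b = 0; here both training points happen to be support vectors):

```python
# Classifying a new point using only inner products with support vectors,
# per the expansion w.x + b = sum_i alpha_i y_i <x_i, x> + b.
# Toy dataset and dual solution are my own example.
import numpy as np

X = np.array([[-1.0], [1.0]])      # training inputs
y = np.array([-1.0, 1.0])          # labels
alpha = np.array([0.5, 0.5])       # alpha_i > 0 only for support vectors
b = 0.0

def predict(x_new):
    sv = alpha > 0                 # keep only the support vectors
    score = np.sum(alpha[sv] * y[sv] * (X[sv] @ x_new)) + b
    return 1 if score > 0 else -1

print(predict(np.array([3.0])))    # 1  (positive side of the hyperplane)
print(predict(np.array([-0.2])))   # -1 (negative side)
```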