From the previous lesson: for a given linearly separable dataset, the points closest to the separating hyperplane are the support vectors. The larger the distance between the support vectors and the separating hyperplane, the more confident the algorithm's predictions are. The core of this lesson is how to determine the best separating hyperplane, i.e., the optimal margin classifier.
First we introduce the necessary mathematical background, and then we derive the optimal margin classifier.
1. Convex optimization problem
Pick any two points on the graph of a function and join them with a straight line segment: if, between those two points, the graph of the function lies below the segment, the function is convex. The exponential function is one example of a convex function. When a problem can be recast as a convex optimization problem, it can be solved reliably, because for a convex optimization problem every local optimum is also a global optimum.
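For concreteness, here is the standard definition, together with a quick check that the exponential function qualifies (this worked example is my addition, not part of the original notes):

$$f(t x_1 + (1 - t) x_2) \le t f(x_1) + (1 - t) f(x_2), \qquad \forall x_1, x_2,\ t \in [0, 1].$$

For a twice-differentiable function it suffices that $f''(x) \ge 0$ everywhere; for $f(x) = e^x$ we have $f''(x) = e^x > 0$, so $e^x$ is convex.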
Given a linearly separable dataset, we can write the margin-maximization problem as:

$$\max_{\gamma, w, b}\ \gamma \qquad \text{s.t. } y^{(i)}(w^T x^{(i)} + b) \ge \gamma,\ i = 1, \dots, m; \quad \|w\| = 1.$$

The constraint $\|w\| = 1$ makes the geometric margin equal to the functional margin, so the constraints guarantee that every geometric margin is at least $\gamma$. Solving this problem would give the optimal margin, but the constraint $\|w\| = 1$ is non-convex, so the problem cannot be handed to standard optimization software.

Using the relationship between the geometric margin and the functional margin, $\gamma = \hat\gamma / \|w\|$, the problem can be rewritten as:

$$\max_{\hat\gamma, w, b}\ \frac{\hat\gamma}{\|w\|} \qquad \text{s.t. } y^{(i)}(w^T x^{(i)} + b) \ge \hat\gamma,\ i = 1, \dots, m.$$

The objective $\hat\gamma / \|w\|$ is still non-convex, so we add a scaling condition on $w$ and $b$: rescaling $(w, b)$ does not change the classifier, so we may fix the functional margin at $\hat\gamma = 1$. Maximizing $1/\|w\|$ is then equivalent to minimizing $\|w\|^2$, and the problem can finally be expressed as the convex optimization problem:

$$\min_{w, b}\ \frac{1}{2}\|w\|^2 \qquad \text{s.t. } y^{(i)}(w^T x^{(i)} + b) \ge 1,\ i = 1, \dots, m.$$

The optimal margin classifier is obtained by solving this problem.
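Since this final form is an ordinary quadratic program, it can be solved with a general-purpose solver. Below is a minimal sketch of my own (the toy dataset and all variable names are illustrative assumptions, not from the original notes), using scipy's SLSQP method:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable 2-D data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables packed as theta = [w1, w2, b].
def objective(theta):
    w = theta[:2]
    return 0.5 * np.dot(w, w)          # (1/2) * ||w||^2

# One inequality constraint per point: y_i * (w^T x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq", "fun": (lambda th, i=i: y[i] * (X[i] @ th[:2] + th[2]) - 1.0)}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints, method="SLSQP")
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)
# Margins y_i * (w^T x_i + b): support vectors sit exactly at 1.
print("margins:", y * (X @ w + b))
```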
2. Lagrange duality
2.1 The Lagrangian
Constrained optimization problems can be solved using Lagrange duality. Suppose a problem is stated as follows:

$$\min_w\ f(w) \qquad \text{s.t. } h_i(w) = 0,\ i = 1, \dots, l.$$

Its Lagrangian is then expressed as:

$$\mathcal{L}(w, \beta) = f(w) + \sum_{i=1}^{l} \beta_i h_i(w),$$

where the $\beta_i$ are the Lagrange multipliers. Setting each partial derivative to zero,

$$\frac{\partial \mathcal{L}}{\partial w_i} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \beta_i} = 0,$$

one can then solve for $w$ and $\beta$.
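As a small worked example of my own (not from the lecture): minimize $f(x, y) = x^2 + y^2$ subject to $h(x, y) = x + y - 1 = 0$.

$$\mathcal{L}(x, y, \beta) = x^2 + y^2 + \beta (x + y - 1).$$

Setting $\partial \mathcal{L} / \partial x = 2x + \beta = 0$, $\partial \mathcal{L} / \partial y = 2y + \beta = 0$, and $\partial \mathcal{L} / \partial \beta = x + y - 1 = 0$ gives $x = y = 1/2$ with $\beta = -1$, which is indeed the point on the line $x + y = 1$ closest to the origin.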
In addition, the constraints may include inequalities, in which case the optimization problem is handled as follows. The primal problem can be expressed as:

$$\min_w\ f(w) \qquad \text{s.t. } g_i(w) \le 0,\ i = 1, \dots, k; \quad h_i(w) = 0,\ i = 1, \dots, l.$$

Its generalized Lagrangian is then

$$\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w).$$

Define

$$\theta_P(w) = \max_{\alpha, \beta:\ \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta),$$

where the subscript $P$ stands for "primal". If some constraint is not met (some $g_i(w) > 0$ or some $h_i(w) \ne 0$), the corresponding multiplier can be driven to infinity, so $\theta_P(w) = +\infty$. If all constraints are met, the maximum is attained at $\alpha_i = 0$, so $\theta_P(w) = f(w)$. Summarizing:

$$\theta_P(w) = \begin{cases} f(w) & \text{if } w \text{ satisfies the primal constraints,} \\ +\infty & \text{otherwise.} \end{cases}$$

Therefore minimizing $\theta_P(w)$ is the same as minimizing the original objective subject to the constraints:

$$\min_w\ \theta_P(w) = \min_w\ \max_{\alpha, \beta:\ \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) = p^*,$$

and solving this min-max problem yields the solution of the primal problem.
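A tiny instance of my own to make the case analysis concrete: minimize $f(x) = x$ subject to $g(x) = -x \le 0$. Then $\mathcal{L}(x, \alpha) = x - \alpha x$, and

$$\theta_P(x) = \max_{\alpha \ge 0}\ (x - \alpha x) = \begin{cases} x & \text{if } x \ge 0 \quad (\text{take } \alpha = 0), \\ +\infty & \text{if } x < 0 \quad (\text{let } \alpha \to \infty), \end{cases}$$

so $\min_x \theta_P(x) = 0$, attained at $x = 0$, exactly the constrained optimum.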
2.2 Duality and the KKT conditions
The dual problem is defined as follows. For the primal problem above, let

$$\theta_D(\alpha, \beta) = \min_w\ \mathcal{L}(w, \alpha, \beta),$$

where the subscript $D$ stands for "dual". The dual optimization problem is then expressed as:

$$\max_{\alpha, \beta:\ \alpha_i \ge 0}\ \theta_D(\alpha, \beta) = \max_{\alpha, \beta:\ \alpha_i \ge 0}\ \min_w\ \mathcal{L}(w, \alpha, \beta).$$

This is the same as our primal problem, except that the max and the min are swapped. We denote the value of the dual optimization problem by $d^*$.

Because a max-min is never greater than the corresponding min-max, we always have $d^* \le p^*$; when $d^* = p^*$, solving the dual (max-min) also yields the optimum of the primal (min-max). So when are the solutions of the primal problem and its dual equal? It suffices that the KKT conditions are met.
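A one-line justification of the inequality (my addition): for any $w$ and any $\alpha, \beta$ with $\alpha_i \ge 0$,

$$\theta_D(\alpha, \beta) = \min_{w'} \mathcal{L}(w', \alpha, \beta) \le \mathcal{L}(w, \alpha, \beta) \le \max_{\alpha', \beta':\ \alpha'_i \ge 0} \mathcal{L}(w, \alpha', \beta') = \theta_P(w).$$

Taking the max over $(\alpha, \beta)$ on the left and the min over $w$ on the right gives $d^* \le p^*$.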
Suppose $f$ and the $g_i$ are convex, the $h_i$ are affine, and the constraints $g_i$ are strictly feasible (there exists some $w$ with $g_i(w) < 0$ for all $i$). Then there exist $w^*, \alpha^*, \beta^*$ such that $w^*$ solves the primal problem, $(\alpha^*, \beta^*)$ solves the dual problem, $p^* = d^*$, and the KKT (Karush-Kuhn-Tucker) conditions hold:

$$\frac{\partial}{\partial w_i} \mathcal{L}(w^*, \alpha^*, \beta^*) = 0, \quad i = 1, \dots, n \qquad (3)$$
$$\frac{\partial}{\partial \beta_i} \mathcal{L}(w^*, \alpha^*, \beta^*) = 0, \quad i = 1, \dots, l \qquad (4)$$
$$\alpha_i^*\, g_i(w^*) = 0, \quad i = 1, \dots, k \qquad (5)$$
$$g_i(w^*) \le 0, \quad i = 1, \dots, k \qquad (6)$$
$$\alpha_i^* \ge 0, \quad i = 1, \dots, k \qquad (7)$$

Conversely, any $w^*, \alpha^*, \beta^*$ satisfying the KKT conditions is a solution of both the primal and the dual problem. Equation (5) is the KKT dual complementarity condition; it says that if $\alpha_i^* \ne 0$, then $g_i(w^*) = 0$.
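To see the conditions in action, here is a tiny example of my own: minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$.

$$\mathcal{L}(x, \alpha) = x^2 + \alpha (1 - x), \qquad \frac{\partial \mathcal{L}}{\partial x} = 2x - \alpha = 0.$$

Complementarity $\alpha (1 - x) = 0$ with $\alpha > 0$ forces $x = 1$, hence $\alpha = 2x = 2 \ge 0$; all the conditions hold at $x^* = 1$, which is indeed the constrained minimum (the unconstrained minimum $x = 0$ violates $g(x) \le 0$).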
3. The optimal margin classifier
The optimal margin classifier is defined by the problem

$$\min_{w, b}\ \frac{1}{2}\|w\|^2 \qquad \text{s.t. } y^{(i)}(w^T x^{(i)} + b) \ge 1,\ i = 1, \dots, m.$$

We can write its constraints as

$$g_i(w) = -y^{(i)}(w^T x^{(i)} + b) + 1 \le 0,\ i = 1, \dots, m.$$

Its Lagrangian is therefore

$$\mathcal{L}(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)}(w^T x^{(i)} + b) - 1 \right]. \qquad (8)$$
Taking the derivative with respect to $w$ and setting it to zero,

$$\nabla_w \mathcal{L}(w, b, \alpha) = w - \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} = 0,$$

we obtain

$$w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}. \qquad (9)$$

Differentiating with respect to $b$ gives

$$\frac{\partial}{\partial b} \mathcal{L}(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i y^{(i)} = 0. \qquad (10)$$

Substituting (9) into (8) gives

$$\mathcal{L}(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j = 1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle - b \sum_{i=1}^{m} \alpha_i y^{(i)},$$

and then, by (10), the last term vanishes, leaving

$$\mathcal{L}(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j = 1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle.$$
So the dual optimization problem can be expressed as:

$$\max_\alpha\ W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j = 1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle$$
$$\text{s.t. } \alpha_i \ge 0,\ i = 1, \dots, m; \qquad \sum_{i=1}^{m} \alpha_i y^{(i)} = 0.$$

Solving the dual optimization problem yields the $\alpha_i$; $w$ then follows from (9), and the solution for $b$ is

$$b^* = -\frac{\displaystyle \max_{i:\, y^{(i)} = -1} w^{*T} x^{(i)} + \min_{i:\, y^{(i)} = 1} w^{*T} x^{(i)}}{2}.$$
For a new data point $x$, the prediction is made as follows:

$$w^T x + b = \left( \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} \right)^T x + b = \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b;$$

we predict $y = 1$ exactly when this quantity is positive. By the dual complementarity condition (5), $\alpha_i > 0$ only for the support vectors, so the sum really only involves inner products between $x$ and the support vectors. In this way, the optimal margin classifier is realized.
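As a sanity check of my own (not part of the original notes; the toy data and variable names are illustrative), scikit-learn's linear-kernel SVC exposes the dual solution, so we can verify equation (9) and the support-vector form of the prediction; a very large C approximates the hard-margin problem derived above:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up linearly separable data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors, so
# equation (9), w = sum_i alpha_i y_i x_i, becomes:
w = clf.dual_coef_ @ clf.support_vectors_
b = clf.intercept_
print("w matches clf.coef_:", np.allclose(w, clf.coef_))

# Prediction for a new point x uses only inner products with
# the support vectors: sum_i alpha_i y_i <x_i, x> + b.
x_new = np.array([1.0, 1.5])
score = clf.dual_coef_ @ (clf.support_vectors_ @ x_new) + b
print("prediction:", np.sign(score))
print("matches clf.decision_function:",
      np.allclose(score, clf.decision_function([x_new])))
```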