Please credit the source when reprinting.
Contents
I. What is the functional margin?
II. What is the geometric margin?
III. What is the relationship between the functional margin and the geometric margin?
IV. Hard margin maximization
V. The dual learning algorithm
I. Functional Margin
In the figure, of the three points A, B, and C, A is farthest from the hyperplane, so the probability of A being misclassified is smallest; C is closest to the hyperplane, so C is the most likely to be misclassified. This is intuitive. We can therefore use the distance from a point to the hyperplane to indicate the degree of confidence in its classification prediction.
So we just need to look for a hyperplane that is as far as possible from all the boundary points.
A. The absolute value |w·x + b| represents (up to scale) the distance between the point x and the hyperplane.
B. For a sample point x, y is its label (the classification result, +1 or -1).
C. The sign of w·x + b represents the predicted classification result.
D. The product y(w·x + b) represents both the correctness of the classification and the degree of confidence just mentioned. This is the functional margin.
Notes:
A: Taking two-dimensional space as an example, the linear equation is ax + by + c = 0. Replacing (a, b) with the vector w and writing the feature vector as x, the equation becomes w·x + b = 0.
B: For the line we learned, substitute a negative instance point, say (0, 0): if the computed sign +1 is inconsistent with its label -1, then the separating hyperplane learned by the linearly separable support vector machine has classified that point incorrectly. That is, when y(w·x + b) > 0, the prediction is consistent with the label; otherwise it is inconsistent.
C: If the functional margin y(w·x + b) = -1, the minus sign indicates a classification error, while the absolute value represents the distance from the point x to the line, which is the degree of confidence. So y(w·x + b) represents both the correctness and the confidence of the classification.
1. Definition of the functional margin:
For a given training dataset T and hyperplane (w, b), the functional margin of the hyperplane (w, b) with respect to a sample point (x_i, y_i) is defined as
γ̂_i = y_i (w·x_i + b)
The functional margin of the hyperplane (w, b) with respect to the training dataset T is the minimum of the functional margins over all sample points (x_i, y_i) in T:
γ̂ = min_{i=1,...,N} γ̂_i
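As a concrete check, here is a minimal NumPy sketch of the definition above. The three-point dataset and the hyperplane (w, b) are made-up toy values for illustration, not taken from the original text:

```python
import numpy as np

# Toy linearly separable data; labels are +1 / -1.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

# A candidate hyperplane w·x + b = 0 (hand-picked for illustration).
w = np.array([0.5, 0.5])
b = -2.0

# Functional margin of each sample point: gamma_hat_i = y_i * (w·x_i + b).
gamma_hat_i = y * (X @ w + b)

# Functional margin of the whole dataset: the minimum over all points.
gamma_hat = gamma_hat_i.min()
```

A positive `gamma_hat_i` for every point means every point is classified correctly by this hyperplane.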
2. Drawback of the functional margin:
The functional margin does not measure the distance from the point x to the hyperplane accurately: if we scale w and b proportionally, for example to 2w and 2b, the hyperplane itself does not change, but the functional margin doubles.
Notes:
For example, take the line w·x + b = 0 and double both w and b. Substituting the same point into the scaled equation gives a value of 2 where the original gave 1, yet the position of the line has not actually changed.
Therefore, we can constrain the normal vector w of the separating hyperplane, for example by normalization, letting ||w|| = 1; then the functional margin is determined. This turns the functional margin into the geometric margin.
II. Geometric Margin
For a given training dataset T and hyperplane (w, b), the geometric margin of the hyperplane (w, b) with respect to a sample point (x_i, y_i) is defined as
γ_i = y_i ((w/||w||)·x_i + b/||w||)
The geometric margin of the hyperplane (w, b) with respect to the training dataset T is the minimum of the geometric margins over all sample points in T, i.e.
γ = min_{i=1,...,N} γ_i
The geometric margin of the hyperplane (w, b) with respect to a sample point is, in general, the signed distance from the instance point to the hyperplane; when the sample point is correctly classified by the hyperplane, it is exactly the distance from the instance point to the hyperplane.
III. Relationship Between the Functional Margin and the Geometric Margin
From the definitions, the functional margin and the geometric margin are related as follows:
γ_i = γ̂_i / ||w||,  γ = γ̂ / ||w||
If ||w|| = 1, the functional margin and the geometric margin are equal. If the hyperplane parameters w and b are scaled proportionally (the hyperplane itself does not change), the functional margin changes by the same factor, while the geometric margin is unchanged.
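This relationship is easy to verify numerically. The sketch below reuses a made-up three-point dataset and a hand-picked (w, b); both are illustrative assumptions:

```python
import numpy as np

# Made-up toy data for illustration; labels are +1 / -1.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

def functional_margin(w, b):
    return (y * (X @ w + b)).min()

def geometric_margin(w, b):
    # gamma = gamma_hat / ||w||
    return functional_margin(w, b) / np.linalg.norm(w)

w, b = np.array([0.5, 0.5]), -2.0

# Scaling (w, b) -> (2w, 2b) doubles the functional margin
# but leaves the geometric margin (and the hyperplane) unchanged.
fm, fm2 = functional_margin(w, b), functional_margin(2 * w, 2 * b)
gm, gm2 = geometric_margin(w, b), geometric_margin(2 * w, 2 * b)
```
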
IV. Hard Margin Maximization
How do we determine the separating hyperplane?
Intuitively, the hyperplane should be the one that best separates the two classes of data. The criterion for "best" is that the data on both sides of the hyperplane are as far from it as possible. So we look for the hyperplane with the largest margin.
1. Margin boundary:
Notice that the two dashed lines are parallel; the distance between them is called the margin and equals 2/||w||. The two dashed lines are called the margin boundaries.
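A quick numeric check of the 2/||w|| claim, again with a hand-picked illustrative (w, b): a point on the boundary w·x + b = +1 lies at distance 1/||w|| from the separating hyperplane, so the gap between the two boundaries is 2/||w||.

```python
import numpy as np

# Hand-picked hyperplane (illustrative values).
w, b = np.array([0.5, 0.5]), -2.0

# A point on the margin boundary w·x + b = +1:
x_plus = np.array([3.0, 3.0])

# Its signed distance to the separating hyperplane w·x + b = 0
# is (w·x + b)/||w|| = 1/||w||, so the width of the margin is:
margin = 2 * (w @ x_plus + b) / np.linalg.norm(w)
```
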
2. The maximum margin separating hyperplane:
Objective function (geometric margin): max_{w,b} γ
Constraints: y_i ((w/||w||)·x_i + b/||w||) ≥ γ, i = 1, 2, ..., N
Note: each constraint says that the distance from sample point i to the hyperplane is at least γ.
Using the relationship between the geometric margin and the functional margin, this can be rewritten as
Objective function (functional margin divided by ||w||): max_{w,b} γ̂/||w||
Constraints: y_i (w·x_i + b) ≥ γ̂, i = 1, 2, ..., N
The actual value of the functional margin γ̂ does not affect the solution of the optimization problem, because scaling w and b proportionally changes γ̂ by the same factor and leaves both the objective and the constraints unaffected; so we can simply set the functional margin γ̂ = 1.
The objective then becomes maximizing 1/||w||. Maximizing 1/||w|| is equivalent to minimizing ||w||, and for the convenience of later differentiation we replace it with the equivalent problem of minimizing (1/2)||w||².
So we obtain the optimization problem of linearly separable support vector machine learning:
min_{w,b} (1/2)||w||²
s.t. y_i (w·x_i + b) − 1 ≥ 0, i = 1, 2, ..., N
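Assuming SciPy is available, this primal problem can be solved directly on a made-up three-point dataset with `scipy.optimize.minimize` (SLSQP); the data and starting point below are illustrative, not from the original post:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up linearly separable toy data.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

# Decision variables packed as theta = (w1, w2, b).
def objective(theta):
    w = theta[:2]
    return 0.5 * (w @ w)            # (1/2)||w||^2

# One inequality constraint per sample: y_i(w·x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq", "fun": lambda th, i=i: y[i] * (X[i] @ th[:2] + th[2]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints=constraints)
w_opt, b_opt = res.x[:2], res.x[2]
```

For this toy set the constraints are active at (3, 3) and (1, 1), which pins down the maximum margin hyperplane analytically, so the solver's answer can be checked by hand.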
V. The Dual Learning Algorithm
Lagrange duality: in constrained optimization, Lagrange duality is often used to transform the original (primal) problem into its dual, and the solution of the primal problem is obtained by solving the dual problem. This simplifies the computation and, more importantly, allows kernel functions to be introduced, so that the method generalizes to nonlinear classification problems.
To solve the optimization problem of the linearly separable support vector machine, we treat it as the primal problem and apply Lagrange duality to obtain the optimal solution. This is the dual algorithm of the linearly separable support vector machine.
Notes:
1. The primal problem
Assume f(x), c_i(x), and h_j(x) are continuously differentiable functions defined on R^n (we will need their derivatives), and consider the constrained optimization problem:
Objective function: min_x f(x)
Constraints: c_i(x) ≤ 0, i = 1, 2, ..., k;  h_j(x) = 0, j = 1, 2, ..., l
This is called the primal problem of the constrained optimization problem.
If there were no constraints, we could find the optimal solution by taking derivatives. So we try to remove the constraints, and it is natural to think of the Lagrangian function (that is exactly what the Lagrangian function is for):
Introduce the generalized Lagrangian function (generalized Lagrange function):
L(x, α, β) = f(x) + Σ_i α_i c_i(x) + Σ_j β_j h_j(x)
(If the constraints were all equations, we would use the standard Lagrangian function.)
Here α_i and β_j are the Lagrange multipliers, with the special requirement α_i ≥ 0.
The primal problem can then be written as:
min_x max_{α,β: α_i ≥ 0} L(x, α, β)
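To see why min_x max_{α ≥ 0} L(x, α) reproduces the constrained problem, here is a tiny 1-D illustration (the example f and c are invented for this sketch): maximizing over α ≥ 0 gives f(x) when the constraint holds and +∞ when it is violated, so the outer minimization is forced back onto the feasible set.

```python
import numpy as np

# 1-D example: minimize f(x) = x^2 subject to c(x) = 1 - x <= 0 (i.e. x >= 1).
f = lambda x: x ** 2
c = lambda x: 1.0 - x

# Inner maximization of L(x, alpha) = f(x) + alpha * c(x) over alpha >= 0:
# if c(x) <= 0 the maximizer is alpha = 0 (value f(x)); otherwise the sup is +inf.
def max_over_alpha(x):
    return f(x) if c(x) <= 0 else np.inf

# The outer minimization recovers the constrained optimum x* = 1, f(x*) = 1.
xs = np.linspace(-2.0, 3.0, 501)
vals = np.array([max_over_alpha(x) for x in xs])
x_star = xs[np.argmin(vals)]
```
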
So how do we solve it via the dual problem?
Step One:
First, construct the Lagrangian by introducing a Lagrange multiplier α_i ≥ 0, i = 1, 2, ..., N for each constraint of the optimization problem.
We rewrite the constraints as: c_i(w, b) = −(y_i (w·x_i + b) − 1) ≤ 0
The generalized Lagrangian is then:
L(w, b, α) = (1/2)||w||² − Σ_i α_i y_i (w·x_i + b) + Σ_i α_i
Step Two:
According to Lagrange duality, the dual of the primal problem starts by minimizing L(w, b, α) over w and b, i.e. taking the derivative with respect to each and setting it to zero:
∇_w L = w − Σ_i α_i y_i x_i = 0  ⇒  w = Σ_i α_i y_i x_i
∂L/∂b = −Σ_i α_i y_i = 0  ⇒  Σ_i α_i y_i = 0
Substituting these back into the Lagrangian yields a simpler form, from which the kernel method can later be introduced:
min_{w,b} L(w, b, α) = −(1/2) Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩ + Σ_i α_i
Note: to derive this, start from the Lagrangian, expand the sums, substitute w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0, merge coefficients, and expand again.
Step Three:
Next, maximize min_{w,b} L(w, b, α) over α. Together with the constraints α_i ≥ 0 and Σ_i α_i y_i = 0 obtained above, this gives the simplified dual problem:
max_α  −(1/2) Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩ + Σ_i α_i
s.t.  Σ_i α_i y_i = 0,  α_i ≥ 0, i = 1, 2, ..., N
The angle brackets ⟨x_i, x_j⟩ in the objective function denote the inner product.
Step Four:
Negating the objective function above converts the maximization into a minimization, giving the dual optimization problem:
min_α  (1/2) Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩ − Σ_i α_i
s.t.  Σ_i α_i y_i = 0,  α_i ≥ 0, i = 1, 2, ..., N
Step Five:
By solving the dual problem for α* = (α_1*, α_2*, ..., α_N*), we obtain the solution (w*, b*) of the primal problem.
Theorem: for the solution α* of the dual optimization problem, there exists a subscript j such that α_j* > 0, and the solution of the primal optimization problem can be computed by the following equations:
w* = Σ_i α_i* y_i x_i
b* = y_j − Σ_i α_i* y_i ⟨x_i, x_j⟩
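Putting steps two through five together, here is a sketch (assuming SciPy; the three-point dataset is made up for illustration) that solves the dual QP numerically and then recovers w* and b* with the formulas above:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up linearly separable toy data.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

# Gram matrix of inner products <x_i, x_j>.
K = X @ X.T
Q = (y[:, None] * y[None, :]) * K

# Dual problem: min (1/2) a^T Q a - sum(a)
#               s.t. sum_i a_i y_i = 0,  a_i >= 0.
dual_obj = lambda a: 0.5 * a @ Q @ a - a.sum()
cons = [{"type": "eq", "fun": lambda a: a @ y}]
bounds = [(0.0, None)] * len(y)

res = minimize(dual_obj, x0=np.ones(len(y)), method="SLSQP",
               bounds=bounds, constraints=cons)
alpha = res.x

# Recover the primal solution: w* = sum_i alpha_i y_i x_i,
w_star = (alpha * y) @ X
# and b* = y_j - sum_i alpha_i y_i <x_i, x_j> for a j with alpha_j > 0.
j = int(np.argmax(alpha))
b_star = y[j] - (alpha * y) @ K[:, j]
```

The points with α_i* > 0 are the support vectors; for this toy set they are the two points closest to the hyperplane, and (w*, b*) matches the solution of the primal problem.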
To summarize the solution process:
Support Vector Machines (II): The Linearly Separable Support Vector Machine and Hard Margin Maximization