SVM (support vector machine) can be used to solve nonlinear classification problems.
Part I Introduction
First, we assume the samples are linearly separable (we will drop this assumption later).
We bring back the earlier logistic regression setup, with slightly modified definitions:
Let
$g(z) = 1$ if $z \ge 0$ and $g(z) = -1$ if $z < 0$, with labels $y^{(i)} \in \{1, -1\}$, and
$h_{w,b}(x) = g(w^T x + b)$
(Note: in the original definition $\theta^T x$, $\theta$ is $(n+1)$-dimensional ($\theta_0, \dots, \theta_n$). Here $w$ is an $n$-dimensional vector and $b$ is a constant: $w$ plays the role of $\theta_1, \dots, \theta_n$ and $b$ plays the role of $\theta_0$.)
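Here is a minimal Python sketch of the hypothesis above. It stores vectors as plain lists. The helper names `dot`, `g` and `h` are hypothetical and only mirror the notation. The weights in the example are arbitrary.

```python
def dot(u, v):
    """Inner product w^T x of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def g(z):
    """Threshold function: 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1

def h(w, b, x):
    """Hypothesis h_{w,b}(x) = g(w^T x + b); predicts a label in {1, -1}."""
    return g(dot(w, x) + b)

# Example: the line x1 + x2 - 3 = 0 splits the plane into two classes.
w, b = [1.0, 1.0], -3.0
print(h(w, b, [4.0, 2.0]))  # 1   (4 + 2 - 3 = 3 >= 0)
print(h(w, b, [0.5, 1.0]))  # -1  (0.5 + 1 - 3 = -1.5 < 0)
```

Unlike logistic regression, $h$ outputs a hard label directly rather than a probability.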
Two important concepts are defined:
1.1 Functional margin
For a training sample $(x^{(i)}, y^{(i)})$, define the functional margin as
$\hat{\gamma}^{(i)} = y^{(i)}(w^T x^{(i)} + b)$
The functional margin of the entire training set (with $m$ samples) is the smallest of these:
$\hat{\gamma} = \min_{i = 1, \dots, m} \hat{\gamma}^{(i)}$
Interpretation of the functional margin:
Recall the S-shaped curve of $g$ from logistic regression. When $|w^T x + b|$ is very large, $g$ is close to 1 (or 0), so we can be confident that the sample belongs to the corresponding class. Here, if $y^{(i)}$ has the same sign as $w^T x^{(i)} + b$, the sample is classified correctly and its functional margin is positive. Otherwise it is misclassified and its functional margin is negative. A larger positive margin means a more confident correct classification.
Therefore, the functional margin of the entire set can be used to measure how well (and how confidently) the set is classified.
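The two definitions above translate directly into code. This is a small sketch on a made-up four-point dataset. The helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def functional_margin(w, b, x, y):
    """gamma_hat^(i) = y^(i) * (w^T x^(i) + b); positive iff x is classified correctly."""
    return y * (dot(w, x) + b)

def set_functional_margin(w, b, X, Y):
    """Functional margin of the whole set: the smallest per-sample margin."""
    return min(functional_margin(w, b, x, y) for x, y in zip(X, Y))

X = [[4.0, 2.0], [3.0, 3.0], [0.5, 1.0], [1.0, 0.0]]
Y = [1, 1, -1, -1]
w, b = [1.0, 1.0], -3.0
print([functional_margin(w, b, x, y) for x, y in zip(X, Y)])  # [3.0, 3.0, 1.5, 2.0]
print(set_functional_margin(w, b, X, Y))                       # 1.5
```

All per-sample margins are positive, so every point is on the correct side. Flipping one label (for example, giving `[0.5, 1.0]` the label `1`) would make that sample's margin, and therefore the set's margin, negative.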
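Note 1.1.1: Multiplying the equation $w^T x + b = 0$ by any constant $k \ne 0$ (giving $k w^T x + k b = 0$) does not change the hyperplane. However, replacing $(w, b)$ with $(k w, k b)$ multiplies the functional margin by $k$. So by choosing a large $k > 0$, the functional margin can be made arbitrarily large without changing the classifier.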
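This sketch checks Note 1.1.1 numerically with hypothetical helpers. Only $k > 0$ is used: a negative $k$ describes the same set of points but flips which side is labeled $+1$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

def functional_margin(w, b, x, y):
    return y * (dot(w, x) + b)

w, b = [1.0, 1.0], -3.0
x, y = [4.0, 2.0], 1
for k in (1.0, 2.0, 10.0):
    wk, bk = [k * wi for wi in w], k * b
    # Same hyperplane, same prediction, but the functional margin grows linearly with k.
    print(k, predict(wk, bk, x), functional_margin(wk, bk, x, y))
```

The margin goes 3.0, 6.0, 30.0 while the prediction stays 1. This is why the functional margin alone cannot be maximized meaningfully.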
1.2 Geometric margin
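The geometric margin for sample $i$ is the signed distance from $x^{(i)}$ to the hyperplane $w^T x + b = 0$:
$\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}\right)$
For the entire training set, the geometric margin is:
$\gamma = \min_{i = 1, \dots, m} \gamma^{(i)}$
The two definitions give functional margin = geometric margin × $\|w\|$, i.e. $\hat{\gamma} = \gamma \|w\|$. Here $\|w\|$ is the Euclidean norm (length) of $w$, not an absolute value. Unlike the functional margin, the geometric margin does not change when $(w, b)$ is rescaled.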
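This sketch computes the geometric margin and checks the relation $\hat{\gamma} = \gamma \|w\|$. The numbers are chosen so that $\|w\| = 5$ exactly, and the helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(w):
    """Euclidean norm ||w||."""
    return math.sqrt(dot(w, w))

def geometric_margin(w, b, x, y):
    """gamma^(i) = y^(i) * ((w/||w||)^T x^(i) + b/||w||): signed distance to the hyperplane."""
    n = norm(w)
    return y * (dot(w, x) / n + b / n)

w, b = [3.0, 4.0], -5.0            # ||w|| = 5
x, y = [3.0, 4.0], 1
fm = y * (dot(w, x) + b)           # functional margin: 9 + 16 - 5 = 20
gm = geometric_margin(w, b, x, y)  # geometric margin: 20 / 5 = 4
print(fm, gm, gm * norm(w))        # 20.0 4.0 20.0
# Scaling (w, b) by k = 2 leaves the geometric margin unchanged.
print(geometric_margin([6.0, 8.0], -10.0, x, y))  # 4.0
```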
Part II Optimal Margin Classifier
Core idea: choose $w$ and $b$ to maximize the geometric margin.
Formally:
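1. Adopt the convention $\|w\| = 1$ (by note 1.1.1, $\|w\|$ can be rescaled arbitrarily). Then the functional margin equals the geometric margin:
$\max_{\gamma, w, b}\ \gamma \quad \text{s.t.}\ y^{(i)}(w^T x^{(i)} + b) \ge \gamma,\ i = 1, \dots, m;\quad \|w\| = 1$
2. The constraint $\|w\| = 1$ is non-convex. Drop it and use geometric margin = functional margin / $\|w\|$ instead:
$\max_{\hat{\gamma}, w, b}\ \frac{\hat{\gamma}}{\|w\|} \quad \text{s.t.}\ y^{(i)}(w^T x^{(i)} + b) \ge \hat{\gamma},\ i = 1, \dots, m$
3. Rescale $(w, b)$ so that $\hat{\gamma} = 1$. Maximizing $1/\|w\|$ is then equivalent to minimizing $\|w\|^2$:
$\min_{w, b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\ y^{(i)}(w^T x^{(i)} + b) \ge 1,\ i = 1, \dots, m$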
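The result of these formulations is the optimal margin classifier. The final form is a quadratic programming (QP) problem: a convex quadratic objective with linear constraints. It can therefore be solved with quadratic programming methods.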
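The general problem needs a QP solver. As a sanity check, though, the special case of exactly one positive point $x_+$ and one negative point $x_-$ has a closed-form solution to the last formulation. Subtracting the two constraints gives $w^T d \ge 2$ with $d = x_+ - x_-$. The shortest such $w$ is $w = 2d/\|d\|^2$, and $b = -\tfrac{1}{2} w^T(x_+ + x_-)$ makes both constraints tight, giving a geometric margin of $\|d\|/2$. This is a sketch for the two-point case only, not a general SVM solver, and the function name is hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def two_point_max_margin(x_pos, x_neg):
    """Closed-form solution of min ||w||^2/2 s.t. y(w^T x + b) >= 1
    for exactly one positive and one negative point (special case only)."""
    d = [p - q for p, q in zip(x_pos, x_neg)]
    dd = dot(d, d)
    w = [2.0 * di / dd for di in d]          # shortest w with w^T d >= 2
    mid = [p + q for p, q in zip(x_pos, x_neg)]
    b = -0.5 * dot(w, mid)                   # hyperplane halfway between the points
    return w, b

x_pos, x_neg = [3.0, 3.0], [1.0, 1.0]
w, b = two_point_max_margin(x_pos, x_neg)
print(w, b)                                      # [0.5, 0.5] -2.0
print(dot(w, x_pos) + b, -(dot(w, x_neg) + b))   # both constraints tight: 1.0 1.0
print(1.0 / math.sqrt(dot(w, w)))                # geometric margin = ||d|| / 2
```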
Part III The Lagrange Multiplier Method
The Lagrange multiplier method finds the extrema (maximum or minimum) of a function subject to constraints. I learned it back in my freshman year -.-
Some examples:
Q3.1 $\min_w f(w)$ subject to $h_i(w) = 0,\ i = 1, \dots, l$
A3.1
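STEP1: Construct the Lagrangian
$\mathcal{L}(w, \beta) = f(w) + \sum_{i=1}^{l} \beta_i h_i(w)$
(the $\beta_i$ are the Lagrange multipliers, which are also solved for)
STEP2: Take the partial derivative of $\mathcal{L}$ with respect to each parameter and set it to 0:
$\frac{\partial \mathcal{L}}{\partial w_i} = 0, \quad \frac{\partial \mathcal{L}}{\partial \beta_i} = 0$
Solving these equations gives $w$ and $\beta$.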
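Here is a worked example of the two steps that is not in the original. Minimize $f(w) = w_1^2 + w_2^2$ subject to $h(w) = w_1 + w_2 - 1 = 0$. The Lagrangian is $\mathcal{L} = w_1^2 + w_2^2 + \beta(w_1 + w_2 - 1)$. Setting its partial derivatives to 0 gives a linear system here, so the sketch solves it exactly with a small Gauss-Jordan elimination over fractions. The helper `solve_linear` is hypothetical and assumes the system has a unique solution.

```python
from fractions import Fraction

def solve_linear(A, rhs):
    """Gauss-Jordan elimination with exact fractions (assumes a unique solution)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

# Unknowns (w1, w2, beta):
#   dL/dw1   = 2*w1 + beta   = 0
#   dL/dw2   = 2*w2 + beta   = 0
#   dL/dbeta = w1 + w2 - 1   = 0
A = [[2, 0, 1],
     [0, 2, 1],
     [1, 1, 0]]
rhs = [0, 0, 1]
w1, w2, beta = solve_linear(A, rhs)
print(w1, w2, beta)  # 1/2 1/2 -1
```

The constrained minimum is therefore at $w = (1/2, 1/2)$ with $f = 1/2$. For a general nonlinear $f$ or $h$, the stationarity equations are nonlinear and need a numerical solver instead.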
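Q3.2 Now the constraints include both inequalities and equalities (we call this the primal optimization problem, and will use it later):
$\min_w f(w)$ subject to $g_i(w) \le 0,\ i = 1, \dots, k$ and $h_i(w) = 0,\ i = 1, \dots, l$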
A3.2
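STEP1: Construct the generalized Lagrangian
$\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w)$
(the $\alpha_i$ and $\beta_i$ are the Lagrange multipliers)
STEP2: Define
$\theta_P(w) = \max_{\alpha, \beta:\ \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$
That is, $\theta_P(w)$ is the maximum of $\mathcal{L}$ over the variables $\alpha$ and $\beta$ (with $w$ held fixed), subject to $\alpha_i \ge 0$. The subscript $P$ stands for "primal".
From this definition and the constraints it follows that
$\theta_P(w) = f(w)$ if $w$ satisfies all the primal constraints, and $\theta_P(w) = \infty$ otherwise.
(Proof: suppose some $w'$ has $g_i(w') > 0$. Letting $\alpha_i \to \infty$ makes $\mathcal{L}$ arbitrarily large, so $\theta_P(w') = \infty$, since $\theta_P$ is the maximum of $\mathcal{L}$. Likewise, if $h_i(w') \ne 0$, let $\beta_i$ go to $\pm\infty$ with the sign of $h_i(w')$. If $w$ is feasible, every term $\alpha_i g_i(w)$ is $\le 0$ and every $h_i(w) = 0$. The maximum is then attained at $\alpha = 0$ and equals $f(w)$.)
So the original problem (minimize $f(w)$ subject to the constraints) becomes the problem of minimizing $\theta_P(w)$:
$\min_w \theta_P(w) = \min_w \max_{\alpha, \beta:\ \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$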
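To make $\theta_P$ concrete, here is a toy problem that is not in the original: minimize $f(w) = w^2$ subject to $g(w) = 1 - w \le 0$, with no equality constraints. `theta_p` approximates the max over $\alpha \ge 0$ on a small finite grid, so for infeasible $w$ it only hints at the infinite value. `theta_p_exact` uses the closed form derived above. All names and grid bounds are arbitrary choices for this sketch.

```python
import math

def f(w):
    return w * w

def g(w):
    return 1.0 - w   # constraint g(w) <= 0, i.e. w >= 1

def lagrangian(w, alpha):
    return f(w) + alpha * g(w)

def theta_p(w, alphas):
    """Approximate theta_P(w) = max over alpha >= 0 of L(w, alpha), on a finite alpha grid."""
    return max(lagrangian(w, a) for a in alphas)

def theta_p_exact(w):
    """Closed form: f(w) if w is feasible, +infinity otherwise."""
    return f(w) if g(w) <= 0 else math.inf

alphas = [0.0, 1.0, 10.0, 1000.0]
print(theta_p(2.0, alphas))   # feasible: max at alpha = 0, equals f(2) = 4.0
print(theta_p(0.5, alphas))   # infeasible: 500.25, and it keeps growing with alpha

ws = [i / 100 for i in range(0, 301)]        # grid over [0, 3]
p_star = min(theta_p_exact(w) for w in ws)   # min_w theta_P(w)
print(p_star)                                # 1.0, attained at w = 1
```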
STEP3: Form the dual problem
Denote the optimal value of the problem above (the minimum of $\theta_P(w)$) by $p^*$, i.e.
$p^* = \min_w \theta_P(w)$
Set that aside for a moment and look at another problem. Define
$\theta_D(\alpha, \beta) = \min_w \mathcal{L}(w, \alpha, \beta)$
The subscript $D$ stands for "dual".
The problem
$d^* = \max_{\alpha, \beta:\ \alpha_i \ge 0} \theta_D(\alpha, \beta) = \max_{\alpha, \beta:\ \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta)$
is the dual problem of the primal problem $p^*$.
Comparing this with the definition of $\theta_P(w)$, the only difference between $d^*$ and $p^*$ is that the order of max and min is swapped.
There is a general result that $\max \min (\cdot) \le \min \max (\cdot)$ always holds, so
$d^* \le p^*$
This inequality alone is not very useful. Under certain conditions, however, $d^* = p^*$, and then solving the dual problem gives the solution of the primal problem. In many cases (including the SVM), the dual problem is easier to solve.
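Continuing the same toy problem (restated here: minimize $f(w) = w^2$ subject to $1 - w \le 0$, whose primal optimum is $p^* = 1$ at $w = 1$), the dual function has a closed form. $\mathcal{L} = w^2 + \alpha(1 - w)$ is a convex quadratic in $w$, minimized at $w = \alpha/2$, so $\theta_D(\alpha) = \alpha - \alpha^2/4$. This sketch maximizes it over a grid of $\alpha \ge 0$ (the grid is an arbitrary choice) and checks that $d^* \le p^*$, with equality here.

```python
def theta_d(alpha):
    """theta_D(alpha) = min_w [w^2 + alpha*(1 - w)]; the minimiser is w = alpha/2."""
    w = alpha / 2.0
    return w * w + alpha * (1.0 - w)

p_star = 1.0                                 # primal optimum: w* = 1, f(w*) = 1
alphas = [i / 100 for i in range(0, 501)]    # grid over [0, 5]
d_star, alpha_star = max((theta_d(a), a) for a in alphas)
print(alpha_star, d_star)                    # 2.0 1.0

# Weak duality: every dual value is a lower bound on p*.
print(all(theta_d(a) <= p_star for a in alphas))  # True
```

Here $f$ is convex, $g$ is affine, and $w = 2$ makes $g(w) < 0$, so this problem meets the conditions listed in STEP4, which is why $d^* = p^*$.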
STEP4: Conditions for $d^* = p^*$
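Suppose that
1) $f$ and all $g_i$ are convex functions;
2) all $h_i$ are affine functions (i.e. $h_i(w) = a_i^T w + b_i$);
3) the constraints $g_i$ are strictly feasible: there exists some $w$ such that $g_i(w) < 0$ for all $i$ (that is, the inequalities can hold strictly, not only with equality).
If these assumptions hold, then there exist $w^*, \alpha^*, \beta^*$ such that $w^*$ solves the primal problem, $\alpha^*, \beta^*$ solve the dual problem, and
$p^* = d^* = \mathcal{L}(w^*, \alpha^*, \beta^*)$
In addition, $w^*, \alpha^*, \beta^*$ satisfy the Karush-Kuhn-Tucker (KKT) conditions:
(1) $\frac{\partial}{\partial w_i} \mathcal{L}(w^*, \alpha^*, \beta^*) = 0,\ i = 1, \dots, n$
(2) $\frac{\partial}{\partial \beta_i} \mathcal{L}(w^*, \alpha^*, \beta^*) = 0,\ i = 1, \dots, l$
(3) $g_i(w^*) \le 0,\ i = 1, \dots, k$
(4) $\alpha_i^* \ge 0,\ i = 1, \dots, k$
(5) $\alpha_i^* g_i(w^*) = 0,\ i = 1, \dots, k$
Condition (5) is called the KKT dual complementarity condition. It implies that if $\alpha_i^* > 0$, then $g_i(w^*) = 0$ (the constraint is active).
Conversely, if some $w^*, \alpha^*, \beta^*$ satisfy the KKT conditions, then they are a solution of both the primal and the dual problem. Problem solved ~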
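This sketch checks the KKT conditions for the same toy problem (minimize $w^2$ subject to $g(w) = 1 - w \le 0$, no equality constraints, so condition (2) does not apply). The candidate $w^* = 1$, $\alpha^* = 2$ passes every check. A feasible but non-optimal point such as $w = 2$, $\alpha = 0$ fails stationarity. The function name and its dictionary keys are hypothetical.

```python
def kkt_residuals(w, alpha):
    """KKT checks for: min w^2 s.t. g(w) = 1 - w <= 0, with L = w^2 + alpha*(1 - w)."""
    g = 1.0 - w
    return {
        "stationarity": 2.0 * w - alpha,   # condition (1): dL/dw should be 0
        "primal_feasible": g <= 0.0,       # condition (3): g(w*) <= 0
        "dual_feasible": alpha >= 0.0,     # condition (4): alpha* >= 0
        "complementarity": alpha * g,      # condition (5): alpha* g(w*) should be 0
    }

print(kkt_residuals(1.0, 2.0))   # optimal pair: all conditions hold
print(kkt_residuals(2.0, 0.0))   # feasible but not optimal: stationarity is 4.0, not 0
```

Since $\alpha^* = 2 > 0$, complementarity forces $g(w^*) = 0$: the constraint is active at the optimum.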
SVM (Support vector machine)