Support vector machines (SVMs) are used for classification, including nonlinear classification problems.

Part I Introduction

First, we assume that the samples are linearly separable. (We will drop this assumption later.)

Let us revisit the earlier logistic regression setup and modify its definitions slightly:

Let

$g(z) = 1$ if $z \ge 0$ and $g(z) = -1$ if $z < 0$, so that the labels are $y^{(i)} \in \{1, -1\}$, and

$h_{w,b}(x) = g(w^{T}x + b)$

(Note: in the original $\theta^{T}x$, $\theta = (\theta_0, \dots, \theta_n)$ is an $(n+1)$-dimensional vector. Here $w$ is an $n$-dimensional vector and $b$ is a constant: $w$ corresponds to the original $\theta_1, \dots, \theta_n$, and $b$ corresponds to $\theta_0$.)

Two important concepts are defined:

1.1 Functional margin

For a training sample $(x^{(i)}, y^{(i)})$, define the functional margin as

$\hat{\gamma}^{(i)} = y^{(i)}(w^{T}x^{(i)} + b)$

The functional margin of the entire training set is the smallest of these:

$\hat{\gamma} = \min_{i} \hat{\gamma}^{(i)}$

The meaning of the functional margin:

Recall the S-shaped curve of $g$ from logistic regression: when $w^{T}x + b$ is large and positive, $g \to 1$ (and when it is large and negative, $g \to 0$), so a large $|w^{T}x + b|$ means we can be very confident about which class the sample belongs to. If $y^{(i)}$ has the same sign as $w^{T}x^{(i)} + b$, the sample is classified correctly and its functional margin is positive. Otherwise it is misclassified and its functional margin is negative.

Therefore, the functional margin of the entire training set measures both the correctness and the confidence of the classification.

Note 1.1.1: For the equation $k\,w^{T}x + k\,b = 0$, changing the value of $k$ arbitrarily (as long as $k \ne 0$) does not change the hyperplane. It does, however, multiply the functional margin by $k$.
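The definition of the functional margin and the scaling effect in Note 1.1.1 can be checked numerically. The following is only a minimal sketch in plain Python; the toy samples and the values of `w`, `b`, and `k` are made up for illustration and are not part of the original notes.

```python
def functional_margin(w, b, x, y):
    """Functional margin y * (w . x + b) of one labelled sample."""
    return y * (sum(wj * xj for wj, xj in zip(w, x)) + b)


def dataset_functional_margin(w, b, samples):
    """Smallest functional margin over all (x, y) samples."""
    return min(functional_margin(w, b, x, y) for x, y in samples)


# Hypothetical toy data: two positive and two negative points in 2-D.
samples = [
    ((2.0, 2.0), 1),
    ((3.0, 1.0), 1),
    ((-1.0, -1.0), -1),
    ((0.0, -3.0), -1),
]
w, b = (1.0, 1.0), 0.0

margin = dataset_functional_margin(w, b, samples)

# Scaling (w, b) by k keeps the same hyperplane but multiplies the margin by k.
k = 10.0
scaled_margin = dataset_functional_margin(tuple(k * wj for wj in w), k * b, samples)

print(margin, scaled_margin)
```

Because the margin grows with `k` while the decision boundary stays put, the functional margin alone cannot be used as the quantity to maximize.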

1.2 Geometric margin

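[Figure: the geometric margin of a sample point relative to the separating hyperplane]

(For background on hyperplanes, see the reference linked from this post.)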

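The geometric margin of sample $i$ is:

$\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^{T}x^{(i)} + \frac{b}{\|w\|}\right)$

For the entire training set, the geometric margin is:

$\gamma = \min_{i} \gamma^{(i)}$

From the two definitions it follows that functional margin = geometric margin × $\|w\|$. Writing $\hat{\gamma}$ for the functional margin and $\gamma$ for the geometric margin, this is $\hat{\gamma} = \gamma\,\|w\|$. (Here $\|w\|$ is the Euclidean norm, i.e. the length, of the vector $w$, not an absolute value.)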
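As a quick check of the relation between the two margins, here is a small sketch in plain Python. The sample point and the values of `w` and `b` are hypothetical, chosen so that the norm of `w` is a round number.

```python
import math


def norm(w):
    """Euclidean norm (length) of the vector w."""
    return math.sqrt(sum(wj * wj for wj in w))


def functional_margin(w, b, x, y):
    """y * (w . x + b)"""
    return y * (sum(wj * xj for wj, xj in zip(w, x)) + b)


def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||: the signed distance to the hyperplane."""
    return functional_margin(w, b, x, y) / norm(w)


# Hypothetical values: ||w|| = 5 for w = (3, 4).
w, b = (3.0, 4.0), -5.0
x, y = (3.0, 4.0), 1

f_margin = functional_margin(w, b, x, y)
g_margin = geometric_margin(w, b, x, y)

# Rescaling (w, b) by k = 10 changes the functional margin but not the geometric one.
w10, b10 = tuple(10.0 * wj for wj in w), 10.0 * b
g_margin_scaled = geometric_margin(w10, b10, x, y)

print(f_margin, g_margin, g_margin_scaled)
```

The geometric margin is unchanged by rescaling, which is why it is the quantity worth maximizing.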

Part II Optimal Margin Classifier

Core idea: choose $w$ and $b$ so that the geometric margin is as large as possible.

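The formal statement goes through three equivalent forms:

1. Fix $\|w\| = 1$ (by Note 1.1.1 above, $w$ and $b$ can be rescaled arbitrarily), so that the functional margin equals the geometric margin $\gamma$:

$\max_{\gamma, w, b} \gamma \quad \text{s.t.} \quad y^{(i)}(w^{T}x^{(i)} + b) \ge \gamma \ \text{for all } i, \quad \|w\| = 1$

2. Drop the non-convex constraint $\|w\| = 1$ and write the geometric margin as $\hat{\gamma} / \|w\|$, where $\hat{\gamma}$ is the functional margin:

$\max_{\hat{\gamma}, w, b} \frac{\hat{\gamma}}{\|w\|} \quad \text{s.t.} \quad y^{(i)}(w^{T}x^{(i)} + b) \ge \hat{\gamma} \ \text{for all } i$

3. Use the same scaling freedom to set $\hat{\gamma} = 1$. Maximizing $1 / \|w\|$ is then the same as minimizing $\frac{1}{2}\|w\|^{2}$:

$\min_{w, b} \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y^{(i)}(w^{T}x^{(i)} + b) \ge 1 \ \text{for all } i$

The solution of the last problem is the **optimal margin classifier**. Its objective is quadratic and its constraints are linear, so it can be solved by quadratic programming.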
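To make the optimization problem concrete, the sketch below assumes the standard hard-margin form (minimize $\frac{1}{2}\|w\|^{2}$ subject to $y^{(i)}(w^{T}x^{(i)} + b) \ge 1$) and solves it on a two-point toy set. It uses a brute-force grid search rather than a quadratic programming solver, purely to keep the example dependency-free. The data and the grid are made up for illustration.

```python
from itertools import product

# Hypothetical toy data: one positive and one negative sample.
samples = [((1.0, 1.0), 1), ((-1.0, -1.0), -1)]


def feasible(w1, w2, b, tol=1e-9):
    """True if every sample satisfies y * (w . x + b) >= 1 (up to rounding)."""
    return all(y * (w1 * x1 + w2 * x2 + b) >= 1 - tol for (x1, x2), y in samples)


def objective(candidate):
    """0.5 * ||w||^2 for a candidate (w1, w2, b)."""
    w1, w2, _ = candidate
    return 0.5 * (w1 ** 2 + w2 ** 2)


# Candidate values -2.0, -1.9, ..., 2.0 for each of w1, w2 and b.
grid = [i / 10 for i in range(-20, 21)]

# Keep only the candidates that satisfy all constraints, then take the
# one with the smallest objective value.
best = min((c for c in product(grid, repeat=3) if feasible(*c)), key=objective)

print(best)
```

On this toy set the search returns $w = (0.5, 0.5)$, $b = 0$, which corresponds to a geometric margin of $1 / \|w\| = \sqrt{2}$. A real implementation would hand the same objective and constraints to a QP solver instead of enumerating a grid.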

Part III Lagrange Multiplier Method

The Lagrange multiplier method is used to find the extrema (maxima or minima) of a function subject to constraints. I learned this back in my freshman year -.-

Some examples:

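Q3.1 Minimize $f(w)$ subject to equality constraints:

$\min_{w} f(w) \quad \text{s.t.} \quad h_i(w) = 0,\ i = 1, \dots, l$

A3.1

STEP1: Form the Lagrangian

$L(w, \beta) = f(w) + \sum_{i=1}^{l} \beta_i h_i(w)$

(the $\beta_i$ are the Lagrange multipliers, which are solved for along with $w$)

STEP2: Take the partial derivative of $L$ with respect to each parameter and set it equal to 0

$\frac{\partial L}{\partial w_i} = 0, \quad \frac{\partial L}{\partial \beta_i} = 0$

Solving these equations gives the solution.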
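As a small worked example of these two steps (the function and the constraint below are chosen only for illustration), minimize $f(w) = w_1^{2} + w_2^{2}$ subject to $h(w) = w_1 + w_2 - 1 = 0$:

$$
\begin{aligned}
L(w, \beta) &= w_1^{2} + w_2^{2} + \beta\,(w_1 + w_2 - 1) \\
\frac{\partial L}{\partial w_1} &= 2w_1 + \beta = 0 \\
\frac{\partial L}{\partial w_2} &= 2w_2 + \beta = 0 \\
\frac{\partial L}{\partial \beta} &= w_1 + w_2 - 1 = 0
\end{aligned}
$$

The first two equations give $w_1 = w_2 = -\beta / 2$. Substituting into the third gives $-\beta = 1$, so $\beta = -1$, $w_1 = w_2 = \frac{1}{2}$, and the constrained minimum is $f(w) = \frac{1}{2}$.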

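Q3.2 Now the constraints include both inequalities and equalities. (We call this the primal optimization problem; we will use it later.)

$\min_{w} f(w) \quad \text{s.t.} \quad g_i(w) \le 0,\ i = 1, \dots, k; \quad h_i(w) = 0,\ i = 1, \dots, l$

A3.2

STEP1: Form the generalized Lagrangian

$L(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w)$

(the $\alpha_i$ and $\beta_i$ are the Lagrange multipliers)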

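STEP2: Define

$\theta_P(w) = \max_{\alpha, \beta:\ \alpha_i \ge 0} L(w, \alpha, \beta)$

That is, $\theta_P(w)$ is the maximum value of $L$ over the variables $\alpha$ and $\beta$, with $\alpha_i \ge 0$. (The subscript P stands for "primal".)

From this definition and the constraints it follows that:

$\theta_P(w) = f(w)$ if $w$ satisfies the primal constraints, and $\theta_P(w) = \infty$ otherwise.

(Proof: if there is a $w'$ with $g_i(w') > 0$ for some $i$, we can let $\alpha_i \to \infty$, so that $L \to \infty$. Likewise, if $h_i(w') \ne 0$, we can let $\beta_i \to \pm\infty$. Since $\theta_P(w')$ is the maximum of $L$, it is infinite. If all constraints hold, every term $\alpha_i g_i(w)$ is at most 0 and every term $\beta_i h_i(w)$ is 0, so the maximum is exactly $f(w)$.)

The original problem (minimize $f(w)$ subject to the constraints) is therefore converted into minimizing $\theta_P(w)$.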

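STEP3: Form the dual problem

Denote the optimal value of the problem above (minimizing $\theta_P(w)$) by $p^*$, i.e.

$p^* = \min_{w} \theta_P(w) = \min_{w} \max_{\alpha, \beta:\ \alpha_i \ge 0} L(w, \alpha, \beta)$

Set that aside for a moment and look at another problem. Define

$\theta_D(\alpha, \beta) = \min_{w} L(w, \alpha, \beta)$

The subscript D here stands for "dual".

Then

$d^* = \max_{\alpha, \beta:\ \alpha_i \ge 0} \theta_D(\alpha, \beta) = \max_{\alpha, \beta:\ \alpha_i \ge 0} \min_{w} L(w, \alpha, \beta)$

is the **dual problem** of $p^*$.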

Comparing with the earlier definition of $\theta_P(w)$, the only difference between $d^*$ and $p^*$ is that the order of the max and the min is swapped.

There is, however, a general theorem: max[min(...)] <= min[max(...)] always holds. So

$d^* \le p^*$

Although this inequality does not look very useful by itself, under certain conditions $d^* = p^*$. In that case solving the dual problem also gives the solution of the primal problem, and in many cases the dual problem is easier to solve.
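The max-min inequality is easy to see on a finite table of values. In the sketch below the table is a made-up stand-in for $L$, with rows playing the role of $w$ (the variable being minimized) and columns playing the role of the multipliers (the variables being maximized).

```python
def min_max(table):
    """min over rows of (max over columns): the analogue of p*."""
    return min(max(row) for row in table)


def max_min(table):
    """max over columns of (min over rows): the analogue of d*."""
    return max(min(column) for column in zip(*table))


# Hypothetical values of L: table[row][column].
table = [
    [1, 4],
    [3, 2],
]

p_star = min_max(table)  # row maxima are 4 and 3, so the minimum is 3
d_star = max_min(table)  # column minima are 1 and 2, so the maximum is 2

print(d_star, p_star)
```

Here the two values differ (2 versus 3), which shows that the inequality can be strict when no extra conditions hold.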

STEP4: The final step

Assume

1) $f$ and every $g_i$ are **convex functions**

2) every $h_i$ is an affine function (i.e. $h_i(w) = a_i^{T}w + b_i$)

3) the constraints $g_i$ are strictly feasible (there exists a $w$ such that $g_i(w) < 0$ for all $i$; that is, the inequalities can hold *strictly*, not only with *equality*)

If these assumptions hold, then there exist $w^*, \alpha^*, \beta^*$ such that $w^*$ is the solution of the primal problem and $\alpha^*, \beta^*$ are the solution of the dual problem. That is,

$p^* = d^* = L(w^*, \alpha^*, \beta^*)$

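In addition, $w^*, \alpha^*, \beta^*$ satisfy the Karush-Kuhn-Tucker (KKT) conditions:

$\frac{\partial}{\partial w_i} L(w^*, \alpha^*, \beta^*) = 0,\ i = 1, \dots, n$ (3)

$\frac{\partial}{\partial \beta_i} L(w^*, \alpha^*, \beta^*) = 0,\ i = 1, \dots, l$ (4)

$\alpha_i^* g_i(w^*) = 0,\ i = 1, \dots, k$ (5)

$g_i(w^*) \le 0,\ i = 1, \dots, k$ (6)

$\alpha_i^* \ge 0,\ i = 1, \dots, k$ (7)

(The middle equation (5) is also known as the KKT dual complementarity condition.)

Conversely, under the assumptions above, if $w^*, \alpha^*, \beta^*$ satisfy the KKT conditions, then they are the solution of **both** the primal problem and the dual problem. Problem solved ~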
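The KKT conditions can be verified by hand on a one-dimensional problem. The sketch below uses a made-up example, minimize $f(w) = w^{2}$ subject to $g(w) = 1 - w \le 0$, whose solution is $w^* = 1$ with multiplier $\alpha^* = 2$. There are no equality constraints here, so $\beta$ does not appear.

```python
def f(w):
    """Objective: w^2 (convex)."""
    return w * w


def g(w):
    """Inequality constraint g(w) <= 0, i.e. w >= 1 (strictly feasible at w = 2)."""
    return 1.0 - w


def lagrangian(w, alpha):
    """L(w, alpha) = f(w) + alpha * g(w)"""
    return f(w) + alpha * g(w)


def dL_dw(w, alpha):
    """Partial derivative of L with respect to w."""
    return 2.0 * w - alpha


# Candidate primal and dual solutions for this toy problem.
w_star, alpha_star = 1.0, 2.0

kkt = {
    "stationarity": dL_dw(w_star, alpha_star) == 0.0,
    "complementarity": alpha_star * g(w_star) == 0.0,
    "primal feasibility": g(w_star) <= 0.0,
    "dual feasibility": alpha_star >= 0.0,
}

# Primal optimum p* = f(w*).
p_star = f(w_star)
# theta_D(alpha) = min_w L(w, alpha) is attained at w = alpha / 2,
# and alpha = 2 maximizes theta_D(alpha) = alpha - alpha^2 / 4.
d_star = lagrangian(alpha_star / 2.0, alpha_star)

print(kkt, p_star, d_star)
```

All four conditions hold and the primal and dual optimal values coincide, as expected for a convex problem with a strictly feasible constraint.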

SVM (Support vector machine)