**This article is original; if reproduced, please indicate the original source.**

Foreword: Last time we talked about logistic regression; today we discuss the SVM algorithm, which is similar in spirit. As before, the discussion is restricted to the linearly separable case.

Many people say that SVM is the best off-the-shelf classifier available, so let's take a look at what makes SVM special.

One: A first look at SVM

Problem: separate the balls from the pentagrams with a straight line.

Solution: there are countless ways to do it, for example:

Follow-up question: which line is the best classifier?

Answer:

Uh... who on earth knows which one is the best??

Wait — isn't this "best" split just like a landlord asking the steward to divide land between two sons? The fairest division keeps the boundary equally far from both sides. In other words, isn't the best line the red one, the one whose distance to the nearest points of the two classes is as large as possible?

Congratulations, you guessed it.

Now let's raise the problem to n dimensions. I can't picture what n-dimensional space looks like either, but we still want to separate the two classes in it. The straight line ax + b = 0 then generalizes to a hyperplane w^T x + b = 0 that does the separating. As follows:

Here **w** = (w_1; w_2; ...; w_d) is the normal vector, which determines the orientation of the hyperplane, and b is the bias term, which determines the distance between the hyperplane and the origin. Clearly the hyperplane is fully determined by w and b; once we find this hyperplane, the classification model is built. That is the problem of maximizing the margin.
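To make the hyperplane classifier concrete, here is a minimal numpy sketch. The values of w and b are hypothetical; the point is only that a prediction is the sign of w^T x + b:

```python
import numpy as np

# Hypothetical hyperplane parameters in 2-D: w is the normal vector, b the bias.
w = np.array([1.0, -2.0])
b = 0.5

def classify(x):
    """Predict +1 or -1 depending on which side of w^T x + b = 0 the point lies on."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([3.0, 0.0])))  # w.x + b = 3.5 > 0, so class +1
print(classify(np.array([0.0, 2.0])))  # w.x + b = -3.5 < 0, so class -1
```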

Since we've mentioned the margin, let's briefly review the functional margin and the geometric margin.

1.1 Functional margin

Suppose A, B, and C are three instances. A is closest to the hyperplane, so the confidence that its prediction is correct is relatively low; C is farthest from the hyperplane, so its prediction is made with higher confidence; and B lies in between.

In general, the distance from a point to the separating hyperplane indicates how confident we are in the prediction at that point. This is the idea behind the functional margin.

For a given training dataset T and hyperplane (w, b), the functional margin of the hyperplane with respect to a sample point (x_i, y_i) is defined as:

γ̂_i = y_i (w·x_i + b)

1.2 Geometric margin

The functional margin can indicate the correctness and confidence of a prediction, but it is not enough by itself for choosing the optimal hyperplane: if we scale w and b proportionally (say, to 2w and 2b), the hyperplane itself does not change, yet the functional margin doubles. We can fix this by normalizing the normal vector, e.g. requiring ||w|| = 1, so that the margin is uniquely determined; the functional margin then becomes the geometric margin.

Suppose A is a point off the separating hyperplane, and B is its projection onto the hyperplane (every point has such a projection); then the length of segment BA is the distance from A to the hyperplane. Since w is the normal vector, w/||w|| is the unit normal. Writing A as the sample (x_i, y_i) and its distance as γ_i, B's coordinate vector is x_i − γ_i · w/||w||. Substituting B's coordinates into w^T x + b = 0 gives:

Further simplification yields the geometric margin:

γ_i = y_i (w·x_i + b) / ||w||

Thus no matter by what factor w and b are rescaled, this distance is unaffected; and when ||w|| = 1, it coincides with the functional margin. This is called the geometric margin.
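To make the distinction concrete, here is a small numpy sketch (with hypothetical w, b, and a sample point) showing that rescaling w and b doubles the functional margin but leaves the geometric margin unchanged:

```python
import numpy as np

w = np.array([3.0, 4.0])   # ||w|| = 5, chosen for easy arithmetic
b = -5.0
x_i = np.array([3.0, 4.0])  # a hypothetical sample point
y_i = 1                     # its label

def functional_margin(w, b, x, y):
    # gamma_hat_i = y_i * (w.x_i + b)
    return y * (np.dot(w, x) + b)

def geometric_margin(w, b, x, y):
    # gamma_i = functional margin divided by ||w||
    return functional_margin(w, b, x, y) / np.linalg.norm(w)

f1 = functional_margin(w, b, x_i, y_i)          # 1 * (9 + 16 - 5) = 20
g1 = geometric_margin(w, b, x_i, y_i)           # 20 / 5 = 4
f2 = functional_margin(2 * w, 2 * b, x_i, y_i)  # doubles to 40
g2 = geometric_margin(2 * w, 2 * b, x_i, y_i)   # still 4: invariant to scaling
print(f1, g1, f2, g2)
```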

1.3 Maximum margin

Now we know that there are infinitely many separating hyperplanes, but only one maximizes the geometric margin. The idea of SVM is to find this margin-maximizing hyperplane and use it as the classification criterion. The problem can be expressed as a constrained optimization problem:

max_{w,b} γ

s.t. y_i (w·x_i + b) ≥ γ, i = 1, 2, ..., N

||w|| = 1

The constraint ||w|| = 1 normalizes w so that the functional margin equals the geometric margin. Taking into account the relationship between the two, γ = γ̂ / ||w||, the problem above can be rewritten as:

max_{w,b} γ̂ / ||w||

s.t. y_i (w·x_i + b) ≥ γ̂, i = 1, 2, ..., N

Since rescaling w and b rescales γ̂ by the same factor without changing the problem, we may simply set γ̂ = 1.

Maximizing 1/||w|| is then equivalent to minimizing ||w||, or conventionally ½||w||², which has the same minimizer and is easier to differentiate. Equivalently:

min_{w,b} ½||w||²

s.t. y_i (w·x_i + b) − 1 ≥ 0, i = 1, 2, ..., N

Well, at last we have arrived at a concrete optimization problem.
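The primal problem above can already be solved numerically. Here is a minimal sketch on a hypothetical toy dataset using scipy.optimize.minimize with the SLSQP solver; this is not how SVMs are trained in practice, but it shows the formulation at work:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data: two points per class.
X = np.array([[2.0, 0.0], [2.0, 2.0], [0.0, 0.0], [0.0, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables packed as theta = [w1, w2, b].
def objective(theta):
    w = theta[:2]
    return 0.5 * np.dot(w, w)  # (1/2)||w||^2

# One inequality constraint y_i (w.x_i + b) - 1 >= 0 per sample.
constraints = [
    {"type": "ineq",
     "fun": (lambda theta, i=i: y[i] * (np.dot(theta[:2], X[i]) + theta[2]) - 1)}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints, method="SLSQP")
w_opt, b_opt = res.x[:2], res.x[2]
print(w_opt, b_opt)  # for this data the max-margin separator is near w = (1, 0), b = -1
```

The maximum-margin boundary for this data is the vertical line x1 = 1, halfway between the two columns of points.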

Next comes the method for solving it, and in fact a better one: duality.

Two: The dual algorithm

To solve the optimization problem of the linearly separable support vector machine, we treat it as the primal problem and apply Lagrange duality, solving the dual problem to obtain the primal solution. This is the dual algorithm of the linearly separable support vector machine. The idea:

I think you'll understand once you work through it...

Supplement (from Wikipedia):

First, we construct the Lagrangian by introducing a Lagrange multiplier a_i ≥ 0, i = 1, 2, ..., n for each inequality constraint. Function (1) is:

L(w, b, a) = ½||w||² − Σ_i a_i y_i (w·x_i + b) + Σ_i a_i


According to Lagrange duality, the dual of the primal problem is the max-min problem: max_a min_{w,b} L(w, b, a).

So, to obtain the solution of the dual problem, we first find the minimum of L(w, b, a) over w and b, and then maximize the result over a.

2.1 Finding min_{w,b} L(w, b, a)

First, take the partial derivatives of function (1) with respect to w and b and set them to zero:

∇_w L = w − Σ_i a_i y_i x_i = 0

∇_b L = −Σ_i a_i y_i = 0

So:

w = Σ_i a_i y_i x_i,  Σ_i a_i y_i = 0

Substituting these two results back into function (1) gives:

min_{w,b} L(w, b, a) = −½ Σ_i Σ_j a_i a_j y_i y_j (x_i·x_j) + Σ_i a_i

2.2 Maximizing min_{w,b} L(w, b, a) over a, i.e., the dual problem


In this way, we obtain the equivalent dual optimization problem:

min_a ½ Σ_i Σ_j a_i a_j y_i y_j (x_i·x_j) − Σ_i a_i

s.t. Σ_i a_i y_i = 0

a_i ≥ 0, i = 1, 2, ..., N


Suppose the solution of the problem above is a* = (a_1*, a_2*, ..., a_n*)^T. From the KKT conditions we can then recover the solution of the primal optimization problem: w* = Σ_i a_i* y_i x_i, and b* = y_j − Σ_i a_i* y_i (x_i·x_j) for any index j with a_j* > 0 (a support vector).
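As a sketch of the whole pipeline, the dual problem can also be solved numerically on a hypothetical toy dataset, with (w*, b*) then recovered from a* via the KKT conditions. Again scipy's SLSQP stands in for a real QP solver such as SMO:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data: two points per class.
X = np.array([[2.0, 0.0], [2.0, 2.0], [0.0, 0.0], [0.0, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
Z = y[:, None] * X
G = Z @ Z.T  # Gram-like matrix with entries y_i y_j (x_i . x_j)

def neg_dual(a):
    # Negate the dual objective so that minimizing it maximizes the dual.
    return 0.5 * a @ G @ a - a.sum()

res = minimize(neg_dual, x0=np.zeros(len(y)),
               bounds=[(0, None)] * len(y),                  # a_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum a_i y_i = 0
               method="SLSQP")
a = res.x
w = ((a * y)[:, None] * X).sum(axis=0)  # w* = sum_i a_i y_i x_i
sv = np.argmax(a)                       # index of a support vector (a_i > 0)
b = y[sv] - np.dot(w, X[sv])            # b* from the KKT condition

def decision(x):
    return np.sign(np.dot(w, x) + b)

print(w, b, decision(np.array([3.0, 1.0])))
```

Note that the optimal a* need not be unique here, but w* and b* are, which is all the classifier needs.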

In conclusion, the classification decision function can be written as:

f(x) = sign( Σ_i a_i* y_i (x·x_i) + b* )

Coming up next: the next installment will cover the linearly non-separable case, the kernel trick (the "nuclear weapon" at the core of SVM), and the advantages and disadvantages of SVM. Stay tuned!

Machine Learning Notes: The SVM Algorithm (Part 1)