Introduction to SVM (6): The Linear Classifier — Rewriting the Problem, an Intuitive View


Let me restate the problem we want to solve: we have a number of sample points belonging to two classes (and we are not limited to points in two-dimensional space).

We treat the circular points as positive samples (together they form what we call the positive class), and the square points as negative samples. We want to find a linear function (a linear function in n-dimensional space):

g(x) = wx + b

such that every positive-class point x+ satisfies g(x+) ≥ 1, and every negative-class point x− satisfies g(x−) ≤ −1 (the reason we compare against 1 is that we fixed the functional margin to 1; note the difference between the functional margin and the geometric margin). If g(x) falls between −1 and 1, we refuse to classify the point.

Finding such a g(x) amounts to finding two parameters: w (an n-dimensional vector) and b (a real number; once w is known, b can be obtained by plugging in a few sample points). So when solving for g(x), w is the variable.

As you can see, once w is found (and hence b), the middle line H is known (it is the line wx + b = 0), and so are H1 and H2 (the three lines are parallel, and the spacing between them is determined by ||w||). So what determines w? Obviously, the samples you supply. Once the sample points in the space are given, the positions of the three lines are in fact uniquely fixed (since we are looking for the optimal three lines, they are of course unique). Solving the optimization problem merely computes this already-determined answer.
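To make this concrete, here is a minimal sketch (with a hypothetical w and b, not taken from the article) showing that once w and b are fixed, the lines H: wx + b = 0, H1: wx + b = 1, and H2: wx + b = −1 are all determined, and the distance between H1 and H2 is 2/||w||:

```python
import numpy as np

# Hypothetical parameters, for illustration only.
w = np.array([3.0, 4.0])  # normal vector of the separating line H: <w, x> + b = 0
b = -2.0

# H1 and H2 are <w, x> + b = +1 and <w, x> + b = -1; both are parallel to H.
# The geometric distance between H1 and H2 is 2 / ||w||.
margin = 2.0 / np.linalg.norm(w)
print(margin)  # ||w|| = 5, so the margin is 0.4
```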

That the samples determine w can be stated in mathematical language: w can be expressed as a combination of the samples:

w = α1x1 + α2x2 + … + αnxn

The αi in the formula are numbers (these α are called Lagrange multipliers), while the xi are sample points and hence vectors; n is the total number of samples. For clarity, from now on we strictly distinguish between multiplying a vector by a number and taking the product of two vectors: α1x1 denotes a number times a vector, and <x1, x2> denotes the inner product (also called the dot product; note the difference from the cross product) of the vectors x1 and x2. Accordingly, the expression for g(x) should strictly be written in the following form:

g(x) = <w, x> + b
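In code, this strict form is just an inner product plus a scalar. A minimal sketch in Python (the values of w, b, and the test point are hypothetical):

```python
import numpy as np

def g(x, w, b):
    """Linear decision function g(x) = <w, x> + b."""
    return np.dot(w, x) + b

w = np.array([3.0, 4.0])      # hypothetical weight vector
b = -2.0                      # hypothetical bias
x_pos = np.array([1.0, 1.0])  # a sample point to evaluate

print(g(x_pos, w, b))  # 3 + 4 - 2 = 5.0, which is >= 1: classified positive
```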

However, the expression for w above is not yet complete. Look back at where the positive and negative samples sit in the figure. Imagine that I keep every point in place but flip the label of one positive sample to negative (that is, change one circle into a square). What happens? All three lines must move (since the squares and circles still have to be separated correctly)! This shows that w depends not only on the positions of the sample points but also on their classes (that is, the "labels" of the samples). So the complete expression is:

w = α1y1x1 + α2y2x2 + … + αnynxn    (Eq. 1)

Here yi is the label of the i-th sample, equal to 1 or −1. In fact, only a small fraction of the αi in the formula above are nonzero (and it is the nonzero terms that determine w); the samples with nonzero αi lie exactly on H1 and H2, and it is these samples (rather than all the samples) that uniquely determine the classification function. Strictly speaking, even a subset of these can suffice: to determine a line you only need two points, so if three or five samples happen to lie on H1 and H2, we do not need them all. The sample points we actually need are called support vectors! (A fitting name: these are the points that prop up the two boundary lines.)
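As a sanity check of the expansion w = α1y1x1 + … + αnynxn, scikit-learn's linear SVC exposes exactly these quantities: `dual_coef_` holds the nonzero products αi·yi (support vectors only; all other samples have αi = 0 and do not appear), and `support_vectors_` holds the corresponding xi. Summing αi·yi·xi should therefore reproduce the learned weight vector `coef_`. A sketch with a toy data set (the data is hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (hypothetical data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Large C approximates a hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_[0] is alpha_i * y_i for the support vectors only.
w_from_dual = clf.dual_coef_[0] @ clf.support_vectors_
print(np.allclose(w_from_dual, clf.coef_[0]))  # the two ways of getting w agree
```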

This can also be abbreviated with a summation sign:

w = Σi αi yi xi    (the sum running over i = 1, …, n)

Therefore, the original expression for g(x) can be written as:

g(x) = <Σi αi yi xi, x> + b

Note that in this formula x is the variable, that is, the document you want to classify: substitute the document's vector representation for x, while all the xi are known samples. Notice also that only the xi and x in the formula are vectors, so the numeric factors can be pulled out of the inner product, giving the final form of g(x):

g(x) = Σi αi yi <xi, x> + b

What do you notice? w is gone! The problem has shifted from w to α.
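Numerically, the two forms of g(x) agree. A minimal check with hand-chosen values (the α, samples, labels, and b below are hypothetical, picked so that w = Σi αi yi xi comes out cleanly):

```python
import numpy as np

# Hand-picked toy setup (illustrative, not from the article):
# support vectors x1 = (1, 1) with y1 = +1 and x2 = (0, 0) with y2 = -1,
# both with alpha = 1, giving w = 1*(+1)*(1,1) + 1*(-1)*(0,0) = (1, 1), b = -1.
alphas = np.array([1.0, 1.0])
ys = np.array([1.0, -1.0])
xs = np.array([[1.0, 1.0], [0.0, 0.0]])
b = -1.0

w = (alphas * ys) @ xs    # weight vector recovered from the expansion

x = np.array([2.0, 3.0])  # a new point to classify

g_primal = np.dot(w, x) + b                   # <w, x> + b
g_dual = np.sum(alphas * ys * (xs @ x)) + b   # sum_i alpha_i y_i <x_i, x> + b

print(g_primal, g_dual)  # both 4.0: the two forms agree
```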

Some may object that this does not simplify the original problem. Actually, it does: once the problem is described in this form, a large part of our optimization problem is freed from inequality constraints (remember, those constraints were the reason we could not solve it as an ordinary extremum problem). But next, let us skip ahead of the linear-classifier solution and look at the major improvement SVM makes to the linear classifier: kernel functions.
