SVM entry (3) linear classifier Part 2


Last time I discussed the ill-posed nature of the text classification problem (a problem with more than one solution is called ill-posed). We therefore need an indicator to measure the quality of a solution (that is, of the classification model we build through training), and the classification margin is a good such indicator.

When classifying texts, we let the computer examine the training samples we provide. Each sample consists of a vector (made up of the text features) and a label (indicating the sample's category), as follows:

Di = (xi, yi)

Here xi is the text vector (usually of very high dimension) and yi is the classification label.

In binary linear classification, the label yi takes only two values, 1 and -1 (indicating whether the sample belongs to the category or not). With this notation, we can define the margin of a sample point with respect to a hyperplane:

δi = yi (w·xi + b)

At first glance this formula seems neither mysterious nor specially motivated; it is just a definition. But some interesting things appear once we transform it.

First notice that if a sample belongs to the category, then w·xi + b > 0 (remember? The discriminant g(x) = w·x + b we chose decides the class by whether its value is greater than or less than 0), and yi > 0. If the sample does not belong to the category, then w·xi + b < 0 while yi < 0. In both cases yi (w·xi + b) is positive, and its value equals |w·xi + b|, that is, |g(xi)|.
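The observation above can be checked numerically. The following sketch uses made-up values for w, b, and the samples (none come from the article) and verifies that the margin yi (w·xi + b) is positive and equal to |g(xi)| whenever a sample is correctly classified:

```python
def g(w, x, b):
    """Linear discriminant g(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def functional_margin(w, b, x, y):
    """The margin delta_i = y_i * (w . x_i + b)."""
    return y * g(w, x, b)

w, b = [2.0, -1.0], 0.5               # illustrative hyperplane
samples = [([1.0, 1.0], 1),           # g(x) = 1.5 > 0, label +1
           ([0.0, 2.0], -1)]          # g(x) = -1.5 < 0, label -1

for x, y in samples:
    m = functional_margin(w, b, x, y)
    assert m > 0 and m == abs(g(w, x, b))
    print(m)                          # 1.5 for both samples
```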

Now normalize w and b, that is, replace w and b with w/||w|| and b/||w|| respectively. The margin can then be written

δi = yi ((w/||w||)·xi + b/||w||) = |g(xi)| / ||w||

Does this formula look familiar? That's right: it is exactly the formula for the distance from the point xi to the line g(x) = 0! (Recall that g(x) = 0 is the classification hyperplane discussed in the previous section.)
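A quick numeric check of this point-to-hyperplane distance, using an illustrative hyperplane and point (not taken from the article):

```python
import math

def geometric_margin(w, b, x, y):
    """y * (w . x + b) / ||w||: signed Euclidean distance to g(x) = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * dot / norm

# Hyperplane 3x + 4y - 5 = 0; point (3, 4) with label +1.
# g(x) = 9 + 16 - 5 = 20 and ||w|| = 5, so the distance is 20 / 5 = 4.
print(geometric_margin([3.0, 4.0], -5.0, [3.0, 4.0], 1))  # 4.0
```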

A small tip: what does the symbol ||w|| mean? ||w|| is the norm of the vector w, a measure of vector length. What we usually call the "length" of a vector is in fact its 2-norm. The most common family of norms is the p-norm, written as follows.

For a vector w = (w1, w2, w3, ..., wn),

its p-norm is

||w||p = (|w1|^p + |w2|^p + ... + |wn|^p)^(1/p)

When p = 2, isn't this just the traditional vector length? When p is not specified, as in ||w|| here, it means either that we do not care which norm is used (any of several would do), or that the value of p was stated earlier and is omitted for convenience.
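The p-norm formula above is easy to evaluate directly. This sketch (with an illustrative vector) confirms that p = 2 recovers the ordinary Euclidean length:

```python
def p_norm(w, p):
    """||w||_p = (|w_1|^p + ... + |w_n|^p)^(1/p)."""
    return sum(abs(wi) ** p for wi in w) ** (1.0 / p)

w = [3.0, 4.0]
print(p_norm(w, 1))   # 7.0  (sum of absolute values)
print(p_norm(w, 2))   # 5.0  (Euclidean length of the 3-4-5 triangle)
```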

When the normalized w and b replace the original values, the margin has a special name: the geometric margin. The geometric margin is exactly the Euclidean distance from a point to the hyperplane, so below we will simply call it the "distance". The above defines the distance from a single point to a hyperplane; we can also define the distance from a set of points (that is, a group of samples) to a hyperplane as the distance from the hyperplane to the point in the set closest to it. The figure below shows the meaning of the geometric margin intuitively:

H is the classification surface, while H1 and H2 are lines parallel to H that pass through the two samples closest to H. The distance between H1 and H (equally, between H2 and H) is the geometric margin.
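The margin of a whole sample set, as just defined, is the smallest per-point distance. A short sketch with made-up samples and hyperplane (none from the article):

```python
import math

def set_margin(w, b, samples):
    """Geometric margin of a sample set: the minimum, over all samples,
    of the signed distance y * (w . x + b) / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in samples
    )

samples = [([2.0, 0.0], 1), ([4.0, 1.0], 1), ([-1.0, 0.0], -1)]
w, b = [1.0, 0.0], -0.5            # hyperplane x = 0.5
print(set_margin(w, b, samples))   # 1.5: (2,0) and (-1,0) are the closest points
```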

 

The reason we care so much about the geometric margin is that it is related to the number of misclassifications the classifier makes on the samples:

misclassifications ≤ (2R/δ)²

where δ is the margin between the sample set and the classification surface, and R = max ||xi||, i = 1, ..., n; that is, R is the longest length among the sample vectors xi (a measure of how widely the samples are spread). We need not study the exact definition and derivation of the misclassification count here; it is enough to remember that it represents, to some extent, the error of the classifier. The formula shows that the upper bound on the number of misclassifications is determined by the geometric margin (once the samples are fixed)!
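The bound has the form of the classic Novikoff-style perceptron mistake bound, (2R/δ)². A quick numeric illustration (all values made up) of how, with the sample radius R fixed, a larger margin δ lowers the ceiling on the number of mistakes:

```python
def mistake_bound(R, delta):
    """Upper bound (2R / delta)^2 on the number of misclassifications."""
    return (2.0 * R / delta) ** 2

R = 10.0
for delta in (0.5, 1.0, 2.0):
    print(delta, mistake_bound(R, delta))
# 0.5 -> 1600.0, 1.0 -> 400.0, 2.0 -> 100.0: the bound shrinks as delta grows
```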

So far we have seen why the geometric margin is chosen as the indicator for evaluating the merits of a solution: the larger the geometric margin, the lower the upper bound on the error. Maximizing the geometric margin is therefore the goal of the training phase. Incidentally, contrary to what some authors write, maximizing the classification margin is not an invention unique to SVM; the idea already existed in the era of plain linear classifiers.
