"Machine Learning Foundation" soft interval support vector machine

Source: Internet
Author: User
Tags: svm

Introduction

In the previous section we introduced the kernel support vector machine, which lets us handle both simple and complex classification problems.
However, kernels such as the Gaussian kernel are so powerful that they may cause overfitting. One cause of overfitting is choosing a feature transformation so strong that the large-margin principle can no longer control the model's complexity; another is that insisting on classifying all the data correctly forces the noise to be taken into account when building the model, so the noise gets fitted into the result.

Soft-margin support vector machine, step one: tolerating errors

Recall the basic definition of the hard-margin support vector machine (hard-margin SVM): it requires all the data to be classified correctly while choosing the shortest w:
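The formulas in the original post appear as images; restated here in standard notation (with z_n = Φ(x_n) denoting the transformed feature vector), the hard-margin primal is:

    \min_{b, w} \quad \frac{1}{2} w^T w
    \text{s.t.} \quad y_n (w^T z_n + b) \ge 1, \quad n = 1, \dots, N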

To alleviate this problem, we tolerate some mistakes and add the error term to the objective, hoping to make the number of misclassifications as small as possible. The parameter C expresses the relative importance of the margin versus the errors; that is, C is the trade-off between a large margin and noise tolerance. A relatively large C means the fewer mistakes the better, while a relatively small C means the shorter w the better, i.e. the larger the margin around the data the better.
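In standard notation, this modified objective counts the errors directly:

    \min_{b, w} \quad \frac{1}{2} w^T w + C \sum_{n=1}^{N} [\![\, y_n \ne \mathrm{sign}(w^T z_n + b) \,]\!]
    \text{s.t.} \quad y_n (w^T z_n + b) \ge 1 \ \text{for the correctly classified points}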

This equation has a drawback: deciding right versus wrong is a Boolean operation, so the problem is no longer linear and can no longer be solved by quadratic programming. Moreover, this formulation cannot measure the size of an error.
To solve these problems, we need a new formulation that distinguishes small errors from large ones and that can still be solved by quadratic programming.

Step two: Measurement of errors

The ξn in the following equation measures how much a data point violates the margin, rather than merely counting how many points are wrong. With this change, ξn is added to the formulation as a variable to be solved for, and the problem can once again be solved by quadratic programming.

We use ξn to record the distance of each violating point from the margin boundary, and we incorporate this distance into the optimization objective.
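In standard notation, the soft-margin primal is:

    \min_{b, w, \xi} \quad \frac{1}{2} w^T w + C \sum_{n=1}^{N} \xi_n
    \text{s.t.} \quad y_n (w^T z_n + b) \ge 1 - \xi_n, \quad \xi_n \ge 0, \quad n = 1, \dots, N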

The parameter C controls whether we want a wide margin or whether we care more about not violating the margin.
A larger C means a narrower margin and fewer violating points.
A smaller C means we want a wider margin.

Step three: Derivation of duality

What we do next is derive the dual of the problem above; with the dual in hand, the kernel trick can easily be introduced to handle the feature transform.
The primal problem is the soft-margin formulation stated above.

Since there are now two sets of constraints, we use multipliers αn and βn to construct the Lagrangian:
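In standard notation:

    L(b, w, \xi; \alpha, \beta) = \frac{1}{2} w^T w + C \sum_{n} \xi_n + \sum_{n} \alpha_n \bigl( 1 - \xi_n - y_n (w^T z_n + b) \bigr) - \sum_{n} \beta_n \xi_n

with multipliers \alpha_n \ge 0 and \beta_n \ge 0.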

After the Lagrangian is obtained, we can convert the problem to its dual and simplify it using the KKT conditions.
We first eliminate ξn by taking the partial derivative with respect to ξn:
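Setting that derivative to zero gives:

    \frac{\partial L}{\partial \xi_n} = C - \alpha_n - \beta_n = 0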

This lets us express βn in terms of αn; and because βn ≥ 0, αn acquires an extra constraint:
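That is, \beta_n = C - \alpha_n, and \beta_n \ge 0 then implies:

    0 \le \alpha_n \le C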


So we can remove the two sets of variables ξn and βn and obtain the following formula:
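Substituting \beta_n = C - \alpha_n makes every \xi_n term cancel, leaving:

    \min_{b, w} \ \max_{0 \le \alpha_n \le C} \quad \frac{1}{2} w^T w + \sum_{n} \alpha_n \bigl( 1 - y_n (w^T z_n + b) \bigr)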

The formula above is similar to the corresponding step for the hard-margin support vector machine, so we can take partial derivatives with respect to b and w and obtain almost the same conditions as in the standard support vector machine dual problem:
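The optimality conditions are the familiar ones:

    \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n} \alpha_n y_n = 0
    \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{n} \alpha_n y_n z_n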

This gives the dual form of the standard soft-margin support vector machine:
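In standard notation:

    \min_{\alpha} \quad \frac{1}{2} \sum_{n} \sum_{m} \alpha_n \alpha_m y_n y_m z_n^T z_m - \sum_{n} \alpha_n
    \text{s.t.} \quad \sum_{n} y_n \alpha_n = 0, \quad 0 \le \alpha_n \le C, \quad n = 1, \dots, N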

Unlike the hard-margin support vector machine, the αn here are bounded above by C.

Kernel soft-margin support vector machine

Next, we obtain the algorithm steps of the kernel soft-margin support vector machine:
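The original post shows these steps as an image; in outline, the standard recipe is:

1. Compute the kernel values K(x_n, x_m) and solve the dual QP above for α, with 0 ≤ α_n ≤ C.
2. Compute b from a free support vector x_s (one with 0 < α_s < C), as derived below: b = y_s − Σ_n α_n y_n K(x_n, x_s).
3. Return the hypothesis g(x) = sign( Σ_n α_n y_n K(x_n, x) + b ).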

The difference from the earlier hard-margin support vector machine is in how b is computed. Because this is a new quadratic programming problem, its KKT conditions have changed.

Solving for b

We compare the hard-margin SVM with the soft-margin SVM:

In the hard-margin SVM, b is found using the complementary slackness property of the KKT conditions: the product of the multiplier and its constraint must be 0, so when αn > 0 (i.e. for a support vector) the expression for b follows.
Now we look at the complementary slackness conditions of the soft-margin SVM.
According to the two conditions above, if we choose a support vector with αs not equal to C, then ξs = 0, and we can obtain a formula for b.

As explained above, to compute b we can no longer use an arbitrary support vector as in the hard-margin SVM; instead we must find a free support vector and use it to compute b.
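In standard notation, the two complementary slackness conditions are:

    \alpha_n \bigl( 1 - \xi_n - y_n (w^T z_n + b) \bigr) = 0
    (C - \alpha_n) \, \xi_n = 0

and for a free support vector x_s with 0 < \alpha_s < C (hence \xi_s = 0):

    b = y_s - \sum_{n} \alpha_n y_n K(x_n, x_s)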

The effect of different values of C


The image above shows the classification results of the Gaussian-kernel soft-margin SVM with different values of C.
When C = 1, the blue region is the area classified as circles, the red region is the area classified as crosses, and the gray region marks the margin. Here we can see that some points are misclassified.
When C = 10, the larger C pushes the classifier to make fewer errors; the gray margin region narrows, and fewer points are misclassified.
When C = 100, even fewer points are misclassified, but overfitting starts to appear.
This tells us that the parameters must be chosen carefully to avoid overfitting.

The physical meaning of αn

We can divide the data into three cases based on the two complementary slackness conditions above:

For 0 < αn < C, ξn = 0; these are the free support vectors, lying exactly on the margin boundary.
For αn = 0, ξn = 0; these points do not violate the margin and are classified correctly, lying outside the boundary.
For αn = C, the point violates the margin (marked with triangles in the figure), and ξn records its distance to the margin boundary. A relatively small violation means the point lies inside the margin but is still classified correctly; a relatively large violation means the point is misclassified.

These different cases of αn are very helpful for analyzing the data: the value of αn tells us how each point is positioned relative to the boundary.

Model selection: choosing model parameters by cross-validation


The figure above shows the decision boundaries of the Gaussian-kernel SVM for different values of C and γ, where the horizontal axis corresponds to different values of C and the vertical axis to different values of γ.

Using cross-validation we choose the model parameters with the smallest validation error; in practice we can only try a few different values of C and γ and compare which parameter combination performs better.
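As a minimal illustration of this kind of parameter search (not from the original post; it assumes scikit-learn is available and uses a synthetic dataset purely for demonstration):

    # Cross-validated grid search over C and gamma for an RBF-kernel (Gaussian) soft-margin SVM.
    # Assumes scikit-learn is installed; the moons dataset is synthetic and only illustrative.
    from sklearn.datasets import make_moons
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

    param_grid = {
        "C": [0.1, 1, 10, 100],   # trade-off between margin width and violations
        "gamma": [0.1, 1, 10],    # width of the Gaussian kernel
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)

    print(search.best_params_, search.best_score_)

The combination with the smallest cross-validation error (highest cross-validation score) is the one we would pick.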

The relationship between the leave-one-out cross-validation error and the support vectors

One interesting relationship in the SVM is that the leave-one-out cross-validation error is less than or equal to the fraction of support vectors among all the data.
If a non-support-vector point is removed, this has no effect on the optimal solution for the boundary, so a non-support vector must contribute 0 to the leave-one-out cross-validation error, while a support vector may contribute 0 or 1. This gives the bound.
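Written as a formula, the bound is:

    E_{\mathrm{loocv}} \le \frac{\#\mathrm{SV}}{N}

where #SV is the number of support vectors and N is the number of training examples.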

We can also choose model parameters according to the number of support vectors, but note that the number of support vectors only gives an upper bound and does not necessarily identify the best parameters; this relationship is therefore usually used to rule out obviously bad parameter settings.

Reprint: please credit the author, Jason Ding, and cite the source.
Gitcafe Blog home page (http://jasonding1354.gitcafe.io/)
GitHub Blog home page (http://jasonding1354.github.io/)
CSDN Blog (http://blog.csdn.net/jasonding1354)
Jianshu home page (http://www.jianshu.com/users/2bd9b48f6ea8/latest_articles)
Search jasonding1354 on Baidu to reach my blog home page

"Machine Learning Foundation" soft interval support vector machine

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.