SVM Learning: Soft-Margin Optimization

Source: Internet
Author: User
Tags: svm

Back to SVM learning. The article on linear learners introduced the original form of the Rosenblatt perceptron. That discussion assumed the input space was linearly separable, and the maximum-margin algorithm that followed made the same assumption: the kernel function implicitly maps the input space to a high-dimensional feature space, where the data are again taken to be linearly separable. So what happens if the input space, or the feature space induced by the kernel, is still not linearly separable? The earlier algorithms then cannot converge; they simply oscillate forever.

From the K-T (Karush-Kuhn-Tucker) complementarity conditions of the quadratic program discussed in "SVM Learning -- the maximum-margin algorithm", we learned one important fact: the support vectors contain all the information needed to reconstruct the hyperplane; even if every other point is discarded, the same hyperplane is obtained. The fewer the support vectors, the better the generalization. Moreover, "SVM Learning -- statistical learning theory" showed that generalization performance is not explicitly tied to the dimension of the feature space. The maximum-margin algorithm itself leaves no room for tuning: everything is determined by the sample data, and the solution is unique. Its only degree of freedom is the choice of kernel, and separating the data with an overly powerful kernel can easily lead to overfitting.

Worse, the algorithm is extremely sensitive to noise points. The maximum-margin formulation is a perfectionist: it insists on training with zero errors. Real data are noisy, so this demand is impractical, and the whole learner can end up dictated by a handful of points. This is what leads to soft-margin optimization, which, as the name suggests, lets the learner tolerate some noise while taking more of the samples into account.

First, let's look at another way to measure the margin distribution. It extends the notion of margin from the single point closest to the hyperplane to all the training data, and therefore better reflects the global character of the training sample.

Definition 1: Let \gamma > 0 be the target margin. For any input sample (x_i, y_i), the slack variable with respect to the hyperplane f(x) = \langle w, x \rangle + b and the target margin \gamma is:

\xi_i = \max\left(0, \; \gamma - y_i(\langle w, x_i \rangle + b)\right)

This definition is quite interesting. Note that \xi_i is never negative: it is zero exactly when the sample is separated by at least the target margin. When the sample is misclassified, y_i f(x_i) < 0, so \xi_i > \gamma; a positive \xi_i \le \gamma means the sample is classified correctly but not separated by the margin. In the classic binary SVM the target margin is \gamma = 1: when a sample is misclassified or falls inside the margin, its slack variable is nonzero; otherwise its slack variable is 0:

In the figure, the slack variables of all points except A and B are 0: point A is not separated by the margin, and point B is misclassified, so their slack variables are nonzero.
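To make the definition concrete, here is a minimal sketch in plain numpy (the hyperplane, target margin, and sample points are all invented for illustration) that computes each sample's slack variable; the last two points play the roles of A and B above:

```python
import numpy as np

# Hypothetical separating hyperplane f(x) = <w, x> + b and target margin.
w = np.array([1.0, 1.0])
b = -1.0
gamma = 1.0  # target margin

# Toy samples: the last two play the roles of points A and B in the figure.
X = np.array([[2.0, 2.0],    # comfortably on the positive side
              [-1.0, -1.0],  # comfortably on the negative side
              [0.7, 0.7],    # A: correct side, but inside the margin
              [-0.5, 0.0]])  # B: misclassified (its label says positive)
y = np.array([+1, -1, +1, +1])

# Slack: xi_i = max(0, gamma - y_i * f(x_i)); zero exactly when the sample
# is separated by at least the target margin.
f = X @ w + b
xi = np.maximum(0.0, gamma - y * f)
print(xi)  # first two are 0; A has 0 < xi <= gamma; B has xi > gamma
```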

Introducing these slack variables into the original maximum-margin optimization relaxes the constraints to:

y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n

The primal problem comes in two forms: the 2-norm soft margin and the 1-norm soft margin.

1. 2-Norm Soft Margin

The primal problem is:

\min_{w, b, \xi} \;\; \frac{1}{2}\langle w, w \rangle + \frac{C}{2}\sum_{i=1}^{n} \xi_i^2 \quad \text{s.t.} \quad y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \; i = 1, \dots, n

where C is the penalty factor on the slack variables.

Following the usual derivation, the Lagrangian of the primal problem is:

L(w, b, \xi, \alpha) = \frac{1}{2}\langle w, w \rangle + \frac{C}{2}\sum_{i=1}^{n}\xi_i^2 - \sum_{i=1}^{n}\alpha_i\left[y_i(\langle w, x_i \rangle + b) - 1 + \xi_i\right], \quad \alpha_i \ge 0

So we have:

\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n} y_i \alpha_i x_i, \qquad \frac{\partial L}{\partial b} = -\sum_{i=1}^{n} y_i \alpha_i, \qquad \frac{\partial L}{\partial \xi_i} = C\xi_i - \alpha_i

Setting all of the above to zero and substituting back yields the dual objective function:

W(\alpha) = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j \left(\langle x_i, x_j \rangle + \frac{1}{C}\delta_{ij}\right)
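For readers who want the omitted intermediate step, substituting w = \sum_i y_i \alpha_i x_i and \xi_i = \alpha_i / C back into the Lagrangian shows where the extra \delta_{ij}/C term comes from (a standard manipulation, reconstructed here rather than copied from the original):

L = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle + \frac{C}{2}\sum_{i=1}^{n}\frac{\alpha_i^2}{C^2} - \frac{1}{C}\sum_{i=1}^{n}\alpha_i^2 = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j \left(\langle x_i, x_j \rangle + \frac{\delta_{ij}}{C}\right)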

To sum up, replacing the inner product with a kernel function gives:

\max_{\alpha} \; W(\alpha) = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j \left(K(x_i, x_j) + \frac{1}{C}\delta_{ij}\right) \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \; \alpha_i \ge 0

where \delta_{ij} is the Kronecker delta:

\delta_{ij} = \left\{ \begin{array}{ll} 0 & i \neq j \\ 1 & i = j \end{array} \right.

The geometric margin at the optimal solution \alpha^* is:

\gamma = \left( \sum_{i \in \mathrm{sv}} \alpha_i^* - \frac{1}{C}\langle \alpha^*, \alpha^* \rangle \right)^{-1/2}

We can see that the choice of C affects the final margin. The larger C is, the more weight the noise points receive and the more they influence the solution, so the smaller the resulting geometric margin; a smaller C tolerates more slack and yields a larger margin.
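As a sanity check of the two formulas above, here is a small sketch (the 2-D data, the value of C, and the use of scikit-learn's SVC, which wraps libsvm, are my own illustration, not from the original post). The dual above is exactly a hard-margin dual over the modified kernel K(x_i, x_j) + \delta_{ij}/C, so we solve it that way and then evaluate the margin formula:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical 2-D data: two Gaussian blobs with a little overlap (noise).
X = np.vstack([rng.normal(-1.5, 1.0, (20, 2)), rng.normal(1.5, 1.0, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

C = 1.0
K = X @ X.T                     # linear kernel K(x_i, x_j) = <x_i, x_j>
K_mod = K + np.eye(len(y)) / C  # modified kernel: K + delta_ij / C

# Hard-margin SVM on the modified kernel (a huge box bound stands in for
# "no slack allowed"); K_mod is strictly positive definite, so the data
# are always separable in the modified feature space.
clf = SVC(kernel="precomputed", C=1e10).fit(K_mod, y)

alpha = np.abs(clf.dual_coef_).ravel()  # alpha_i of the support vectors
# gamma = (sum(alpha) - <alpha, alpha>/C)^(-1/2), per the formula above.
margin = 1.0 / np.sqrt(alpha.sum() - alpha @ alpha / C)
print(f"geometric margin: {margin:.4f}")
```

Rerunning the last few lines with a larger C should shrink the printed margin, matching the discussion above.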

2. 1-Norm Soft Margin

The primal problem is:

\min_{w, b, \xi} \;\; \frac{1}{2}\langle w, w \rangle + C\sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n

where C is the penalty factor on the slack variables.

The Lagrangian of the primal problem is:

L(w, b, \xi, \alpha, r) = \frac{1}{2}\langle w, w \rangle + C\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\alpha_i\left[y_i(\langle w, x_i \rangle + b) - 1 + \xi_i\right] - \sum_{i=1}^{n} r_i \xi_i, \quad \alpha_i, r_i \ge 0

The derivation is similar to before:

\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n} y_i \alpha_i x_i, \qquad \frac{\partial L}{\partial b} = -\sum_{i=1}^{n} y_i \alpha_i, \qquad \frac{\partial L}{\partial \xi_i} = C - \alpha_i - r_i

Setting all of the above to zero and substituting back yields the dual objective function (note that C - \alpha_i - r_i = 0 together with r_i \ge 0 forces \alpha_i \le C):

W(\alpha) = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle

Replacing the inner product with a kernel function gives:

\max_{\alpha} \; W(\alpha) = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \; 0 \le \alpha_i \le C

The geometric margin at the optimal solution is:

\gamma = \left( \sum_{i,j=1}^{n} y_i y_j \alpha_i^* \alpha_j^* K(x_i, x_j) \right)^{-1/2} = \frac{1}{\lVert w^* \rVert}

Remarkably, this differs from the dual form of the earlier maximum-margin algorithm by only one extra constraint: \alpha_i \le C. The K-T complementarity conditions are now:

\alpha_i\left[y_i(\langle w, x_i \rangle + b) - 1 + \xi_i\right] = 0, \qquad \xi_i(\alpha_i - C) = 0, \quad i = 1, \dots, n

The formulas above tell us the following. When \alpha_i = C, the slack variable can be nonzero: the sample's margin is then less than 1, and the point is either inside the margin or misclassified. When \alpha_i = 0, the slack variable is zero and the margin is greater than 1: these are interior points, correctly classified and far from the maximum-margin classification hyperplane. When 0 < \alpha_i < C, the slack variable is zero and the margin is exactly 1: these points are the support vectors sitting on the margin. So \alpha must satisfy 0 \le \alpha_i \le C, which means the vector \alpha is confined to a box with side length C. Pretty neat, eh?
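This box constraint is easy to observe in practice. Below is a small sketch using scikit-learn's SVC (which wraps libsvm; the data and the value of C are invented for illustration): after fitting a 1-norm soft-margin SVM, every |y_i \alpha_i| in `dual_coef_` lies in (0, C], the points with \alpha_i = C are the margin violators (nonzero slack), and those with 0 < \alpha_i < C sit exactly on the margin:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Hypothetical overlapping blobs, so some slack variables are forced on.
X = np.vstack([rng.normal(-1.0, 1.0, (30, 2)), rng.normal(1.0, 1.0, (30, 2))])
y = np.hstack([-np.ones(30), np.ones(30)])

C = 10.0
clf = SVC(kernel="linear", C=C).fit(X, y)

alpha = np.abs(clf.dual_coef_).ravel()  # alpha_i of the support vectors
assert np.all(alpha <= C + 1e-8)        # the "box": 0 <= alpha_i <= C

on_margin = np.sum(alpha < C - 1e-8)    # 0 < alpha < C: on the margin, xi = 0
violators = np.sum(alpha >= C - 1e-8)   # alpha = C: inside margin/misclassified
print(f"{on_margin} on-margin SVs, {violators} margin violators")

# Geometric margin gamma = 1 / ||w|| (functional margin normalized to 1).
w = clf.coef_.ravel()
print(f"geometric margin: {1.0 / np.linalg.norm(w):.4f}")
```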

Whether to use the 1-norm or the 2-norm soft margin depends on the data at hand.

The following figures show the effect of the penalty factor C. The tool is libsvm with a linear kernel; all parameters other than the penalty factor are left at their defaults:

1. Original data distribution:

We can see that four pink points lie well away from the rest; we can essentially regard them as noise points.

2. Classification result for the first penalty factor value:

3. Classification result for the second penalty factor value:

4. Classification result for the third penalty factor value:

What do these figures show? The larger the value of C, the less willing we are to discard the outliers, so the more the classification hyperplane shifts toward them, and the more it ends up controlled by the outliers.
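The original data behind the figures are not available, but the effect is easy to reproduce. Here is a hedged sketch (again scikit-learn's SVC as a stand-in for the libsvm tool; the blobs, the four injected "pink" noise points, and the C values are all invented) that sweeps the penalty factor with a linear kernel:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Two well-separated blobs, plus four stray points deep inside the negative
# blob but labeled +1 -- playing the role of the four pink noise points.
X = np.vstack([rng.normal(-2.0, 0.5, (30, 2)),
               rng.normal(+2.0, 0.5, (30, 2)),
               rng.normal(-2.0, 0.3, (4, 2))])
y = np.hstack([-np.ones(30), np.ones(30), np.ones(4)])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    margin = 1.0 / np.linalg.norm(w)
    # As C grows the outliers are penalized more heavily, so they drag the
    # hyperplane toward themselves and the geometric margin shrinks.
    print(f"C={C:>6}: w={np.round(w, 2)}, b={b:.2f}, margin={margin:.3f}")
```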

As for libsvm: it should be at version 3.0 by now, and it can be downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html. Both Windows and Linux versions are available, and the source code is included; it is well worth a careful read.
