"Reprint" Support Vector Machine (four)

Source: Internet
Author: User

Support Vector Machine (Part 4)

9 Regularization and the non-separable case

The discussion so far has assumed that the training samples are linearly separable. When the samples are not linearly separable, we can try to use a kernel function to map the features into a higher-dimensional space, where they are likely to become separable. However, even after the mapping we cannot guarantee 100% separability. In that case we need to adjust the model so that, even when the data are not separable, we can still find a separating hyperplane that is as good as possible.

Look at the following two figures:

You can see that a single outlier (possibly noise) can cause the hyperplane to shift and the margin to shrink, so the earlier model is very sensitive to noise. Worse still, if the outlier lies inside the other class, the data become linearly non-separable.

At this point we should allow some points to deviate and violate the model's constraint (that the functional margin be at least 1). The adjusted model (also called the soft margin) is as follows:
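In the usual notation, with x^(i), y^(i) the training samples, ξ_i the slack variables and C the penalty weight, the soft-margin model described below can be written as:

    min over w, b, ξ :   (1/2) ||w||^2 + C * Σ_{i=1..m} ξ_i
    subject to:          y^(i) (w^T x^(i) + b) >= 1 - ξ_i,   i = 1, ..., m
                         ξ_i >= 0,                           i = 1, ..., m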

We introduce non-negative parameters ξ_i (called slack variables), which allow the functional margin of some sample points to be less than 1, i.e., to fall inside the maximum margin, or even to be negative, i.e., the point lies on the other class's side. Having relaxed the constraints, we must re-adjust the objective function to penalize such outliers: the objective gains an extra term, so the more outliers there are, the larger the objective value becomes, while we are trying to make it as small as possible. Here C is the weight given to the outliers: the larger C is, the greater the impact of outliers on the objective, that is, the less willing we are to tolerate them. In this way the objective function controls the number and extent of the outliers, so that most sample points still satisfy the original constraint.

After modifying the model, the Lagrangian must also be modified, as follows:
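In the usual notation, with multipliers α_i for the margin constraints and r_i for the constraints ξ_i >= 0, the modified Lagrangian reads:

    L(w, b, ξ, α, r) = (1/2) ||w||^2 + C * Σ_{i=1..m} ξ_i
                       - Σ_{i=1..m} α_i [ y^(i) (w^T x^(i) + b) - 1 + ξ_i ]
                       - Σ_{i=1..m} r_i ξ_i,        with α_i >= 0, r_i >= 0.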

Here α_i and r_i are the Lagrange multipliers. Recall the method we described when discussing Lagrange duality: first write down the Lagrangian (above), then treat it as a function of w and b, take partial derivatives with respect to each, and obtain expressions for w and b. Substituting these back into the formula and maximizing gives the dual problem. The whole derivation is similar to that of the previous model; only the final result is given here:
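In the usual notation (with <x^(i), x^(j)> the inner product between samples), the resulting dual problem is:

    max over α :   Σ_{i=1..m} α_i - (1/2) Σ_{i,j=1..m} y^(i) y^(j) α_i α_j <x^(i), x^(j)>
    subject to:    0 <= α_i <= C,   i = 1, ..., m
                   Σ_{i=1..m} α_i y^(i) = 0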

At this point we find that the slack variables ξ_i no longer appear, and the only difference from the previous model is the extra upper bound C in the constraint on α_i. Note also that the formula for computing b has changed; the changed result is described together with the SMO algorithm. First, look at how the KKT conditions change:
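In the usual notation, the dual-complementarity (KKT) conditions become:

    α_i = 0        =>  y^(i) (w^T x^(i) + b) >= 1
    α_i = C        =>  y^(i) (w^T x^(i) + b) <= 1
    0 < α_i < C    =>  y^(i) (w^T x^(i) + b) = 1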

The first condition shows that the coefficient in front of a sample point lying outside the two margin lines is 0, the coefficient in front of an outlier is C, and a support vector (a point on one of the two margin lines on either side of the separating hyperplane) has a coefficient in the open interval (0, C). The KKT conditions also indicate that not every sample point on the maximum-margin lines is a support vector; such a point may also be an outlier.

10 The coordinate ascent method (coordinate ascent)

Before the final discussion, let us first look at the basic principle of the coordinate ascent method. Suppose we want to solve the following optimization problem:
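In the usual notation, with W a real-valued function of the parameters α_1, ..., α_m:

    max over α_1, ..., α_m :   W(α_1, α_2, ..., α_m)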

Here W is a function of the vector α. In the earlier discussion of regression we mentioned two methods for finding the optimum: gradient descent and Newton's method. Now we introduce a method called coordinate ascent (when solving a minimization problem it is called coordinate descent; the principle is the same).

Method procedure:
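A sketch of the procedure, following the description below:

    Loop until convergence {
        For i = 1, ..., m {
            α_i := argmax over α_i of  W(α_1, ..., α_{i-1}, α_i, α_{i+1}, ..., α_m)
        }
    }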

The meaning of the innermost statement is to fix all the variables except α_i; W can then be regarded as a function of α_i alone, and we can optimize it directly by taking the derivative. Here the maximization is carried out in the order i = 1 to m; the order can be changed so that W increases and converges faster. If the innermost optimization can be solved very efficiently, then coordinate ascent is a very efficient method for finding the extremum.

Here is a figure illustrating the process:

The ellipses represent the contour lines of the quadratic function being optimized; there are two variables, and the starting point is (2, -2). The straight line segments in the figure trace the path of the iterative optimization. You can see that each step moves closer to the optimal value, and that the path is always parallel to one of the axes, because each step optimizes only one variable.
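As an illustration (a minimal sketch, not from the original article), the following Python code runs coordinate ascent on a concave two-variable quadratic starting at (2, -2), the same setup described for the figure; the particular matrix Q and vector b are assumed for the example:

import numpy as np

def coordinate_ascent(Q, b, a0, n_iters=50):
    """Maximize W(a) = -0.5 * a^T Q a + b^T a (Q positive definite)
    by optimizing one coordinate of a at a time, with the others held fixed."""
    a = a0.astype(float)
    for _ in range(n_iters):
        for i in range(len(a)):
            # dW/da_i = b_i - (Q a)_i = 0  =>  a_i = (b_i - sum_{j != i} Q_ij * a_j) / Q_ii
            a[i] = (b[i] - Q[i] @ a + Q[i, i] * a[i]) / Q[i, i]
    return a

# Two variables, starting at (2, -2), as in the contour figure described above.
Q = np.array([[2.0, 1.0],
              [1.0, 2.0]])      # positive definite, so W is concave
b = np.array([1.0, 1.0])
a_star = coordinate_ascent(Q, b, np.array([2.0, -2.0]))
print("coordinate-ascent solution:", a_star)
print("exact maximizer Q^{-1} b:  ", np.linalg.solve(Q, b))

Each inner update has a closed form here, so the zig-zag path parallel to the axes converges quickly to the optimum, which is the behaviour the figure illustrates.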

"Reprint" Support Vector Machine (four)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.