Machine Learning Cornerstone - Learning Note 02: Hard-Margin Dual SVM

Source: Internet
Author: User
Tags: svm

Background

The previous article summarized the linear hard-margin SVM, whose solution is straightforward: starting directly from the definition of SVM, a few equivalent transformations turn it into a QP problem. This article describes the hard-margin SVM from another angle. The derivation is less direct, but it avoids computing with the data in the transformed feature space, so that finer decision boundaries can be obtained using high-dimensional (even infinite-dimensional) feature transforms.


The Lagrange multiplier form

First, review the definition of the SVM problem as follows:
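In standard notation, with $x_n$ the n-th (possibly feature-transformed) input, $y_n \in \{-1, +1\}$ its label, and N examples in total, the hard-margin primal reads:

$$\min_{b,\,w}\ \frac{1}{2} w^T w \qquad \text{s.t.}\quad y_n (w^T x_n + b) \ge 1,\ \ n = 1, \dots, N$$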

The linear constraints are annoying and inconvenient for optimization. Is there a way to fold them into the objective itself, so that we can optimize freely without handling the constraints separately? Lagrange provides exactly such a tool, the Lagrange multiplier (for the more general method see the article "Simple explanation of Lagrange duality"). It takes the following form.
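Introducing one multiplier $\alpha_n \ge 0$ per constraint, the Lagrangian of the primal above is:

$$\mathcal{L}(b, w, \alpha) = \frac{1}{2} w^T w + \sum_{n=1}^{N} \alpha_n \bigl(1 - y_n (w^T x_n + b)\bigr), \qquad \alpha_n \ge 0$$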

At first glance this formula is nothing special; it merely introduces N extra variables. But look at the following transformation, and the power of the Lagrange multipliers becomes clear:

$$\text{SVM} \;\equiv\; \min_{b,\,w}\ \max_{\text{all } \alpha_n \ge 0}\ \mathcal{L}(b, w, \alpha)$$

What? The SVM problem equals the min-max on the right? It may not look right at first glance, but a careful case analysis shows that it is. For a fixed (w, b), write the n-th constraint term as $f_n(w, b) = 1 - y_n(w^T x_n + b)$ and consider the inner max.

When some $f_n(w, b) > 0$ (a constraint is violated), the inner max can push the corresponding $\alpha_n$ toward infinity, so the max blows up to $+\infty$; such a (w, b) can never win the outer min.

When every $f_n(w, b) \le 0$ (all constraints are satisfied), the inner max is attained at $\alpha_n = 0$ for all n, and the objective reduces to $\frac{1}{2} w^T w$, exactly the original SVM objective.

So in both cases the min-max formula agrees with the constrained SVM problem. Quite wonderful; one has to admire Lagrange.


Dual transformation

The min-max form above is still inconvenient to solve directly, but it can be related to a max-min form, in which the inner min is computed first and the outer max afterwards. This change is called the dual transformation.

First fix any $\alpha'$ with all $\alpha'_n \ge 0$. For every (b, w), the max over all feasible multipliers is at least the value at this particular choice:

$$\max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b, w, \alpha) \;\ge\; \mathcal{L}(b, w, \alpha')$$

Taking the min over (b, w) on both sides, the inequality still holds:

$$\min_{b,\,w}\ \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b, w, \alpha) \;\ge\; \min_{b,\,w}\ \mathcal{L}(b, w, \alpha')$$

There are many possible choices of $\alpha'$, and the inequality holds for every one of them, so it also holds for the best (largest) choice on the right. Deforming the equation accordingly gives

$$\min_{b,\,w}\ \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b, w, \alpha) \;\ge\; \max_{\text{all } \alpha'_n \ge 0}\ \min_{b,\,w}\ \mathcal{L}(b, w, \alpha')$$

In this way, min-max and max-min are connected, but only by a "$\ge$", so this is called weak duality. When does strong duality (equality) hold? The following conditions are sufficient:

    • The original problem is convex
    • The original problem is feasible (the data are linearly separable, possibly after a feature transform)
    • The constraints are linear

Happily, the SVM problem satisfies all of the above conditions, so strong duality holds, and we can solve the max-min problem on the right to obtain the optimal solution!
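Under these conditions the "$\ge$" becomes an equality, and an optimal $(b, w, \alpha)$ solving one side also solves the other:

$$\min_{b,\,w}\ \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b, w, \alpha) \;=\; \max_{\text{all } \alpha_n \ge 0}\ \min_{b,\,w}\ \mathcal{L}(b, w, \alpha)$$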


Problem simplification

After the dual transformation above, the following steps simplify the problem.
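The problem to be simplified is the max-min form, with the Lagrangian written out:

$$\max_{\text{all } \alpha_n \ge 0}\ \min_{b,\,w}\ \frac{1}{2} w^T w + \sum_{n=1}^{N} \alpha_n \bigl(1 - y_n (w^T x_n + b)\bigr)$$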

Taking the partial derivative of the inner objective with respect to b and setting it to 0 gives

$$\frac{\partial \mathcal{L}}{\partial b} = -\sum_{n=1}^{N} \alpha_n y_n = 0 \quad\Longrightarrow\quad \sum_{n=1}^{N} \alpha_n y_n = 0$$

Substituting this result into the formula above, the b term vanishes and the problem simplifies to

$$\max_{\alpha_n \ge 0,\ \sum_n \alpha_n y_n = 0}\ \min_{w}\ \frac{1}{2} w^T w + \sum_{n=1}^{N} \alpha_n - \sum_{n=1}^{N} \alpha_n y_n\, w^T x_n$$

Then, taking the partial derivative with respect to w and setting it to 0:

$$\frac{\partial \mathcal{L}}{\partial w} = w - \sum_{n=1}^{N} \alpha_n y_n x_n = 0$$

So the vector w is

$$w = \sum_{n=1}^{N} \alpha_n y_n x_n$$

Substituting w back in removes the inner min and leaves a problem in $\alpha$ alone:

$$\max_{\alpha_n \ge 0,\ \sum_n \alpha_n y_n = 0}\ -\frac{1}{2} \Bigl\| \sum_{n=1}^{N} \alpha_n y_n x_n \Bigr\|^2 + \sum_{n=1}^{N} \alpha_n$$

At this point the objective function depends only on $\alpha$; the problem has QP form and can be solved easily, which in turn gives w. However, during the computation we need to build an intermediate matrix Q of dimension N×N, and this is the main computational cost. In the previous lecture's primal QP there was no need to compute such a Q, because its quadratic-term matrix was very simple.

$$Q_{n,m} = y_n y_m\, x_n^T x_m, \qquad n, m = 1, \dots, N$$
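As a quick sanity check, here is a minimal sketch of solving this dual QP numerically, assuming the cvxopt package is available and the data are linearly separable. The dual is rewritten in the minimization form cvxopt expects: minimize $\frac{1}{2}\alpha^T Q \alpha - \mathbf{1}^T \alpha$ subject to $\alpha_n \ge 0$ and $y^T \alpha = 0$.

    import numpy as np
    from cvxopt import matrix, solvers

    def hard_margin_dual_svm(X, y):
        """Solve the hard-margin dual QP. X: (N, d) inputs, y: (N,) labels in {-1, +1}."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        N = X.shape[0]
        Yx = y[:, None] * X                        # each row is y_n * x_n
        Q = Yx @ Yx.T                              # Q[n, m] = y_n y_m x_n^T x_m  (N x N)

        # cvxopt solves: min 1/2 a^T P a + q^T a  s.t.  G a <= h,  A a = b
        P = matrix(Q)
        q = matrix(-np.ones(N))                    # maximizing sum(alpha) = minimizing -1^T alpha
        G = matrix(-np.eye(N))                     # -alpha_n <= 0, i.e. alpha_n >= 0
        h = matrix(np.zeros(N))
        A = matrix(y.reshape(1, N))                # sum_n alpha_n y_n = 0
        b = matrix(0.0)

        sol = solvers.qp(P, q, G, h, A, b)         # may fail if the data are not separable
        alpha = np.array(sol['x']).ravel()

        w = Yx.T @ alpha                           # w = sum_n alpha_n y_n x_n
        sv = alpha > 1e-6                          # support vectors: alpha_n > 0 (up to tolerance)
        b_val = float(np.mean(y[sv] - X[sv] @ w))  # b = y_s - w^T x_s, averaged over SVs
        return w, b_val, alpha

On a small linearly separable toy set, the returned w and b recover the same maximum-margin hyperplane as the primal QP from the previous article.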


Calculating b

Now only one question remains: how to compute b? The simplification above eliminated b, so the QP solution for $\alpha$ says nothing about b directly.

The KKT conditions help us solve this problem. If both the primal and the dual problem attain an optimal solution and their optimal values coincide, then the KKT conditions hold at the optimum; they are also sufficient for optimality here. The most important one for our purpose is complementary slackness, which requires that for every n,

$$\alpha_n \bigl(1 - y_n (w^T x_n + b)\bigr) = 0$$

Because $\alpha_n \ge 0$ (dual feasibility) and $1 - y_n(w^T x_n + b) \le 0$ (primal feasibility), whenever one of the two factors is nonzero the other must be zero. So as long as we find one point with $\alpha_s > 0$, we can use this property to compute b. From complementary slackness, $y_s(w^T x_s + b) = 1$; multiplying both sides by $y_s$ (and using $y_s^2 = 1$) gives

$$b = y_s - w^T x_s$$

In theory one such point is enough, but in practice, because of numerical error, b is usually computed from every point with $\alpha_n > 0$ and then averaged. The points that can be used to compute b are the support vectors: they lie exactly on the margin boundary and thus satisfy the geometric definition of a support vector from the original problem. However, not every point on the boundary has a positive $\alpha_n$. By convention, the vectors with $\alpha_n > 0$ are the ones we call support vectors, while the points that merely satisfy the boundary definition are called support vector candidates, and the two sets satisfy the relationship below.
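As a rough set relation (notation chosen here for convenience):

$$\{\,n : \alpha_n > 0\,\}\ (\text{support vectors}) \;\subseteq\; \{\,n : y_n(w^T x_n + b) = 1\,\}\ (\text{support vector candidates})$$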

Moreover, to simplify the computation of w, the terms with $\alpha_n = 0$ can be omitted, so the sum runs over the support vectors only:

$$w = \sum_{n:\ \alpha_n > 0} \alpha_n y_n x_n$$

The philosophy of w

By the calculation above, the final w is a linear combination of the data points $(y_n x_n)$. Similarly, the w found by PLA is also a linear combination of $(y_n x_n)$; the difference is that SVM builds the combination from the support vectors, while PLA builds it from the points it misclassified and corrected. Logistic regression and linear regression follow a similar rule. This phenomenon is called "w represented by data".
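Side by side, as a sketch (assuming PLA starts from w = 0, so its updates accumulate into coefficients $\beta_n$ counting how many times example n triggered a correction):

$$w_{\text{SVM}} = \sum_{n=1}^{N} \alpha_n\, y_n x_n \ \ (\alpha_n \text{ from the dual QP}), \qquad w_{\text{PLA}} = \sum_{n=1}^{N} \beta_n\, y_n x_n \ \ (\beta_n \text{ mistake corrections on } x_n)$$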

Summary

This section solves the SVM from the other side via the dual problem. The derivation is noticeably more involved, and the computational cost grows because an N×N matrix Q must be formed and solved against. Why go through all this when the hard linear SVM of the previous article is so much simpler? Because the dual problem lays the groundwork for kernels: by switching to a kernel, the explicit feature-transformation computation can be skipped, which makes even very complex (or infinite-dimensional) transforms computable. That is the topic of the next section.


