The hard-margin constraints are too strong: all points must be separated correctly. This can cause overfitting, since noisy points get fitted as if they were correct samples.
Hard-margin SVM thus has a kind of "learning perfectionism": it tolerates no mistakes at all. How do we overcome this?
Borrowing the idea of the pocket algorithm, we modify the optimization objective by charging a penalty of C for each misclassified point. In symbols, with z_n the transformed input and ⟦·⟧ the boolean indicator:
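$$\min_{b,w}\ \frac{1}{2}w^Tw \;+\; C\sum_{n=1}^{N}\big[\!\big[\, y_n \neq \operatorname{sign}(w^Tz_n+b) \,\big]\!\big]$$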
(1) The larger C is, the less tolerance for errors, and the smaller the margin.
(2) The smaller C is, the more tolerance for errors, and the larger the margin.
So the parameter C is a trade-off between a large margin and noise tolerance (see the sketch below).
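A small illustration of this trade-off, as a sketch assuming scikit-learn is available (the synthetic data and everything in it are my own, not from the lecture): as C grows, the learned margin width 2/‖w‖ shrinks.

```python
# Sketch: effect of C on the soft-margin width, using sklearn's SVC.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs, so perfect separation is impossible.
X = np.vstack([rng.normal(-2, 1.5, (50, 2)), rng.normal(2, 1.5, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    w = clf.coef_.ravel()
    # Larger C -> fewer tolerated violations -> smaller margin 2/||w||.
    print(f"C={C:>6}: margin width = {2 / np.linalg.norm(w):.3f}")
```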
But there are two drawbacks to this form:
(1) Because the objective must check "equal or not equal" (a boolean, non-convex error count), the problem is NP-hard to solve.
(2) It cannot distinguish the degree of error: a point barely past the boundary is penalized the same as a point far on the wrong side.
Therefore the soft-margin improvement is introduced: attach a slack variable ξ_n (romanized as "Kesi" in the original notes) to each point, relaxing its separation constraint.
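Written out, the soft-margin primal is:

$$\min_{b,w,\xi}\ \frac{1}{2}w^Tw + C\sum_{n=1}^{N}\xi_n \qquad \text{s.t.}\quad y_n(w^Tz_n+b)\ \ge\ 1-\xi_n,\quad \xi_n\ \ge\ 0,\quad n=1,\dots,N$$

The slack variable brings three benefits: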
(1) It cures the "learning perfectionism": margin violations are now tolerated, at a price.
(2) ξ_n also indicates the degree of violation: 0 < ξ_n ≤ 1 means the point is still classified correctly but sits between the margin boundary and the hyperplane; ξ_n > 1 means the point is misclassified, and the larger ξ_n, the farther the point lies toward the wrong side of the hyperplane.
(3) The problem can still be converted into a standard QP, which is easy to solve (see the sketch below).
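As a sketch of point (3), assuming cvxopt is available (the helper name and the variable stacking are mine, and it works directly in input space with no feature transform), the primal can be fed to a generic QP solver:

```python
# Sketch: the soft-margin primal as a standard QP, min 1/2 u'Pu + q'u
# s.t. Gu <= h, with the variables stacked as u = [b, w_1..w_d, xi_1..xi_N].
import numpy as np
from cvxopt import matrix, solvers

def softmargin_primal(X, y, C):
    N, d = X.shape
    # Objective: (1/2) w'w + C * sum(xi); b and xi get no quadratic term.
    P = np.zeros((1 + d + N, 1 + d + N))
    P[1:1 + d, 1:1 + d] = np.eye(d)
    q = np.hstack([np.zeros(1 + d), C * np.ones(N)])
    # y_n(w'x_n + b) >= 1 - xi_n   ->   -y_n*b - y_n*x_n'w - xi_n <= -1
    G_margin = np.hstack([-y[:, None], -y[:, None] * X, -np.eye(N)])
    # xi_n >= 0                    ->   -xi_n <= 0
    G_slack = np.hstack([np.zeros((N, 1 + d)), -np.eye(N)])
    G = np.vstack([G_margin, G_slack])
    h = np.hstack([-np.ones(N), np.zeros(N)])
    solvers.options['show_progress'] = False
    u = np.array(solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))['x']).ravel()
    return u[0], u[1:1 + d], u[1 + d:]   # b, w, xi
```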
Next, following the idea of the hard-margin dual SVM, we convert the soft-margin primal into its dual.
Because there are now two kinds of inequality constraints (the margin constraints and ξ_n ≥ 0), two sets of Lagrange multipliers, α_n and β_n, are naturally introduced; then the hard-margin line of thinking carries over to transform the problem into its dual.
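For reference, the Lagrangian with both sets of multipliers is:

$$\mathcal{L}(b,w,\xi;\alpha,\beta)=\frac{1}{2}w^Tw+C\sum_{n=1}^{N}\xi_n+\sum_{n=1}^{N}\alpha_n\big(1-\xi_n-y_n(w^Tz_n+b)\big)-\sum_{n=1}^{N}\beta_n\xi_n,\qquad \alpha_n\ge 0,\ \beta_n\ge 0$$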
First, take the derivative of the Lagrangian with respect to ξ_n and set it to zero to simplify the primal objective:
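$$\frac{\partial\mathcal{L}}{\partial\xi_n}=C-\alpha_n-\beta_n=0 \quad\Rightarrow\quad \beta_n=C-\alpha_n$$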
Conclusions:
(1) Using the condition ∂L/∂ξ_n = 0, both β_n and ξ_n are eliminated from the Lagrangian.
(2) Since β_n = C − α_n must remain ≥ 0, a new constraint α_n ≤ C is added.
Then, after the remaining simplifications, the final form is almost the same as the hard-margin dual.
The only difference is the added upper bound on α_n, as written out below.
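Reconstructed, the final soft-margin dual reads:

$$\min_{\alpha}\ \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}\alpha_n\alpha_m y_n y_m z_n^Tz_m-\sum_{n=1}^{N}\alpha_n \qquad \text{s.t.}\quad \sum_{n=1}^{N}y_n\alpha_n=0,\quad 0\le\alpha_n\le C$$

with w = Σ_n α_n y_n z_n and β_n = C − α_n implicit.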
Everything seems to go smoothly, but once the optimization is solved we still need to recover w and b.
w is easy to get (as in the hard-margin case, w = Σ_n α_n y_n z_n); the crux of the problem is how to solve for b.
It is necessary to review the complementary slackness conditions in the KKT conditions.
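For the soft-margin problem there are two such conditions, one per multiplier:

$$\alpha_n\big(1-\xi_n-y_n(w^Tz_n+b)\big)=0,\qquad (C-\alpha_n)\,\xi_n=0$$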
Here Lin gives the conclusion directly: use the free SVs, those with 0 < α_n < C and hence ξ_n = 0, to solve for b via b = y_s − w^T z_s. In other words, the points helpful for solving b are the true SVs lying exactly on the margin boundary.
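A minimal end-to-end sketch, again assuming cvxopt and a linear kernel (`softmargin_dual` and `recover_w_b` are my names, not the lecture's):

```python
# Sketch: solve the soft-margin dual, then recover w and b from a free SV.
import numpy as np
from cvxopt import matrix, solvers

def softmargin_dual(X, y, C):
    # min 1/2 a'Qa - 1'a  s.t.  y'a = 0,  0 <= a_n <= C,  Q_nm = y_n y_m x_n'x_m
    N = X.shape[0]
    Q = matrix(np.outer(y, y) * (X @ X.T))           # linear kernel
    p = matrix(-np.ones(N))
    G = matrix(np.vstack([-np.eye(N), np.eye(N)]))   # -a <= 0  and  a <= C
    h = matrix(np.hstack([np.zeros(N), C * np.ones(N)]))
    A = matrix(y.reshape(1, -1).astype(float))       # equality: y'a = 0
    solvers.options['show_progress'] = False
    sol = solvers.qp(Q, p, G, h, A, matrix(0.0))
    return np.array(sol['x']).ravel()

def recover_w_b(X, y, alpha, C, tol=1e-6):
    w = (alpha * y) @ X                              # w = sum_n alpha_n y_n x_n
    free = (alpha > tol) & (alpha < C - tol)         # free SVs: xi_n = 0
    s = np.flatnonzero(free)[0]                      # any free SV pins down b
    b = y[s] - X[s] @ w                              # from y_s (w'x_s + b) = 1
    return w, b
```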
If the complementary slackness material from the earlier SVM lectures is not yet second nature, this part is easy to get wrong: why are the points divided into three categories? Below I record my understanding:
(1) When α_n = 0: ξ_n must equal 0 (to satisfy (C − α_n)ξ_n = 0). These are the non-SVs; they lie on or outside the margin boundary and contribute nothing to w.
(2) When 0 < α_n < C: β_n = C − α_n > 0 forces ξ_n = 0, and α_n > 0 forces y_n(w^T z_n + b) = 1 − ξ_n = 1. These are the free SVs, sitting exactly on the margin boundary; they are the ones used to solve for b.
(3) When α_n = C: only ξ_n = 1 − y_n(w^T z_n + b) is pinned down, so ξ_n can be positive. These are the bounded SVs, which violate the margin (and even cross the hyperplane when ξ_n > 1).
Reference: Hsuan-Tien Lin, Machine Learning Techniques, "Soft-Margin Support Vector Machines".