This time the topic is SVM. The general outline is covered here; the dual part will be filled in later.

Source: Internet
Author: User
Tags min square root svm
I really did not want to type the formulas in LaTeX, so I wrote them on white paper and took photos. Please make do with them. Warning: lots of images, and ugly handwriting (I have not written by hand in ages).

We all know that SVM looks for the maximum margin. Why? I have not studied risk theory, but roughly speaking, the larger the margin, the lower the upper bound on the probability of classification error; more intuitively, the more robust the classifier. The upper bound seems to be inversely proportional to the maximum margin, but I forget the formula ...

How do we find the maximum margin? It sounds like an optimization problem: maximize the margin (max) subject to every sample being classified correctly (the constraint, s.t.).

Question 1: everyone is used to writing the margin as 2/||w||, with the separating line wx+b=0 and the two boundary lines wx+b=1 and wx+b=-1. I used to wonder: why not write one as wx+b=r and the other as wx+b=-r? Why exactly 1?
The two are actually equivalent: as in the picture, the general case can always be rescaled to the =1 case.
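
The photo with this argument is not reproduced in the text, so here is a sketch of the usual rescaling step. It assumes r > 0; the names w' and b' are introduced here for the rescaled parameters and are not from the original.

```latex
\begin{aligned}
w^\top x + b = \pm r
\;&\Longleftrightarrow\;
\frac{w^\top x}{r} + \frac{b}{r} = \pm 1
\;\Longleftrightarrow\;
w'^\top x + b' = \pm 1,
\qquad w' = \frac{w}{r},\quad b' = \frac{b}{r}, \\
\text{margin} &= \frac{2r}{\lVert w \rVert} = \frac{2}{\lVert w' \rVert}.
\end{aligned}
```

Dividing by r leaves the separating line wx+b=0 unchanged, so fixing the right-hand side to 1 loses nothing.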

Understanding the optimization form
The constraint is obvious: every sample is classified correctly;
The objective function: maximize the classification margin.
The original objective has a square root (the norm of w) in the denominator, which is inconvenient to solve, so we transform it into an equivalent optimization form.
(Another way to see it: because every sample is classified correctly, the penalty part of the loss function = 0, and only the L2 regularization term remains.)
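
Since the handwritten formulas are not in the text, the two forms are sketched below in the standard textbook notation. This assumes labels y_i in {-1, +1} and n training samples (x_i, y_i); the exact symbols on the original photo may differ.

```latex
\begin{aligned}
&\max_{w,b}\ \frac{2}{\lVert w \rVert}
&&\text{s.t. } y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,n \\
\Longleftrightarrow\quad
&\min_{w,b}\ \frac{1}{2}\lVert w \rVert^2
&&\text{s.t. } y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,n
\end{aligned}
```

Maximizing 2/||w|| is the same as minimizing ||w||, and squaring removes the square root without moving the minimizer.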

Here the objective function is convex and the constraints are convex, so this is a convex optimization problem.
The method of Lagrange multipliers (together with the KKT conditions) is used to convert it into the dual problem, which is then solved.
Why do this? Because the dual problem is easier to solve.
(1) Original optimization -> minmax optimization. This is the Lagrange multiplier way of absorbing the constraints. If you want to brush up on the optimization background, here is a link: http://www.cnblogs.com/90zeng/p/Lagrange_Duality.html. Let me add some detail on why this conversion is equivalent. If w, b satisfy the constraints, the original min problem and the new minmax problem have the same solution; if the constraints are not satisfied, both can be regarded as having no solution (in other words, if the minmax problem has a solution, the min problem has the same solution; if the minmax problem has no solution, the constraints of the original problem cannot be met, so it has no solution either).
(2) Minmax -> maxmin optimization. This is where the dual problem of a convex problem has the same solution as the original. Weak duality is an inequality relation; under the added conditions (convexity, together with the KKT conditions), weak duality is promoted to strong duality, where the inequality holds with equality.
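
The two steps can be written out as follows. This is a sketch in the standard notation for the hard-margin problem min (1/2)||w||^2 s.t. y_i(w^T x_i + b) >= 1, with one multiplier alpha_i per constraint; it is the textbook form, not a transcription of the original photos.

```latex
\begin{aligned}
L(w, b, \alpha) &= \frac{1}{2}\lVert w \rVert^2
  - \sum_{i=1}^{n} \alpha_i \bigl[\, y_i (w^\top x_i + b) - 1 \,\bigr],
  \qquad \alpha_i \ge 0, \\
\max_{\alpha \ge 0} L(w, b, \alpha) &=
\begin{cases}
\frac{1}{2}\lVert w \rVert^2 & \text{if every } y_i (w^\top x_i + b) \ge 1, \\
+\infty & \text{otherwise,}
\end{cases} \\
\max_{\alpha \ge 0}\ \min_{w, b}\ L(w, b, \alpha)
 &\le \min_{w, b}\ \max_{\alpha \ge 0}\ L(w, b, \alpha)
 \qquad \text{(weak duality; equality under strong duality).}
\end{aligned}
```

The middle line is the "conversion equivalence" of step (1): a violated constraint lets the inner max run off to infinity, so the outer min never picks such a point.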

Solving further

The only constraint left now is α ≥ 0, and the inner problem is unconstrained, so we can differentiate directly (a necessary condition); because the function is convex (quadratic), this is upgraded to a necessary and sufficient condition. Substituting the result back eliminates w and b and leaves a quadratic programming problem in α.
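
Written out, the elimination looks like this. The sketch assumes the usual hard-margin Lagrangian L(w, b, α) = (1/2)||w||^2 - Σ α_i [y_i(w^T x_i + b) - 1]; the notation is the standard textbook one rather than the author's handwriting.

```latex
\begin{aligned}
\frac{\partial L}{\partial w} = 0 \;&\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0, \\
\max_{\alpha}\ & \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, x_i^\top x_j
\quad \text{s.t. } \alpha_i \ge 0,\ \ \sum_{i=1}^{n} \alpha_i y_i = 0.
\end{aligned}
```

Note that the equality constraint Σ α_i y_i = 0 comes out of the derivative with respect to b, so the dual has it in addition to α ≥ 0.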

There is no rush to solve this yet; first look back at the earlier problem.

The necessary conditions for an optimum of the original problem are the KKT conditions, and one of them is the direct mathematical origin of the name "support vector".
In plain words: when you draw the line (hyperplane) in the middle of the margin, only the boundary points of the two regions (positive and negative) affect it; many points have no influence at all.
In other words, the information in many of the points is never used when solving for the optimal margin.
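
To make the support-vector condition concrete, here is a minimal sketch on a made-up 2-D toy set. The data, the hand-derived solution w = (1, 0), b = -1, and the multipliers alpha are all assumptions chosen for illustration. The KKT item in question is complementary slackness, alpha_i * (y_i * f(x_i) - 1) = 0.

```python
# Toy 2-D data (made up for illustration); the max-margin line is x1 = 1.
X = [(2, 0), (3, 1), (4, -1), (0, 0), (-1, 1), (-2, -1)]
y = [1, 1, 1, -1, -1, -1]

# Max-margin solution for this data, worked out by hand (an assumption of
# this sketch, not computed by a solver): w = (1, 0), b = -1.
w, b = (1.0, 0.0), -1.0
# Matching multipliers: w = sum_i alpha_i*y_i*x_i and sum_i alpha_i*y_i = 0.
alpha = [0.5, 0.0, 0.0, 0.5, 0.0, 0.0]

def functional_margin(x, label):
    """y * (w.x + b); equals 1 exactly on the margin boundary."""
    return label * (w[0] * x[0] + w[1] * x[1] + b)

margins = [functional_margin(x, t) for x, t in zip(X, y)]

# Complementary slackness: alpha_i * (y_i * f(x_i) - 1) = 0 for every i.
slack_products = [a * (m - 1.0) for a, m in zip(alpha, margins)]

# Support vectors are the points with alpha_i > 0, which forces margin == 1.
support_vectors = [x for x, a in zip(X, alpha) if a > 0]

print(margins)
print(slack_products)
print(support_vectors)
```

Only the two points sitting on the margin carry a nonzero alpha; moving any of the other four (without crossing the margin) would leave w and b unchanged.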

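Now, back to the solution above.

In Zhou Zhihua's book I saw the SMO algorithm for solving that quadratic program in α, which is said to be faster than general-purpose quadratic programming;
Each step optimizes over only a few variables with the rest held fixed (in SMO a pair at a time, since the constraint Σ α_i y_i = 0 ties them together), and this is iterated until convergence. It sounds a lot like coordinate descent.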
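
The coordinate-descent flavour can be seen in a small sketch. This is not the SMO algorithm as given in the book (it has no working-set heuristics); it simply sweeps over all pairs and applies the exact two-variable update. The function name `pairwise_dual_ascent`, the toy data, and the large C used to approximate the hard margin are all assumptions of this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pairwise_dual_ascent(X, y, C=1000.0, max_sweeps=100, tol=1e-10):
    """Maximize the SVM dual by exact updates on one pair (alpha_i, alpha_j) at a time."""
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]  # linear kernel
    alpha = [0.0] * n

    def score(k):
        # w . x_k written through alpha; the bias b cancels in e_i - e_j below
        return sum(alpha[m] * y[m] * K[m][k] for m in range(n))

    for _ in range(max_sweeps):
        biggest_step = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 0.0:
                    continue  # identical points: no curvature along this pair
                e_i = score(i) - y[i]
                e_j = score(j) - y[j]
                # Box keeping 0 <= alpha <= C with sum_k alpha_k*y_k unchanged
                if y[i] != y[j]:
                    lo = max(0.0, alpha[j] - alpha[i])
                    hi = min(C, C + alpha[j] - alpha[i])
                else:
                    lo = max(0.0, alpha[i] + alpha[j] - C)
                    hi = min(C, alpha[i] + alpha[j])
                new_j = alpha[j] + y[j] * (e_i - e_j) / eta  # unclipped optimum
                new_j = min(hi, max(lo, new_j))              # clip to the box
                step = new_j - alpha[j]
                alpha[i] -= y[i] * y[j] * step  # keeps the equality constraint
                alpha[j] = new_j
                biggest_step = max(biggest_step, abs(step))
        if biggest_step < tol:
            break  # a full sweep changed nothing: converged

    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n)) for d in range(len(X[0]))]
    # A point with 0 < alpha < C lies exactly on the margin: y_k*(w.x_k + b) = 1
    k = next(m for m in range(n) if 0.0 < alpha[m] < C)
    b = y[k] - dot(w, X[k])
    return alpha, w, b

# Made-up separable toy data; the max-margin line is x1 = 1.
X = [(2, 0), (3, 1), (4, -1), (0, 0), (-1, 1), (-2, -1)]
y = [1, 1, 1, -1, -1, -1]
alpha, w, b = pairwise_dual_ascent(X, y)
print(alpha, w, b)
```

Updating two variables at once is the smallest move that can stay on the line Σ α_i y_i = 0; changing a single α alone would break that constraint.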

In addition, about the soft margin: it suddenly feels as if everything derived so far was a special case. The soft margin is the more general situation, where loss function = penalty + L2 regularization.
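
A minimal sketch of that decomposition, assuming the common hinge-loss form of the penalty, max(0, 1 - y*f(x)), weighted by a constant C. The function name `soft_margin_objective` and the toy data are made up for this example.

```python
def soft_margin_objective(w, b, X, y, C=1.0):
    """Return (L2 term, penalty term) of the soft-margin objective."""
    l2 = 0.5 * sum(v * v for v in w)
    hinge = sum(
        max(0.0, 1.0 - t * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, t in zip(X, y)
    )
    return l2, C * hinge

w, b = (1.0, 0.0), -1.0

# Separable toy data: every point has y*(w.x + b) >= 1, so the penalty is 0
# and only the L2 term is left (the hard-margin special case).
X = [(2, 0), (3, 1), (4, -1), (0, 0), (-1, 1), (-2, -1)]
y = [1, 1, 1, -1, -1, -1]
clean = soft_margin_objective(w, b, X, y)

# Add one positive point on the wrong side of the line x1 = 1.
noisy = soft_margin_objective(w, b, X + [(0.5, 0)], y + [1])

print(clean)
print(noisy)
```

With the outlier added, the same (w, b) is still usable but pays a penalty, which is the trade-off the hard-margin constraints cannot express.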

Kernel functions

In plain terms: take two concentric circles, where the large circle is the positive class and the small circle is the negative class. In the two-dimensional plane it is hard to find a straight line (a linear classifier with maximum margin) that separates the two. But you can apply a function z=x1^2+x2^2; in three-dimensional space the large circle ends up on top and the small circle below, and you can separate them with a plane.
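
The concentric-circle picture can be checked numerically. This sketch uses the explicit lift z = x1^2 + x2^2 from the text; the radii (3 and 1), the number of sample points, and the separating height z = 5 are arbitrary choices for illustration. A kernel function would compute inner products in the lifted space without building it explicitly, which this sketch does not show.

```python
import math

def ring(radius, count=12):
    """Evenly spaced points on a circle of the given radius around the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / count),
         radius * math.sin(2 * math.pi * k / count))
        for k in range(count)
    ]

outer = ring(3.0)  # positive class: the large circle
inner = ring(1.0)  # negative class: the small circle

def lift(p):
    """Map (x1, x2) to (x1, x2, z) with z = x1^2 + x2^2."""
    x1, x2 = p
    return (x1, x2, x1 * x1 + x2 * x2)

# In 3-D the horizontal plane z = 5 sits between z = 1 and z = 9.
threshold = 5.0
outer_above = all(lift(p)[2] > threshold for p in outer)
inner_below = all(lift(p)[2] < threshold for p in inner)

print(outer_above, inner_below)
```

The plane z = 5 in the lifted space corresponds to the circle x1^2 + x2^2 = 5 back in the plane, so a linear separator after the lift is a nonlinear one before it.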

That is all for this installment. Ouch, I ate something bad and have an upset stomach, and I suddenly feel completely drained. On some of the optimization material I am not that strong, so corrections from the experts are welcome.
