References:
http://www.cnblogs.com/CheeseZH/p/5265959.html
http://blog.sina.com.cn/s/blog_5eef0840010147pa.html
http://www.cppblog.com/sunrise/archive/2012/08/06/186474.html
Keywords: max-margin classifier, support vector, SVM formula derivation, Lagrange duality, KKT conditions, dual problem, SMO algorithm, common kernel functions, RBF kernel, 0-1 loss, hinge loss, soft margin, slack variables, multi-class classification, LIBSVM parameters
Derivation of the SVM formulas
The primal (original) problem:
By means of Lagrange duality, the optimal solution of the primal problem can be obtained by solving an equivalent dual problem. This has two advantages: first, the dual problem is often easier to solve; second, the dual form allows the kernel function to be introduced naturally, which generalizes the method to nonlinear classification problems.
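In the usual notation, the training set is (x_1, y_1), ..., (x_n, y_n) with labels y_i ∈ {−1, +1}; the max-margin primal problem, and the Lagrangian formed by attaching one multiplier α_i ≥ 0 to each constraint, are:

\min_{w,\,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i\left(w^T x_i + b\right) \ge 1,\quad i = 1,\dots,n

L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[\, y_i\left(w^T x_i + b\right) - 1 \,\right]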
Then let θ(w) denote the maximum of L(w, b, α) over the multipliers α ≥ 0. The objective function becomes:
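\theta(w) = \max_{\alpha_i \ge 0} L(w, b, \alpha),
\qquad
\min_{w,\,b}\ \theta(w) = \min_{w,\,b}\ \max_{\alpha_i \ge 0} L(w, b, \alpha) = p^*

Note that θ(w) equals (1/2)||w||^2 whenever all constraints are satisfied and +∞ otherwise, so minimizing θ(w) is exactly the original constrained problem.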
Here p* denotes the optimal value of this problem, which is equal to that of the original problem. Solving it directly means dealing with the two parameters w and b under inequality constraints, which is difficult. Instead, exchange the positions of min and max to obtain:
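\max_{\alpha_i \ge 0}\ \min_{w,\,b}\ L(w, b, \alpha) = d^*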
The new problem obtained after the exchange is the dual problem of the original one, and d* denotes its optimal value. In general d* ≤ p*; when certain conditions are satisfied the two are equal, and in that case the primal problem can be solved indirectly by solving the dual problem.
In other words, the original min-max problem is converted into the max-min dual problem for two reasons: first, d* provides an approximate solution (a lower bound on p*); second, after the conversion the dual problem is easier to solve.

The three steps of solving the dual problem
(1) First, fix α and minimize L with respect to w and b: take the partial derivatives of L with respect to w and b and set ∂L/∂w and ∂L/∂b equal to zero (for discussion of the derivative with respect to w, see the comments under the original article):
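\frac{\partial L}{\partial w} = 0 \ \Longrightarrow\ w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \ \Longrightarrow\ \sum_{i=1}^{n} \alpha_i y_i = 0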
Now substitute both results back into the earlier expression for L; this eliminates w and b, although the intermediate algebra is a little lengthy.
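Using w = \sum_i \alpha_i y_i x_i and the constraint \sum_i \alpha_i y_i = 0, the steps are:

\begin{aligned}
L(w, b, \alpha) &= \frac{1}{2}\|w\|^2 - \sum_{i} \alpha_i \left[\, y_i\left(w^T x_i + b\right) - 1 \,\right] \\
&= \frac{1}{2}\sum_{i}\sum_{j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j
   - \sum_{i}\sum_{j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j
   - b \sum_{i} \alpha_i y_i + \sum_{i} \alpha_i \\
&= \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, x_i^T x_j
\end{aligned}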
From the last formula above, it can be seen that the Lagrangian now contains only one set of variables, the α_i (once they are found, w and b follow), so the classification function raised as the core problem in section 1.2 above can be easily obtained.
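Concretely, once the α_i are known, the classifier from section 1.2 depends on the data only through inner products with the training points:

f(x) = w^T x + b = \left( \sum_{i=1}^{n} \alpha_i y_i x_i \right)^{T} x + b = \sum_{i=1}^{n} \alpha_i y_i \left\langle x_i, x \right\rangle + b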
(2) Second, maximize with respect to α; this is the optimization problem of the dual. After the first step of minimizing over w and b, the resulting Lagrangian no longer contains the variables w and b but only α. From the formula above we obtain:
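\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, x_i^T x_j
\quad \text{s.t.}\quad \alpha_i \ge 0,\ i = 1,\dots,n, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0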
In this way α can be found; from it w is computed, then b, and finally the separating hyperplane and the classification decision function.
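With α* denoting the solution of the dual and j any index for which α_j* > 0 (a support vector):

w^* = \sum_{i=1}^{n} \alpha_i^* y_i x_i,
\qquad
b^* = y_j - \sum_{i=1}^{n} \alpha_i^* y_i\, x_i^T x_j,
\qquad
f(x) = \operatorname{sign}\left( w^{*T} x + b^* \right)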
(3) Having minimized L(w, b, α) with respect to w and b and then maximized with respect to α, the last step is to solve for the Lagrange multipliers α of the dual problem, which can be done with the SMO algorithm.
The formula above asks for the maximum of the dual objective over the parameters α, with the training data x_i, y_i given. It is solved by the SMO algorithm, which is covered in another article.
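As a minimal sketch (not code from the referenced articles), the dual can be solved numerically with scikit-learn's SVC, whose underlying libsvm solver is an SMO-type algorithm, and w and b can then be rebuilt from the returned dual coefficients α_i y_i; the toy data set and the choice C = 10 are arbitrary assumptions for illustration.

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two linearly separable 2-D blobs labelled +1 / -1.
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.hstack([np.ones(20), -np.ones(20)])

clf = SVC(kernel="linear", C=10.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only.
alpha_y = clf.dual_coef_.ravel()
w = alpha_y @ clf.support_vectors_      # w = sum_i alpha_i y_i x_i
b = clf.intercept_[0]

print("w rebuilt from the dual solution:", w)
print("w reported by sklearn           :", clf.coef_.ravel())  # should match
print("prediction for x = [1, 1]       :", np.sign(w @ np.array([1.0, 1.0]) + b))

Only the support vectors carry nonzero α_i, so dual_coef_ and support_vectors_ are enough to reconstruct the whole classifier, which is exactly the sparsity the derivation above predicts.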
Functional margin and geometric margin
Define the functional margin (denoted γ̂) of the hyperplane (w, b) with respect to a single training sample, and the functional margin of (w, b) with respect to the whole training data set T, as follows:
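\hat{\gamma}_i = y_i \left( w^T x_i + b \right),
\qquad
\hat{\gamma} = \min_{i = 1,\dots,n} \hat{\gamma}_i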