In the previous article, Support Vector Machine (1): The decision boundary of linearly separable data, we finally arrived at the problem of finding the maximum margin for the SVM, which was transformed into the following form:
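With training samples (x_i, y_i), i = 1, ..., N and labels y_i in {-1, +1}, this is presumably the margin-maximization problem in standard notation:

\[
\max_{w,\,b}\ \frac{1}{\|w\|} \qquad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,N
\]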
After this step I personally spent a long time consulting references; my math is poor, so understanding came slowly, but the fun of exploration lies in constantly pushing past bottlenecks. OK, let's continue. The above problem is equivalent to:
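That is, maximizing 1/||w|| is the same as minimizing ||w||; the square and the factor 1/2 are added purely for later computational convenience and do not change the minimizer:

\[
\min_{w,\,b}\ \frac{1}{2}\|w\|^2 \qquad \text{s.t.}\quad y_i\,(w^\top x_i + b) - 1 \ge 0,\quad i = 1,\dots,N
\]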
Next we introduce the generalized Lagrangian function and use Lagrange duality to solve this problem. The point of this step is to eliminate the constraints, which makes the problem easier to solve. The generalized Lagrangian function is:
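Attaching one multiplier alpha_i >= 0 to each of the N constraints:

\[
L(w, b, \alpha) \;=\; \frac{1}{2}\|w\|^2 \;-\; \sum_{i=1}^{N} \alpha_i\,\bigl(y_i\,(w^\top x_i + b) - 1\bigr), \qquad \alpha_i \ge 0
\]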
The Lagrangian is divided into two parts. The idea is: first let the second part reach its maximum value (over alpha) and hold it fixed; the problem is then equivalent to minimizing the first part. Since we want to eliminate the constraint (now encoded in the second term), we must show that the explicit constraint has become unnecessary, in other words, that it is enforced automatically whether we impose it or not, so it can be discarded. To see this, consider that there are only two possibilities:
First, consider the case y_i(w^T x_i + b) < 1, i.e. the constraint of the original problem is violated. Then the second term of the generalized Lagrangian can be driven to infinity by the maximization over alpha, so the minimum of the whole expression can never be attained at such a point, and that case is automatically discarded. In the case y_i(w^T x_i + b) >= 1, the maximum of the second term is 0, so when we minimize the whole expression we naturally end up on the side that satisfies the constraint. The original problem is thus transformed into:
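\[
p^* \;=\; \min_{w,\,b}\ \max_{\alpha \ge 0}\ L(w, b, \alpha),
\]

which works precisely because of the two cases just described:

\[
\max_{\alpha \ge 0} L(w, b, \alpha) =
\begin{cases}
\dfrac{1}{2}\|w\|^2, & \text{if } y_i(w^\top x_i + b) \ge 1 \text{ for all } i, \\[4pt]
+\infty, & \text{otherwise.}
\end{cases}
\]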
Now consider the next question: duality. Just now we focused on the second term, maximized it, and then minimized the first term to transform the original problem. Look at the formula again from the other direction: if alpha is treated as a constant (call it alpha'), and w and b are the variables over which we minimize the function, the result is less than or equal to the expression we derived the first time. Why? Because previously we took the maximum over alpha, and the maximum over alpha is >= the value at any particular alpha'. That is:
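\[
L(w, b, \alpha') \;\le\; \max_{\alpha \ge 0} L(w, b, \alpha)
\quad\Longrightarrow\quad
\min_{w,\,b}\, L(w, b, \alpha') \;\le\; \min_{w,\,b}\, \max_{\alpha \ge 0}\, L(w, b, \alpha) \;=\; p^*
\]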
Having taken the minimum over w and b, we now let alpha' range freely and take the maximum over it. Since the inequality above holds for every alpha', it is preserved under this maximization, and we know:
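\[
d^* \;=\; \max_{\alpha \ge 0}\ \min_{w,\,b}\ L(w, b, \alpha) \;\le\; \min_{w,\,b}\ \max_{\alpha \ge 0}\ L(w, b, \alpha) \;=\; p^*
\]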
Thus the Lagrange dual problem is obtained.
In general, d* <= p*, but in some special cases the two are equal; this case is called strong duality. The optimal boundary of the SVM is solved via the KKT (Karush–Kuhn–Tucker) conditions, which hold under strong duality. Written in our notation, the KKT conditions are:
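\[
\begin{aligned}
&\nabla_w L(w^*, b^*, \alpha^*) = 0, \qquad \nabla_b L(w^*, b^*, \alpha^*) = 0, \\
&\alpha_i^* \ge 0, \qquad y_i\,(w^{*\top} x_i + b^*) - 1 \ge 0, \\
&\alpha_i^*\,\bigl(y_i\,(w^{*\top} x_i + b^*) - 1\bigr) = 0 \quad \text{(complementary slackness)}, \qquad i = 1, \dots, N.
\end{aligned}
\]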
Solution steps:
1. Fix alpha, take the partial derivatives of L with respect to w and b, and set them equal to 0:
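\[
\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i\, y_i\, x_i = 0
\;\;\Longrightarrow\;\;
w = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i
\]
\[
\frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i\, y_i = 0
\;\;\Longrightarrow\;\;
\sum_{i=1}^{N} \alpha_i\, y_i = 0
\]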
Substituting these back into L, we get:
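\[
L \;=\; \sum_{i=1}^{N} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i\, \alpha_j\, y_i\, y_j\, (x_i^\top x_j)
\]

(the term involving b drops out because the sum of alpha_i y_i is 0).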
The problem translates to:
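\[
\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i\, \alpha_j\, y_i\, y_j\, (x_i^\top x_j)
\qquad \text{s.t.}\quad \sum_{i=1}^{N} \alpha_i\, y_i = 0,\quad \alpha_i \ge 0,\ i = 1, \dots, N
\]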
After that, the equality constraint is substituted back into the formula, the extremum is found by taking the partial derivative with respect to each alpha_i, and the result is substituted back to obtain w and b, which gives the decision boundary.
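To make the whole procedure concrete, here is a minimal numeric sketch (my own illustration, not code from this article or the references below): it solves the dual above on a tiny hand-made 2-D dataset using scipy's general-purpose SLSQP solver, then recovers w and b from the support vectors. In practice a dedicated algorithm such as SMO is used instead of a generic optimizer.

import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data; labels y_i in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)

# Gram-style matrix Q_ij = y_i y_j (x_i . x_j), so the dual objective
# is sum(alpha) - 0.5 * alpha^T Q alpha.
Q = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # Negative of the dual objective, since scipy minimizes
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

constraints = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * N                            # alpha_i >= 0

res = minimize(neg_dual, np.zeros(N), method="SLSQP",
               bounds=bounds, constraints=constraints)
alpha = res.x

# Recover the primal solution: w = sum_i alpha_i y_i x_i, and b from any
# support vector (alpha_i > 0), using complementary slackness.
w = (alpha * y) @ X
sv = alpha > 1e-6
b = float(np.mean(y[sv] - X[sv] @ w))

print("alpha =", np.round(alpha, 4))
print("w =", np.round(w, 4), "b =", round(b, 4))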
------------------------------------------------------------------------------
A few closing notes:
1. Special thanks to the article "A Simple Explanation of the Lagrange Dual (Lagrange duality)"; I especially like this kind of approachable mathematical explanation.
2. "A Popular Introduction to Support Vector Machines" is written in great detail; I have personally read it no fewer than 20 times and am still studying it.
3. Li Hang's "Statistical Learning Methods" also explains this very thoroughly.
I am someone who used to dread the very word "Lagrange", and I must humbly learn from the wisdom and spirit of sharing of those who came before.
Support Vector Machine (2): Solving the optimal boundary of a linearly separable SVM