This blog is written as study notes; errors and omissions will be corrected whenever they are found.
For the Lagrangian dual of a problem with inequality constraints, the KKT conditions can be summarized as: the constraint conditions (the original constraints plus the non-negativity of the introduced Lagrange multipliers), the partial derivatives with respect to x equal to 0 (stationarity), and the complementary slackness condition.
Further, it can be understood as:
① For a variable that no constraint acts on, the partial derivative must be 0.
② For a constrained variable, the partial derivative may be nonzero on the constraint boundary, but must be 0 off the constraint boundary.
Off the constraint boundary, the variable still has room to move: its feasible neighborhood is a region rather than a single point, so the function value can still be varied freely there.
Complementary slackness is the mathematical statement of ②:

α_i · g_i(x) = 0

where g_i(x) ≤ 0 is the original constraint and α_i ≥ 0 is its Lagrange multiplier.
The physical meaning of complementary slackness at the constraint boundary:
When the point is not on the constraint boundary (g_i(x) < 0), it is "free" to move in every direction. If the gradient of the objective were nonzero there, moving along the negative gradient would remain feasible and decrease the function value, so the point could not be a solution. Therefore, where the constraint is inactive, the point must be a stationary point of the objective; that is, when g_i(x) < 0 we require α_i = 0.
When the point is on the constraint boundary (g_i(x) = 0), moving along the boundary keeps the constraint satisfied, so complementary slackness no longer forces α_i to vanish: the objective's gradient need not be zero there, it only has to be balanced by the constraint's gradient so that any further decrease of the function value would leave the feasible region. That is, α_i may take a value greater than 0.
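The two cases can be sanity-checked on a one-dimensional toy problem (the objective and constraint below are my own illustrative choices, not from the original post): minimize f(x) = (x − c)² subject to g(x) = x − 1 ≤ 0.

```python
# Minimal numerical sketch of complementary slackness.
# Problem: min (x - c)^2  s.t.  g(x) = x - 1 <= 0.

def kkt_for_target(c):
    """Solve the toy problem by inspection and return (x_star, alpha)."""
    if c <= 1:
        # Unconstrained minimum x = c is feasible: the constraint is
        # inactive (g < 0), the gradient of f is 0, and complementary
        # slackness forces alpha = 0.
        x_star, alpha = float(c), 0.0
    else:
        # Unconstrained minimum lies outside the feasible set: the solution
        # sits on the boundary x = 1, and stationarity of the Lagrangian
        # L = (x - c)^2 + alpha * (x - 1) gives 2(x - c) + alpha = 0.
        x_star = 1.0
        alpha = -2.0 * (x_star - c)   # = 2(c - 1) > 0 on the boundary
    return x_star, alpha

# Case 1: interior solution -> g(x*) < 0 and alpha = 0.
x1, a1 = kkt_for_target(0.5)
print(x1, a1, a1 * (x1 - 1))   # alpha * g(x*) = 0

# Case 2: boundary solution -> g(x*) = 0 and alpha > 0.
x2, a2 = kkt_for_target(2.0)
print(x2, a2, a2 * (x2 - 1))   # alpha * g(x*) = 0
```

In both cases the product α · g(x*) is 0, but for opposite reasons: off the boundary the multiplier vanishes, on the boundary the constraint value does.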
Viewing the KKT conditions through this physical meaning makes their behavior at the boundary much more intuitive.
Likewise, hard-margin maximization in SVM can be understood intuitively through this physical meaning:
In the hard-margin maximization problem, the inequality constraints require every point's functional margin to be at least 1. The points on the margin boundary (the support vectors) lie exactly on the constraint boundary, so their multipliers α_i may be nonzero. The points beyond the margin boundary are not on the constraint boundary; to maximize the margin their multipliers must be 0, otherwise one could move along the negative gradient and find a better solution.
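This can be checked numerically on a tiny hand-picked dataset (my own toy example, not from the original post). For the four points below, the maximum-margin hyperplane can be found by inspection: w = (1, 0), b = −1, i.e. the plane x = 1.

```python
# Hard-margin SVM KKT check on a toy dataset.
X = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
y = [+1, +1, -1, -1]
w, b = (1.0, 0.0), -1.0          # known max-margin solution for these points

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

# Primal feasibility: every point has functional margin y_i (w.x_i + b) >= 1;
# the support vectors (points 0 and 2) have margin exactly 1.
margins = [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]
print(margins)

# Only the two points ON the margin boundary carry nonzero multipliers.
# Solving stationarity w = sum_i alpha_i y_i x_i together with
# sum_i alpha_i y_i = 0 over those two points gives alpha = 0.5 each.
alpha = [0.5, 0.0, 0.5, 0.0]

# Stationarity: reconstruct w from sum_i alpha_i y_i x_i.
w_rec = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2)]
print(w_rec)                     # reconstructs w = (1.0, 0.0)

# Complementary slackness: alpha_i * (margin_i - 1) = 0 for every point.
print([a * (m - 1) for a, m in zip(alpha, margins)])
```

The points with margin strictly greater than 1 contribute nothing to w; only the support vectors determine the separating hyperplane.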
For soft-margin SVM, the support vectors include not only the points on the margin boundary but also the points inside the margin. Each slack variable's partial derivative involves the penalty parameter C, and the sign conditions on that derivative turn into size relations between α_i and C. From this analysis, the familiar conclusions are not hard to obtain.
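The conclusion alluded to here is the standard case analysis for the soft-margin dual (as given in Li Hang's book), stated in LaTeX:

```latex
\begin{aligned}
\alpha_i = 0 &\;\Rightarrow\; y_i(w \cdot x_i + b) \ge 1 && \text{(on or outside the margin)}\\
0 < \alpha_i < C &\;\Rightarrow\; \xi_i = 0,\; y_i(w \cdot x_i + b) = 1 && \text{(on the margin boundary)}\\
\alpha_i = C &\;\Rightarrow\; \xi_i \ge 0,\; y_i(w \cdot x_i + b) = 1 - \xi_i && \text{(inside the margin if } 0 < \xi_i < 1\text{, misclassified if } \xi_i > 1\text{)}
\end{aligned}
```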
Reference: Statistical Learning Methods, Li Hang
On the physical meaning of the KKT conditions at the constraint boundary