First, the concept of the large-margin separating hyperplane is introduced.
Assuming the data is linearly separable, the goal is to find the separating hyperplane with the largest margin, that is, the "fattest" dividing line. The rest of these notes work toward the largest-margin separating hyperplane step by step.
Next, Lin deliberately emphasizes a change of notation: the original w0 is replaced by b, which makes the expressions easier to derive. I think this kind of emphasis is very responsible and helps students follow along; if symbols are swapped without warning, nobody can tell what is being said.
Since the goal is to find the largest-margin separating hyperplane, we first need the distance from a point to a hyperplane.
Suppose x' is a point on the hyperplane and x is a point off it. The distance from x to the hyperplane is then the length of the projection of x - x' onto the normal vector w: distance(x, b, w) = |w^T (x - x')| / ||w|| = |w^T x + b| / ||w||, where the last step uses w^T x' + b = 0.
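The projection formula can be sketched in a few lines of Python. This is only a minimal illustration: the function name distance_to_hyperplane and the example line x1 - x2 - 1 = 0 are assumptions made for the demonstration, not something taken from the lecture.

```python
import math

def distance_to_hyperplane(x, w, b):
    """Distance from point x to the hyperplane w^T x + b = 0."""
    # w^T x + b equals w^T (x - x') for any x' on the hyperplane.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm_w

# Hypothetical example: the line x1 - x2 - 1 = 0 in two dimensions.
w = [1.0, -1.0]
b = -1.0
print(distance_to_hyperplane([3.0, 0.0], w, b))  # sqrt(2), about 1.4142
print(distance_to_hyperplane([1.0, 0.0], w, b))  # 0.0, the point lies on the line
```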
With the point-to-hyperplane distance in hand, we can write down the largest-margin condition that we need to solve.
Since the data is assumed to be linearly separable, a separating hyperplane satisfies y_n(w^T x_n + b) > 0 for every n, so the distance of x_n simplifies to y_n(w^T x_n + b) / ||w||.
Now look at the "subject to" part of the problem:
(1) The "every y_n(w^T x_n + b) > 0" clause ensures that the hyperplane actually separates the data.
(2) The margin clause carries two meanings:
A. What is the margin? It is the distance from the hyperplane to the point nearest to it; every hyperplane that satisfies the separation condition has its own margin(b, w).
B. In other words: one (b, w) corresponds to one hyperplane → if that hyperplane satisfies y_n(w^T x_n + b) > 0 (n = 1, ..., N) → it has a corresponding margin(b, w).
This makes things clear: for a single separating hyperplane, computing margin(b, w) is a min over the data points; across all separating hyperplanes, picking the largest of those margins is a max over (b, w).
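The min-then-max structure can be illustrated with a small sketch. The four-point dataset, the two candidate hyperplanes, and the function name margin are all assumptions chosen for illustration; a real solver searches over all (b, w) instead of comparing a hand-picked list.

```python
import math

def margin(points, labels, w, b):
    """Return margin(b, w), or None if (b, w) does not separate the data."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    distances = []
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:  # violates y_n(w^T x_n + b) > 0
            return None
        distances.append(y * score / norm_w)
    return min(distances)  # inner "min": the nearest point

# Hypothetical toy data: two negative and two positive points in 2D.
points = [[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]]
labels = [-1, -1, 1, 1]

# Two hand-picked separating hyperplanes, given as (w, b).
candidates = [([1.0, -1.0], -1.0), ([1.0, -0.5], -1.2)]

# Outer "max": keep the candidate whose margin is largest.
best = max(candidates, key=lambda wb: margin(points, labels, wb[0], wb[1]))
print(best)
```

Both candidates separate the data, but the first one keeps every point farther away, so it wins the comparison.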
At this point, the largest-margin separating hyperplane problem has been stated clearly. What follows is how to simplify the constraints and the objective step by step.
Here we start from the expression for the margin and impose min y_n(w^T x_n + b) = 1 (n = 1, ..., N), so that the margin becomes simply 1/||w||.
Some material I have read explains this by rescaling b and w so that the points on the margin boundary satisfy y_n(w^T x_n + b) = 1.
That explanation is intuitive and justifies the simplification geometrically, but a one-line explanation like that did not fully convince me, so here is an argument with formulas for why the substitution is allowed.
For a hyperplane (b, w), the margin is min (1/||w||) * y_n(w^T x_n + b). If we simply set min y_n(w^T x_n + b) = 1, why is 1/||w|| still the margin?
You can think of it this way:
(1) Assume that for a hyperplane (b, w), the margin has the fixed value V →
(2) min (1/||w||) * y_n(w^T x_n + b) = V (n = 1, ..., N) →
(3) For any s > 0, min (1/||w/s||) * y_n((w/s)^T x_n + b/s) = V (n = 1, ..., N), because the factor 1/s cancels →
(4) Choose s = min y_n(w^T x_n + b), so that min y_n((w/s)^T x_n + b/s) = 1, and let w_new = w/s, b_new = b/s →
(5) Then min (1/||w_new||) * y_n(w_new^T x_n + b_new) = V (n = 1, ..., N) →
(6) 1/||w_new|| = V
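These six steps make the rationale behind min y_n(w^T x_n + b) = 1 clear. The constraint can be simplified further by relaxing it to y_n(w^T x_n + b) >= 1 for all n.
A proof by contradiction shows that this relaxation is feasible: if the optimal (b, w) had y_n(w^T x_n + b) > 1 for every n, scaling (b, w) down would still satisfy the constraints with a smaller ||w||, so it could not have been optimal.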
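The six-step rescaling argument can also be checked numerically. The sketch below assumes a toy dataset and a deliberately unnormalized starting pair (b, w); neither comes from the lecture, and the helper name dot is only illustrative.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical toy data and an unnormalized separating hyperplane.
points = [[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]]
labels = [-1, -1, 1, 1]
w, b = [3.0, -3.0], -3.0

# Steps (1)-(2): the margin V of the original (b, w).
scores = [y * (dot(w, x) + b) for x, y in zip(points, labels)]
V = min(scores) / math.sqrt(dot(w, w))

# Steps (3)-(4): divide by s = min y_n(w^T x_n + b).
s = min(scores)
w_new = [wi / s for wi in w]
b_new = b / s

# Steps (5)-(6): the smallest score is now 1 and the margin is 1/||w_new||.
new_scores = [y * (dot(w_new, x) + b_new) for x, y in zip(points, labels)]
print(min(new_scores))                   # 1.0
print(V, 1.0 / math.sqrt(dot(w_new, w_new)))  # the two values agree
```

Rescaling changes the numbers b and w but not the hyperplane itself, which is why the margin V is unaffected.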
Next, an example is used to introduce the concept of the support vector.
The points nearest to the hyperplane, those that lie exactly on the margin boundary and determine the hyperplane, are called support vectors.
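As a rough illustration, support vectors can be picked out by checking which points meet the normalized constraint with equality. The dataset and the solution (b, w) below are assumptions for the sketch (the solution was worked out by hand for this toy data), and the tolerance 1e-9 is an arbitrary choice.

```python
# Hypothetical toy data with a hand-derived largest-margin solution.
points = [[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]]
labels = [-1, -1, 1, 1]
w, b = [1.0, -1.0], -1.0

def constraint_value(x, y, w, b):
    """y_n(w^T x_n + b); equals 1 for points on the margin boundary."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

# Points whose constraint is tight are the support vectors.
support_indices = [
    i for i, (x, y) in enumerate(zip(points, labels))
    if abs(constraint_value(x, y, w, b) - 1.0) < 1e-9
]
print(support_indices)  # [0, 1, 2]
```

The fourth point has a constraint value of 2, so it sits strictly outside the margin boundary and plays no role in fixing the hyperplane.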
What follows was refreshing to me.
At this point, without any messy detours, Lin says directly that this is a typical quadratic programming (QP) problem.
Typical feature: the objective is quadratic in the variables, which means the problem can be solved by a standard routine.
How do we follow the standard QP routine? Just sort out a few parameters and hand them to a solver. I was a little puzzled at this point: what about the KKT conditions? Aren't they going to be discussed?
There is no need to worry about that: since a QP solver handles the problem directly, KKT is not needed here.
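"Sorting out a few parameters" might look like the sketch below. It assumes the common QP form: minimize (1/2) u^T Q u + p^T u subject to a_n^T u >= c_n, with u = (b, w). The names Q, p, A, c, build_qp, objective and is_feasible are assumptions for this sketch, and no particular solver is called; the code only assembles the parameters and checks a hand-derived candidate on toy data.

```python
def build_qp(points, labels):
    """Assemble Q, p, A, c for the variable u = (b, w_1, ..., w_d)."""
    d = len(points[0])
    # Q penalizes only w, not b, so (1/2) u^T Q u = (1/2) w^T w.
    Q = [[0.0] * (d + 1) for _ in range(d + 1)]
    for i in range(1, d + 1):
        Q[i][i] = 1.0
    p = [0.0] * (d + 1)
    # Row n is y_n * (1, x_n), so A u >= c encodes y_n(w^T x_n + b) >= 1.
    A = [[float(y)] + [y * xi for xi in x] for x, y in zip(points, labels)]
    c = [1.0] * len(points)
    return Q, p, A, c

def objective(u, Q, p):
    k = len(u)
    quad = sum(u[i] * Q[i][j] * u[j] for i in range(k) for j in range(k))
    return 0.5 * quad + sum(pi * ui for pi, ui in zip(p, u))

def is_feasible(u, A, c, tol=1e-9):
    return all(
        sum(a * ui for a, ui in zip(row, u)) >= cn - tol
        for row, cn in zip(A, c)
    )

# Hypothetical toy data; u_candidate = (b, w1, w2) was derived by hand.
points = [[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]]
labels = [-1, -1, 1, 1]
Q, p, A, c = build_qp(points, labels)
u_candidate = [-1.0, 1.0, -1.0]
print(is_feasible(u_candidate, A, c), objective(u_candidate, Q, p))  # True 1.0
```

In practice the assembled Q, p, A and c would be passed to whatever QP solver is available, and the solver would return the optimal u directly.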
Next comes the reasoning behind why the large-margin hyperplane works well.
A large margin controls the complexity of the model by controlling the VC dimension.
In short, the benefit of a large margin is that model complexity stays under control even when a nonlinear transform is used.
"Linear support Vector machines" heights field machine learning techniques