Linear support vector machines.
In this classification problem we want to choose the "fattest" separating line, and the fattest line is the one with the largest margin.
Our optimization goal is to maximize this margin, where the margin is the distance from the line to its nearest point. How is this point-to-line distance calculated?
To avoid confusion, w0 is no longer folded into the combined weight vector w; it gets a new name, b. Likewise, x0 = 1 no longer has to be stuffed into the x vector.
The result: the hypothesis is h(x) = sign(w^T x + b), and we want distance(x, b, w), the distance from a point x to the separating hyperplane, whose points x' satisfy w^T x' + b = 0.
Let x' and x'' be two points on the hyperplane, so that (x'' - x') is a vector lying in the plane. Since w^T (x'' - x') = (-b) - (-b) = 0, w is perpendicular to every vector in the plane; that is, w is the normal vector of the plane.
This lets us derive the formula for the distance:
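Projecting x - x' onto the unit normal w/‖w‖, and using w^T x' = -b for any point x' on the plane:

$$
\text{distance}(x, b, w) = \left|\frac{w^{\top}}{\|w\|}\,(x - x')\right| = \frac{|w^{\top}x + b|}{\|w\|}
$$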
The line also has to classify the binary labels y correctly; that is, for every point (x_n, y_n), w^T x_n + b must have the same sign as y_n. This same-sign property lets us remove the absolute value: |w^T x_n + b| = y_n(w^T x_n + b).
Combining this with the distance formula above, our optimization goal is:
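In symbols (each piece is explained below):

$$
\max_{b,w}\ \text{margin}(b,w) \quad \text{subject to } y_n(w^{\top}x_n + b) > 0 \text{ for every } n, \qquad \text{margin}(b,w) = \min_{n=1,\dots,N} \frac{1}{\|w\|}\, y_n(w^{\top}x_n + b)
$$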
margin(b, w) says the point-to-line distance is determined by b and w, and our goal is to maximize it, i.e., to find the "fattest" line among all choices of b and w;
the condition y_n(w^T x_n + b) > 0 for every n says the line must separate all the points (x_n, y_n) correctly;
and since (b, w) can be rescaled by any positive constant without changing the line, we may scale so that min_n y_n(w^T x_n + b) = 1, which simplifies margin(b, w).
The margin then becomes margin(b, w) = 1/‖w‖.
The constraint min_n y_n(w^T x_n + b) = 1 is then relaxed to y_n(w^T x_n + b) ≥ 1 for all n; it can be shown that this relaxation has no effect on the solution: if every y_n(w^T x_n + b) were strictly greater than 1 at the optimum, we could shrink (b, w) and get an even larger margin, a contradiction.
Our ultimate optimization goal is therefore: minimize (1/2) w^T w over (b, w), subject to y_n(w^T x_n + b) ≥ 1 for all n.
Maximizing a quantity is the same as minimizing its reciprocal, so maximizing 1/‖w‖ becomes minimizing ‖w‖; for convenience of calculation we square it and multiply by 1/2. This problem is called the standard (hard-margin SVM) problem.
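The full chain of transformations:

$$
\max_{b,w} \frac{1}{\|w\|} \;\Longleftrightarrow\; \min_{b,w} \|w\| \;\Longleftrightarrow\; \min_{b,w} \frac{1}{2}\, w^{\top} w \qquad \text{subject to } y_n(w^{\top}x_n + b) \ge 1 \text{ for all } n
$$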
First, let's look at a specific example:
After plotting:
The boxed points in the figure are the support vectors. They alone decide what the line looks like; none of the other points matter.
So how should the general problem be solved?
Use quadratic programming!
Additional knowledge: https://en.wikipedia.org/wiki/Quadratic_programming
Quadratic programming (QP) is a special type of mathematical optimization problem--specifically, the problem of optimizing (either minimizing or maximizing) a quadratic function of several variables subject to linear constraints on these variables.
Problem formulation:
The quadratic programming problem with n variables and m constraints can be formulated as follows. Given:
an n-dimensional real-valued vector c,
an n×n real symmetric matrix Q,
an m×n real matrix A, and
an m-dimensional real vector b,
the objective of quadratic programming is to find an n-dimensional vector x that will
minimize (1/2) x^T Q x + c^T x, subject to Ax ≤ b.
The notation Ax ≤ b means that every entry of the vector Ax is less than or equal to the corresponding entry of the vector b.
If we rewrite the problem we want to solve into this standard quadratic-programming form, a QP solver will give us the answer.
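As a concrete sketch of this mapping (the cvxopt solver, the helper name hard_margin_svm, and the toy data are my own assumptions, not from the original post): take the variable u = [b; w], set Q = diag(0, I_d) so that only w is penalized, and turn each constraint y_n(w^T x_n + b) ≥ 1 into a row of Gu ≤ h with G_n = -y_n [1, x_n^T] and h_n = -1.

```python
# A minimal sketch of solving the hard-margin SVM primal with a generic QP solver.
# Assumes the cvxopt package; hard_margin_svm and the toy data are illustrative.
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False  # quieter solver output

def hard_margin_svm(X, y):
    """X: (N, d) data matrix; y: (N,) labels in {-1, +1}. Returns (b, w)."""
    N, d = X.shape
    # Objective (1/2) u^T Q u with u = [b; w]: Q = diag(0, I_d) penalizes only w.
    Q = np.zeros((d + 1, d + 1))
    Q[1:, 1:] = np.eye(d)
    p = np.zeros(d + 1)
    # Constraints y_n (w^T x_n + b) >= 1, rewritten as G u <= h for the solver.
    G = -y[:, None] * np.hstack([np.ones((N, 1)), X])
    h = -np.ones(N)
    sol = solvers.qp(matrix(Q), matrix(p), matrix(G), matrix(h))
    u = np.array(sol['x']).ravel()
    return u[0], u[1:]  # b, w

# Toy usage on four linearly separable points.
X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
b, w = hard_margin_svm(X, y)
print("b =", b, "w =", w)
# Support vectors sit exactly on the margin boundary: y_n (w^T x_n + b) = 1.
print("support vectors:", X[np.isclose(y * (X @ w + b), 1.0, atol=1e-4)])
```

For this toy data the optimum works out to w = (1, -1) and b = -1, so (0,0), (2,2) and (2,0) are support vectors while (3,0) is not.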
To summarize the solution process:
Hard-margin means every point must be strictly and correctly separated; if no such line exists, the problem has no solution.
The same formulation carries over if we raise the line to a hyperplane in a higher-dimensional space:
SVM also admits a regularization interpretation: regularization minimizes E_in subject to a budget w^T w ≤ C, while SVM minimizes w^T w subject to E_in = 0 (the separation constraints). The two swap the roles of objective and constraint.
On the other hand, by adjusting how "fat" or "thin" the margin is, SVM can indirectly adjust the effective VC dimension.
So when linearly inseparable data is raised by a feature transform φ to a higher-dimensional space where it becomes linearly separable, we can use SVM to keep the model complexity under control.
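To make the φ idea concrete, here is a hypothetical continuation of the QP sketch above: the quadratic transform phi2 is my own illustration, and hard_margin_svm is the helper defined earlier. Data that no line can separate in 2-D becomes separable by a hyperplane in the 5-D feature space, and the same QP solves it.

```python
import numpy as np

def phi2(X):
    """Quadratic feature transform: (x1, x2) -> (x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

# Negatives clustered at the origin, positives on a ring around them:
# not linearly separable in 2-D, but separable in the transformed space.
inner = np.array([[0.1, 0.0], [-0.1, 0.1], [0.0, -0.1]])
outer = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
X = np.vstack([inner, outer])
y = np.array([-1.0] * len(inner) + [1.0] * len(outer))

b, w = hard_margin_svm(phi2(X), y)   # the same hard-margin QP, run in z-space
print("fat-region width in z-space:", 2.0 / np.linalg.norm(w))
```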
To summarize:
Machine Learning Techniques (1) -- Linear Support Vector Machines