(1) Basic form of the support vector machine
For a classification problem, the PLA algorithm may produce several different separating lines, as shown in the figure. The third line is clearly the best choice, because it leaves the largest distance to the data points nearest the boundary, so it tolerates more measurement error and generalizes better to unseen examples. The method built on this idea is the support vector machine.
What we want is a line (more generally, a hyperplane) that separates the sample data correctly and whose distance to the nearest samples is as large as possible. If w = (w_1, w_2, w_3, ..., w_n) denotes the normal vector of the hyperplane, the orientation of the hyperplane is determined, and the hyperplane can be described by the following linear equation:
    w^T x + b = 0
The distance from a sample x to the separating hyperplane is r = |w^T x + b| / ||w||.
So the separating hyperplane with the largest margin is found by solving

    max_{b,w}  margin(b, w)
    subject to  y_n (w^T x_n + b) > 0  for all n,
    where margin(b, w) = min_n |w^T x_n + b| / ||w||,

which can be converted by equivalent transformations (scaling (b, w) so that min_n y_n (w^T x_n + b) = 1, and maximizing 1/||w|| by minimizing w^T w) into the following form:

    min_{b,w}  (1/2) w^T w
    subject to  y_n (w^T x_n + b) >= 1  for all n = 1, ..., N.
This is the basic form of support vector machines.
(2) Solving the basic support vector machine
The basic support vector machine can be solved with quadratic programming (QP):
In fact this is simple: match the SVM problem on the left to the standard QP form on the right, fill in the corresponding coefficients (the quadratic term, the linear term, and the constraint matrix), hand them to the QP solver, and read off the optimal w and b to get the final result.
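As a minimal sketch of this step, assuming the cvxopt QP solver is available (the toy dataset X, y at the bottom is made up for illustration), the coefficients can be filled in with the combined QP variable u = (b, w):

    import numpy as np
    from cvxopt import matrix, solvers

    def hard_margin_svm_primal(X, y):
        """Solve min (1/2) w^T w  s.t.  y_n (w^T x_n + b) >= 1 with a generic QP solver.
        The QP variable is u = (b, w); the data must be linearly separable."""
        N, d = X.shape
        P = np.zeros((d + 1, d + 1))
        P[1:, 1:] = np.eye(d)                                # quadratic term acts on w only, not on b
        q = np.zeros(d + 1)                                  # no linear term
        G = -y[:, None] * np.hstack([np.ones((N, 1)), X])    # y_n (b + w^T x_n) >= 1  ->  G u <= h
        h = -np.ones(N)
        sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
        u = np.array(sol['x']).ravel()
        return u[0], u[1:]                                   # b, w

    # hypothetical linearly separable toy data
    X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
    y = np.array([-1., -1., 1., 1.])
    b, w = hard_margin_svm_primal(X, y)
    print("b =", b, "w =", w)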
(3) Dual support vector machine (Dual SVM)
Duality is another way of solving the basic support vector machine: the linear constraints are folded into the optimization problem itself using Lagrange multipliers. The specific solution process is as follows.
Convert the original problem into Lagrangian form:

    L(b, w, α) = (1/2) w^T w + Σ_n α_n (1 − y_n (w^T z_n + b)),    α_n >= 0,

where z_n denotes the (possibly transformed) input x_n, so that SVM = min_{b,w} max_{α_n>=0} L(b, w, α). The next step is to find the (b, w) that minimize L(b, w, α). This problem can be converted as follows: the minimum of the maximum is always at least the maximum of the minimum (weak duality), and because the SVM is a convex quadratic program, equality (strong duality) actually holds:

    min_{b,w} max_{α_n>=0} L(b, w, α) = max_{α_n>=0} min_{b,w} L(b, w, α)    ... (1)

Based on this conversion, the problem is finally translated into the Lagrange dual form: maximize, over α_n >= 0, the unconstrained minimum of L(b, w, α) over b and w.
The following is the derivation. First take the derivative with respect to b and set it to zero:

    ∂L/∂b = −Σ_n α_n y_n = 0,  i.e.  Σ_n α_n y_n = 0.    ... (2)
Then the derivative with respect to w:

    ∂L/∂w = w − Σ_n α_n y_n z_n = 0,  i.e.  w = Σ_n α_n y_n z_n.    ... (3)
Substituting (2) and (3) back into (1), the b term vanishes and the w terms collapse: L(b, w, α) = (1/2) w^T w + Σ_n α_n − w^T (Σ_n α_n y_n z_n) − b (Σ_n α_n y_n) = (1/2) w^T w + Σ_n α_n − w^T w, which gives the result below:

    max over α_n >= 0, Σ_n α_n y_n = 0, w = Σ_n α_n y_n z_n  of  −(1/2) ||Σ_n α_n y_n z_n||^2 + Σ_n α_n.
The conditions that the optimal (b, w, α) must satisfy here are called the KKT conditions: primal feasibility y_n (w^T z_n + b) >= 1; dual feasibility α_n >= 0; the stationarity conditions Σ_n α_n y_n = 0 and w = Σ_n α_n y_n z_n; and complementary slackness α_n (1 − y_n (w^T z_n + b)) = 0.
In the last condition, when α_n = 0 the factor 1 − y_n (w^T z_n + b) need not be 0; when α_n ≠ 0 we must have 1 − y_n (w^T z_n + b) = 0, and the corresponding (x_n, y_n) is a support vector. Given the optimal α, w is found from (3); then, for any n with α_n ≠ 0, 1 − y_n (w^T z_n + b) = 0 yields b = y_n − w^T z_n (one may also average b over several support vectors). The final separating hyperplane is thus obtained, and one can see that the hyperplane is generated only from the support vectors.
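A small sketch of this recovery step (assuming the optimal α has already been obtained, for example from the quadratic program described just below; X, y, alpha are hypothetical arrays):

    import numpy as np

    def recover_w_b(X, y, alpha, tol=1e-6):
        """Recover w and b from the dual solution alpha (hard-margin case)."""
        w = (alpha * y) @ X                  # w = sum_n alpha_n y_n x_n, equation (3)
        sv = alpha > tol                     # support vectors: alpha_n > 0
        # complementary slackness: y_s (w^T x_s + b) = 1  ->  b = y_s - w^T x_s
        b = float(np.mean(y[sv] - X[sv] @ w))
        return w, b, np.where(sv)[0]         # also return the support-vector indices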
By the above derivation, the dual problem can be transformed into the following form:

    min_α  (1/2) Σ_n Σ_m α_n α_m y_n y_m z_n^T z_m − Σ_n α_n
    subject to  Σ_n y_n α_n = 0,  α_n >= 0 for all n.
Using the quadratic programming solver again, match the form on the left to the standard QP form on the right: the quadratic term has entries q_{n,m} = y_n y_m z_n^T z_m, the linear term is −1 for every α_n, and the constraints are Σ_n y_n α_n = 0 and α_n >= 0.
Solving this QP yields the optimal α.
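A minimal sketch of this QP, again assuming cvxopt (the equality constraint Σ_n y_n α_n = 0 goes into the solver's A, b slot, and α_n >= 0 into G, h):

    import numpy as np
    from cvxopt import matrix, solvers

    def hard_margin_svm_dual(X, y):
        """Solve min (1/2) a^T Q a - 1^T a  s.t.  y^T a = 0, a >= 0,
        where q_nm = y_n y_m x_n^T x_m."""
        N = X.shape[0]
        Yx = y[:, None] * X
        Q = Yx @ Yx.T                               # q_nm = y_n y_m x_n^T x_m
        p = -np.ones(N)
        G, h = -np.eye(N), np.zeros(N)              # -alpha_n <= 0, i.e. alpha_n >= 0
        A, c = y.reshape(1, N), np.zeros(1)         # sum_n y_n alpha_n = 0
        sol = solvers.qp(matrix(Q), matrix(p), matrix(G), matrix(h), matrix(A), matrix(c))
        return np.array(sol['x']).ravel()           # the optimal alpha

For the separable toy data from the primal sketch, applying recover_w_b to this α should reproduce the same hyperplane as the primal QP.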
So far we have assumed that the data are linearly separable, i.e. that there exists a separating hyperplane which classifies the training samples correctly. In reality, however, such a hyperplane may not exist in the original sample space, so the samples need to be mapped into a higher-dimensional feature space z = Φ(x), as shown in the figure.
The corresponding dual problem is transformed into the same form with z_n = Φ(x_n):

    min_α  (1/2) Σ_n Σ_m α_n α_m y_n y_m Φ(x_n)^T Φ(x_m) − Σ_n α_n
    subject to  Σ_n y_n α_n = 0,  α_n >= 0 for all n.
(4) Kernel function
In the derivation above, computing Q is very expensive: each entry requires mapping the samples into the high-dimensional space and then taking an inner product there. The alternative is to compute, directly in the original sample space, a quantity that equals the inner product after the mapping, combining the transform and the inner product into a single step and reducing the complexity. The kernel function is introduced for exactly this purpose.
The corresponding kernel functions are, for example, the polynomial kernel K(x, x') = (ζ + γ x^T x')^Q (with the simplest case K(x, x') = (1 + x^T x')^2 for a 2nd-order transform) and the Gaussian kernel K(x, x') = exp(−γ ||x − x'||^2).
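As a quick numerical illustration of the idea (phi2 below is the explicit 2nd-order feature transform whose inner product matches the polynomial kernel K(x, x') = (1 + x^T x')^2; the vectors are arbitrary):

    import numpy as np

    def phi2(x):
        """Explicit 2nd-order transform; phi2(x) . phi2(x') equals (1 + x^T x')^2."""
        d = len(x)
        pairs = np.array([x[i] * x[j] for i in range(d) for j in range(d)])
        return np.concatenate(([1.0], np.sqrt(2.0) * x, pairs))

    x, xp = np.array([1.0, 2.0, -1.0]), np.array([0.5, -1.0, 3.0])
    k_explicit = phi2(x) @ phi2(xp)     # transform first, then inner product: O(d^2) work
    k_kernel = (1.0 + x @ xp) ** 2      # kernel trick: computed in the original space, O(d) work
    print(k_explicit, k_kernel)         # both print 12.25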
The form of the support vector machine transformed by the kernel function is: every inner product z_n^T z_m = Φ(x_n)^T Φ(x_m) in the dual is replaced by K(x_n, x_m), so q_{n,m} = y_n y_m K(x_n, x_m), and the final classifier becomes g(x) = sign(Σ_n α_n y_n K(x_n, x) + b).
With the above calculation, the kernel support vector machine algorithm is as follows: 1) compute q_{n,m} = y_n y_m K(x_n, x_m); 2) solve the QP for α; 3) compute b = y_s − Σ_n α_n y_n K(x_n, x_s) using any support vector (x_s, y_s); 4) return g(x) = sign(Σ_n α_n y_n K(x_n, x) + b).
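A compact sketch of these four steps with a Gaussian kernel (again assuming cvxopt; gamma is an arbitrary illustrative value, and X, y are the hypothetical data from before):

    import numpy as np
    from cvxopt import matrix, solvers

    def gaussian_kernel(A, B, gamma=1.0):
        """K(x, x') = exp(-gamma * ||x - x'||^2) for every pair of rows of A and B."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    def kernel_svm_fit(X, y, gamma=1.0):
        N = X.shape[0]
        K = gaussian_kernel(X, X, gamma)
        Q = np.outer(y, y) * K                                   # step 1: q_nm = y_n y_m K(x_n, x_m)
        sol = solvers.qp(matrix(Q), matrix(-np.ones(N)),         # step 2: solve the dual QP
                         matrix(-np.eye(N)), matrix(np.zeros(N)),
                         matrix(y.reshape(1, N)), matrix(np.zeros(1)))
        alpha = np.array(sol['x']).ravel()
        s = int(np.argmax(alpha))                                # step 3: pick a support vector
        b = float(y[s] - (alpha * y) @ K[:, s])                  #         b = y_s - sum_n alpha_n y_n K(x_n, x_s)
        return alpha, b

    def kernel_svm_predict(X, y, alpha, b, Xnew, gamma=1.0):
        # step 4: g(x) = sign(sum_n alpha_n y_n K(x_n, x) + b)
        return np.sign((alpha * y) @ gaussian_kernel(X, Xnew, gamma) + b)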
(5) Soft margin and regularization
The discussion so far has assumed that the training data are linearly separable in the sample (or feature) space, but in reality this is often not the case; even if a kernel function is found that makes the training samples linearly separable in the feature space, the result may overfit. The method proposed to alleviate this is the "soft margin", which allows some samples to violate the constraints; in the figure, the red circles mark samples that do not satisfy the constraints.
This time the constraint becomes:

    y_n (w^T z_n + b) >= 1 − ξ_n,    ξ_n >= 0,

where ξ_n is the slack, i.e. the amount by which sample n is allowed to violate the margin.
With the above, the objective function and constraints are converted into the following form:

    min_{b,w,ξ}  (1/2) w^T w + C Σ_n ξ_n
    subject to  y_n (w^T z_n + b) >= 1 − ξ_n,  ξ_n >= 0 for all n,

where the parameter C trades off the width of the margin against the total amount of violation.
The above formula is solved by the Lagrange method; the Lagrangian takes the following form:

    L(b, w, ξ, α, β) = (1/2) w^T w + C Σ_n ξ_n + Σ_n α_n (1 − ξ_n − y_n (w^T z_n + b)) − Σ_n β_n ξ_n,

with multipliers α_n >= 0 and β_n >= 0.
Setting the derivatives with respect to b, w, and ξ_n to zero, in the same way as (2) and (3) above, gives Σ_n α_n y_n = 0, w = Σ_n α_n y_n z_n, and β_n = C − α_n; substituting these back yields the following form:

    min_α  (1/2) Σ_n Σ_m α_n α_m y_n y_m z_n^T z_m − Σ_n α_n
    subject to  Σ_n y_n α_n = 0,  0 <= α_n <= C for all n.
The kernel soft-margin SVM algorithm is as follows: 1) compute q_{n,m} = y_n y_m K(x_n, x_m); 2) solve the QP above, with the extra upper bound α_n <= C, for α; 3) compute b from a free support vector (see below); 4) return g(x) = sign(Σ_n α_n y_n K(x_n, x) + b).
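Compared with the dual QP sketches above, the only change needed in the solver call is the box constraint on α (a minimal sketch; C is a hypothetical regularization value chosen by the user):

    import numpy as np

    def soft_margin_box_constraints(N, C):
        """Inequality constraints 0 <= alpha_n <= C for the soft-margin dual QP;
        these G, h replace the alpha_n >= 0 constraints used in the hard-margin sketches."""
        G = np.vstack([-np.eye(N), np.eye(N)])         # -alpha_n <= 0  and  alpha_n <= C
        h = np.hstack([np.zeros(N), C * np.ones(N)])
        return G, h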
α can thus be obtained from the formula above, but b has not yet been determined. b is found in the same way as in section (3), using a support vector; for the soft margin one should use a free support vector with 0 < α_s < C, for which ξ_s = 0 and therefore b = y_s − Σ_n α_n y_n K(x_n, x_s).
When α_s = 0, the sample has no effect on g(x) and is not a support vector.
When α_s > 0, then y_s (w^T z_s + b) = 1 − ξ_s and the sample is a support vector; if additionally α_s < C, then ξ_s = 0 and the sample lies exactly on the maximum-margin boundary.
When α_s = C and ξ_s <= 1, the sample falls inside the maximum margin but is still classified correctly; when ξ_s > 1, the sample is misclassified.
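In practice, the kernel soft-margin SVM described above is available off the shelf; a minimal sketch using scikit-learn's SVC (assuming scikit-learn is installed; the data, C, and gamma values are made-up illustrations):

    import numpy as np
    from sklearn.svm import SVC

    # hypothetical noisy data with a circular decision boundary (not linearly separable)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0)
    y[rng.random(200) < 0.05] *= -1                 # flip a few labels to simulate noise

    # soft-margin kernel SVM: C trades the margin width against the total violation,
    # and the RBF kernel plays the role of the Gaussian kernel above
    clf = SVC(C=1.0, kernel='rbf', gamma=1.0).fit(X, y)
    print("support vectors:", clf.support_.size)
    print("training accuracy:", clf.score(X, y))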