1. Brief Introduction
SVM involves many topics. To understand it fully, one needs empirical risk and confidence (structural) risk, VC-dimension theory, the derivation of the optimization formula, the Lagrangian solution of the optimization problem, and kernel functions; in practice, an off-the-shelf SVM toolkit is usually used. Here we record only the derivation of the SVM optimization formula. The main reference is Wikipedia, whose explanation of this part is clear and simple.
2. Derivation
· Known information
Sample data: xi denotes the i-th feature vector, yi its label (+1 or -1), p the dimension of the feature vectors, and n the number of samples.
Objective: find the maximum-margin hyperplane, which on the one hand separates all the samples correctly, and on the other hand maximizes the margin between the samples on the two sides of the hyperplane.
· Derivation
Assume the hyperplane is w · x + b = 0 (the transpose notation wT x is more common, but the dot product is used here for convenience).
The two parallel margin hyperplanes are then written as w · x + b = 1 and w · x + b = -1.
If the sample data are linearly separable, we can find two such hyperplanes so that no sample point lies between them and the distance between them is as large as possible. "No sample point between the two planes" is equivalent to yi (w · xi + b) >= 1, i = 1, 2, ..., n. The distance between the two hyperplanes is 2 / ||w||, so maximizing the margin is equivalent to minimizing ||w||.
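The two-plane picture above can be checked numerically. The sketch below uses a hypothetical 2-D toy set (the points, w, and b are made up for illustration) and verifies that the constraint yi (w · xi + b) >= 1 holds for every sample and that the gap between the margin planes is 2 / ||w||:

```python
import numpy as np

# Hypothetical 2-D toy set: two classes separated by the vertical line x0 = 1.
X = np.array([[0.0, 0.0], [-0.2, 1.0], [2.0, 0.0], [2.3, -1.0]])
y = np.array([-1, -1, 1, 1])

# Candidate hyperplane w . x + b = 0, with margin planes at w . x + b = +1 / -1.
w = np.array([1.0, 0.0])
b = -1.0

# No sample lies strictly between the margin planes iff y_i (w . x_i + b) >= 1.
functional_margins = y * (X @ w + b)
print(functional_margins)                # each entry >= 1

# Distance between the two margin planes is 2 / ||w||.
gap = 2.0 / np.linalg.norm(w)
print(gap)                               # 2.0 for this w
```

With ||w|| = 1 the gap is exactly 2; shrinking ||w|| further would widen the gap until some sample violated the constraint, which is the trade-off the optimization below formalizes.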
Therefore, the maximum-margin pair of hyperplanes separating all the samples can be described as:
min ||w||
s.t. yi (w · xi + b) >= 1, i = 1, 2, ..., n
· For ease of solution, this can be further converted to
min (1/2) ||w||^2
s.t. yi (w · xi + b) >= 1, i = 1, 2, ..., n
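This quadratic program can be solved directly for a tiny example. The sketch below (a minimal illustration, not a practical SVM solver; the two-point data set is made up) feeds the objective (1/2) ||w||^2 and the constraints yi (w · xi + b) >= 1 to SciPy's SLSQP constrained minimizer:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: one point per class on the x0 axis, separable by x0 = 1.
X = np.array([[0.0, 0.0], [2.0, 0.0]])
y = np.array([-1.0, 1.0])

# Variables packed as z = [w0, w1, b]; objective is (1/2) ||w||^2.
def objective(z):
    w = z[:2]
    return 0.5 * np.dot(w, w)

# SLSQP inequality constraints are g(z) >= 0, so pass y_i (w . x_i + b) - 1.
cons = [{'type': 'ineq',
         'fun': lambda z, i=i: y[i] * (X[i] @ z[:2] + z[2]) - 1.0}
        for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method='SLSQP', constraints=cons)
w, b = res.x[:2], res.x[2]
print(w, b)                       # expected near w = (1, 0), b = -1
print(2.0 / np.linalg.norm(w))    # margin width, near 2
```

For these two points the optimum can also be read off by hand: both constraints are tight, giving w = (1, 0) and b = -1, i.e. the separating line x0 = 1 midway between the samples.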
3. Reference
Wikipedia: Support Vector Machine
http://en.wikipedia.org/wiki/Support_vector_machine
http://zh.wikipedia.org/wiki/%E6%94%AF%E6%8C%81%E5%90%91%E9%87%8F%E6%9C%BA