Support Vector Machine (SVM) is an original (non-combinatorial) classification algorithm with a clear geometric interpretation and high accuracy.
The core ideas of the SVM algorithm: (1) In the simple, linearly separable case, the problem is cast as a convex optimization problem, simplified with the Lagrange multiplier method, and then solved with existing algorithms. (2) In the complex, linearly non-separable case, the samples are mapped into a high-dimensional space by a kernel function so that they become linearly separable; the kernel function avoids the explicit high-dimensional computation.
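As a quick illustration of these two situations, here is a minimal sketch using scikit-learn (the library, the toy data, and the parameter values are assumptions for illustration, not part of the original text): a linear SVM for separable data and an RBF-kernel SVM for data that is not linearly separable.

```python
import numpy as np
from sklearn.svm import SVC

# Case (1): linearly separable toy data -> linear SVM (a convex QP under the hood)
X_lin = np.array([[2, 2], [3, 3], [-1, -1], [-2, -2]])
y_lin = np.array([1, 1, -1, -1])
clf_lin = SVC(kernel="linear", C=1e6).fit(X_lin, y_lin)  # very large C ~ hard margin

# Case (2): XOR-like data, not linearly separable -> a kernel maps it to a higher dimension
X_xor = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
y_xor = np.array([1, 1, -1, -1])
clf_rbf = SVC(kernel="rbf", gamma=1.0).fit(X_xor, y_xor)

print(clf_lin.predict([[1, 1]]), clf_rbf.predict([[0.9, 0.1]]))
```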
I. Basic concepts related to SVM
Separating hyperplane
If C and D are two disjoint convex sets, there exists a hyperplane P that separates C and D.
The distance between the two sets is defined as the shortest distance between an element of one set and an element of the other.
Construct the perpendicular bisector of the shortest segment between set C and set D; this gives a candidate separating hyperplane.
(Image excerpt from July algorithm)
But how should the "optimal" separating hyperplane of the two sets be defined? Find a few points on the "boundary" of each set, use them as a "basis" to compute the direction of the hyperplane, and use the average of these boundary points to determine the "intercept" of the hyperplane. These boundary points are called support vectors, and each point is represented as a vector.
(Image taken from the July algorithm)
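A minimal NumPy sketch of this construction (the point sets C and D below are hypothetical): find the closest pair of points between the two sets, then take the perpendicular bisector of the segment joining them as a candidate separating hyperplane.

```python
import numpy as np

# Hypothetical disjoint point sets C and D
C = np.array([[2.0, 2.0], [3.0, 1.0], [3.0, 3.0]])
D = np.array([[-1.0, -1.0], [-2.0, 0.0], [-1.5, -2.5]])

# Closest pair of points between C and D (realises the set-to-set distance)
dists = np.linalg.norm(C[:, None, :] - D[None, :, :], axis=2)
i, j = np.unravel_index(dists.argmin(), dists.shape)
c, d = C[i], D[j]

# Perpendicular bisector of segment c-d: w.x + b = 0 with w = c - d
w = c - d                   # normal direction of the hyperplane
b = -w @ (c + d) / 2.0      # hyperplane passes through the midpoint of c and d
print("w =", w, "b =", b)
print("sign on C:", np.sign(C @ w + b), "sign on D:", np.sign(D @ w + b))
```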
Input data
Suppose a training dataset is given on a feature space: $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$,
where $x_i \in \mathbb{R}^n$ is the $i$-th instance (if $n > 1$, i.e. $x$ is multidimensional and has multiple attribute features, then $x_i$ is a vector);
$y_i \in \{+1, -1\}$ is the class label of $x_i$: when $y_i = +1$, $x_i$ is called a positive example; when $y_i = -1$, a negative example.
Linearly separable support vector machine
Given a linearly separable training dataset, the separating hyperplane obtained by maximizing the margin is $w^* \cdot \Phi(x) + b^* = 0$, and the corresponding decision function $f(x) = \operatorname{sign}\bigl(w^* \cdot \Phi(x) + b^*\bigr)$ is called a linearly separable support vector machine. Here $\Phi(x)$ is some feature-space transformation whose role is to map $x$ into a (higher-dimensional) space; the simplest choice is the identity $\Phi(x) = x$. In fact, solving for the separating hyperplane is equivalent to solving a convex quadratic programming problem.
Notation
Separating hyperplane: $w^T \Phi(x) + b = 0$
Training set: $\{(x_1, y_1), \dots, (x_N, y_N)\}$
Target values: $y_1, \dots, y_N \in \{-1, +1\}$
Classification of new data: $f(x) = \operatorname{sign}\bigl(w^T \Phi(x) + b\bigr)$
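Tying the notation together, a tiny sketch of classifying new data with the sign rule; the values of w and b below are placeholders, since in practice they come out of training.

```python
import numpy as np

w = np.array([1 / 3, 1 / 3])   # placeholder weights (would come from training)
b = -1 / 3                     # placeholder intercept

def classify(x):
    """Decision rule f(x) = sign(w . x + b); Phi is the identity here."""
    return int(np.sign(w @ x + b))

print(classify(np.array([2.0, 2.0])), classify(np.array([-1.0, -1.0])))  # +1, -1
```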
II. The derivation process of SVM
Deriving the objective function
From the problem setup, let $y(x) = w^T \Phi(x) + b$.
For every correctly classified training point we then have $y_i \cdot y(x_i) > 0$.
Scaling $w$ and $b$ by the same factor scales the value of $y_i \cdot y(x_i)$ by that factor as well, while the geometric distance $\dfrac{y_i \, (w^T \Phi(x_i) + b)}{\lVert w \rVert}$ stays unchanged.
Maximum-margin separating hyperplane
Objective function: $\arg\max_{w,b} \left\{ \dfrac{1}{\lVert w \rVert} \min_i \bigl[ y_i \, (w^T \Phi(x_i) + b) \bigr] \right\}$, i.e. the distance from the nearest point to the hyperplane should be as large as possible.
(Image taken from the July algorithm)
Functional margin and geometric margin
Separating hyperplane: $w^T \Phi(x) + b = 0$; functional margin: $\hat{\gamma}_i = y_i \, (w^T \Phi(x_i) + b)$; geometric margin: $\gamma_i = \hat{\gamma}_i / \lVert w \rVert$.
By scaling $w$ (and $b$) proportionally, the functional margins of both classes of points can always be made to satisfy $y_i \, (w^T \Phi(x_i) + b) \ge 1$; the explicit relation is written out after the figure below.
(Image taken from the July algorithm)
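To make the scaling argument explicit, the standard relation between the two margins can be written out as follows (a worked equation using the usual definitions, not reproduced from the original figure).

```latex
\hat{\gamma}_i = y_i\,\bigl(w^{T}\Phi(x_i) + b\bigr), \qquad
\gamma_i = \frac{y_i\,\bigl(w^{T}\Phi(x_i) + b\bigr)}{\lVert w \rVert}
         = \frac{\hat{\gamma}_i}{\lVert w \rVert}
% Scaling (w, b) -> (kw, kb) multiplies \hat{\gamma}_i by k but leaves \gamma_i
% unchanged, so the scale can be fixed by requiring \min_i \hat{\gamma}_i = 1.
```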
Establishing the objective function
1. By proportionally scaling $w$ and $b$, the functional margins of the two classes of points can always be made to satisfy $y_i \, (w^T \Phi(x_i) + b) \ge 1$.
2. Constraints: $y_i \, (w^T \Phi(x_i) + b) \ge 1, \quad i = 1, \dots, N$
3. Original objective function: $\arg\max_{w,b} \left\{ \dfrac{1}{\lVert w \rVert} \min_i \bigl[ y_i \, (w^T \Phi(x_i) + b) \bigr] \right\}$
4. New objective function (using the constraint above): $\arg\max_{w,b} \dfrac{1}{\lVert w \rVert}$
5. The objective function is transformed to: $\arg\min_{w,b} \dfrac{1}{2} \lVert w \rVert^2$, subject to $y_i \, (w^T \Phi(x_i) + b) \ge 1$
6. Lagrange multiplier method: $L(w, b, \alpha) = \dfrac{1}{2} \lVert w \rVert^2 - \sum_{i=1}^{N} \alpha_i \bigl[ y_i \, (w^T \Phi(x_i) + b) - 1 \bigr]$, with $\alpha_i \ge 0$
7. The original problem is a min-max problem: $\min_{w,b} \max_{\alpha \ge 0} L(w, b, \alpha)$
The dual of the original problem is the max-min problem: $\max_{\alpha \ge 0} \min_{w,b} L(w, b, \alpha)$
8. Take the partial derivatives of the Lagrange function in step 6 with respect to $w$ and $b$ and set them to 0: $\dfrac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N} \alpha_i y_i \Phi(x_i)$, $\quad \dfrac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \alpha_i y_i = 0$
9. Compute the Lagrange dual function by substituting back: $L = \sum_{i=1}^{N} \alpha_i - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, \Phi(x_i)^T \Phi(x_j)$
10. Continue by maximizing over $\alpha$: $\max_{\alpha} \sum_i \alpha_i - \dfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, \Phi(x_i)^T \Phi(x_j)$, subject to $\sum_i \alpha_i y_i = 0$, $\alpha_i \ge 0$
11. Rearranging the objective (adding a minus sign turns the maximization into a minimization): $\min_{\alpha} \dfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, \Phi(x_i)^T \Phi(x_j) - \sum_i \alpha_i$, subject to $\sum_i \alpha_i y_i = 0$ and $\alpha_i \ge 0$
12. Linearly separable support vector machine learning algorithm (a numerical sketch follows this list)
The solution is: $w^* = \sum_{i=1}^{N} \alpha_i^* y_i \Phi(x_i)$, and $b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i \bigl( \Phi(x_i) \cdot \Phi(x_j) \bigr)$ for any $j$ with $\alpha_j^* > 0$
13. Classification decision function: $f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i^* y_i \bigl( \Phi(x_i) \cdot \Phi(x) \bigr) + b^* \right)$
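The following is a rough numerical sketch of steps 6 through 13, solving the dual with a general-purpose SciPy optimizer on a hypothetical toy dataset; it is only an illustration of the formulas above, not the solver used in the original course.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (hypothetical): two points per class
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)

# Matrix of the dual objective: Q_ij = y_i y_j (x_i . x_j)
Q = (y[:, None] * X) @ (y[:, None] * X).T

# Dual problem (step 11): min 1/2 a^T Q a - sum(a)  s.t.  sum(a_i y_i) = 0, a_i >= 0
obj = lambda a: 0.5 * a @ Q @ a - a.sum()
cons = ({"type": "eq", "fun": lambda a: a @ y},)
bnds = [(0, None)] * N
alpha = minimize(obj, np.zeros(N), bounds=bnds, constraints=cons).x

# Recover w* and b* (step 12) from a support vector (alpha_j > 0)
w = ((alpha * y)[:, None] * X).sum(axis=0)
j = np.argmax(alpha)
b = y[j] - (alpha * y) @ (X @ X[j])

# Classification decision function (step 13)
f = lambda x: np.sign(w @ x + b)
print("w* =", w, "b* =", b, "f([1,1]) =", f(np.array([1.0, 1.0])))
```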
III. Linearly non-separable SVM
1. If the data are not linearly separable, introduce a slack variable $\xi_i \ge 0$ for each point so that the functional margin plus the slack variable is at least 1.
The constraint becomes: $y_i \, (w \cdot x_i + b) \ge 1 - \xi_i$
Objective function: $\min_{w,b,\xi} \dfrac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i$ (the penalty term keeps the slack variables from becoming too large)
2. The convex optimization problem is now: $\min_{w,b,\xi} \dfrac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i$, subject to $y_i \, (w \cdot x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \dots, N$
3. Lagrange function: $L(w, b, \xi, \alpha, \mu) = \dfrac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i - \sum_i \alpha_i \bigl[ y_i \, (w \cdot x_i + b) - 1 + \xi_i \bigr] - \sum_i \mu_i \xi_i$
4. Set the partial derivatives of $L$ with respect to $w$, $b$ and $\xi_i$ to zero and substitute the three resulting equations back into $L$, which gives: $\min_{w,b,\xi} L = \sum_i \alpha_i - \dfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)$
5. Rearranging gives the dual optimization problem: $\min_{\alpha} \dfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) - \sum_i \alpha_i$, subject to $\sum_i \alpha_i y_i = 0$, $0 \le \alpha_i \le C$
Obtain the optimal solution $\alpha^* = (\alpha_1^*, \dots, \alpha_N^*)^T$
6. Compute $w^* = \sum_i \alpha_i^* y_i x_i$ and, for some $j$ with $0 < \alpha_j^* < C$, $b^* = y_j - \sum_i \alpha_i^* y_i \, (x_i \cdot x_j)$
In practice, $b$ is usually computed from all support vectors and the values are averaged to give $b^*$ (a brief sketch follows this list)
7. Obtain the separating hyperplane: $w^* \cdot x + b^* = 0$
8. The classification decision function is: $f(x) = \operatorname{sign}(w^* \cdot x + b^*)$
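A brief soft-margin sketch using scikit-learn's SVC on a hypothetical overlapping dataset; the penalty parameter C = 1.0 is an arbitrary illustrative choice. It recovers w* from the dual coefficients exactly as in step 6.

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping (not linearly separable) toy data
X = np.array([[2, 2], [3, 3], [0.5, 0.5], [-1, -1], [-2, -2], [0.6, 0.6]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)  # C bounds the dual variables: 0 <= alpha_i <= C

# w* = sum_i alpha_i* y_i x_i  (dual_coef_ stores alpha_i* y_i for the support vectors)
w = clf.dual_coef_ @ clf.support_vectors_
print("w* =", w.ravel(), "b* =", clf.intercept_)  # intercept_ plays the role of b*
print("decision:", np.sign(X @ w.ravel() + clf.intercept_))
```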
Kernel function: a kernel function can be used to map the original input space into a new feature space, so that samples which are not linearly separable in the original space become separable in the kernel space.
Polynomial kernel: $K(x, z) = (x \cdot z + 1)^p$
Gaussian kernel (RBF): $K(x, z) = \exp\left( -\dfrac{\lVert x - z \rVert^2}{2 \sigma^2} \right)$
String kernel
In practical applications, one usually relies on prior domain knowledge or cross-validation to select an effective kernel function. Without further prior information, the Gaussian kernel function is used.
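As a small check on the Gaussian kernel formula above, the following sketch computes the RBF kernel matrix explicitly and compares it with scikit-learn's implementation (the data points and the bandwidth sigma are arbitrary illustrative values).

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
sigma = 1.0
gamma = 1.0 / (2.0 * sigma ** 2)

# K(x, z) = exp(-||x - z||^2 / (2 sigma^2)), computed explicitly
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K_manual = np.exp(-gamma * sq_dists)

K_sklearn = rbf_kernel(X, X, gamma=gamma)
print(np.allclose(K_manual, K_sklearn))  # True: both give the same kernel matrix
```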
Kernel function Mapping:
(Image taken from the July algorithm)
(Image taken from the July algorithm)
Gaussian kernel
(Image taken from the July algorithm)
The thick line is the separating "hyperplane", the other lines are contours of y(x), and the green dots are the support vectors.
The Gaussian kernel corresponds to an infinite-dimensional feature space, because its expansion contains polynomial features of every order, as shown below.
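The standard Taylor-series argument, written out here for completeness rather than taken from the original figure:

```latex
K(x, z) = \exp\!\left(-\frac{\lVert x - z \rVert^{2}}{2\sigma^{2}}\right)
        = \exp\!\left(-\frac{\lVert x \rVert^{2}}{2\sigma^{2}}\right)
          \exp\!\left(-\frac{\lVert z \rVert^{2}}{2\sigma^{2}}\right)
          \exp\!\left(\frac{x^{T} z}{\sigma^{2}}\right),
\qquad
\exp\!\left(\frac{x^{T} z}{\sigma^{2}}\right)
        = \sum_{n=0}^{\infty} \frac{1}{n!}\left(\frac{x^{T} z}{\sigma^{2}}\right)^{n}
```

Every power of $x^T z$ appears in the sum, so the implicit feature map has components of every polynomial order and is therefore infinite-dimensional.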
Note: comparing SVM with logistic regression: (1) classical SVM directly outputs the class and gives no posterior probability; (2) logistic regression gives the posterior probability of each class; (3) the two can also be compared through the similarities and differences of their objective functions.
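For item (3), a common textbook way to line up the two objective functions (this regularized formulation is an assumption of the usual forms, not quoted from the original post): both minimize a margin-based loss plus a regularizer, the SVM with the hinge loss and logistic regression with the log loss.

```latex
\text{SVM:}\quad
\min_{w,b}\; \frac{1}{2}\lVert w \rVert^{2}
  + C \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i\,(w^{T} x_i + b)\bigr)
\qquad
\text{Logistic regression:}\quad
\min_{w,b}\; \lambda \lVert w \rVert^{2}
  + \sum_{i=1}^{N} \log\bigl(1 + e^{-y_i\,(w^{T} x_i + b)}\bigr)
```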