July Algorithm - December Machine Learning Online Class - 12th Lesson Note: Support Vector Machine (SVM)
July Algorithm (julyedu.com), December Machine Learning Online Class study note, http://www.julyedu.com
What to review:
- The dual problem
- KKT conditions
1.1 Three types of support vector machines
- Linearly separable support vector machine
- Linear support vector machine
- Nonlinear support vector machine
1.2 Linear classification
1.2.1 The significance of the distance from a sample point to the classification surface
For the distance from a point to the line $ax + by + c = 0$, assume the coefficients $a, b, c$ are normalized so that $a^2 + b^2 = 1$. "+" marks the positive class, "-" the negative class.
The distance can therefore be represented directly by the function value $f(x, y) = ax + by + c$.
This can be written in the following vector form:
$$f(x) = w^\top x + b$$
For each candidate line, compute the minimum distance from the $n$ sample points to that line; then, among all candidate lines, choose the one for which this minimum distance is largest.
The next step is to compute
$$\max_{w, b} \; \min_{i = 1, \dots, n} \frac{y_i (w^\top x_i + b)}{\|w\|}$$
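As a quick illustration of this max-min idea, here is a minimal NumPy sketch (the toy data and the candidate $w$, $b$ are my own, not from the lesson) that computes the smallest geometric margin of one candidate hyperplane over the samples; training amounts to searching for the $(w, b)$ that maximizes this value.

```python
import numpy as np

def min_geometric_margin(w, b, X, y):
    """Smallest signed distance y_i * (w^T x_i + b) / ||w|| over all samples."""
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

# hypothetical toy data: two positive and two negative points
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

print(min_geometric_margin(np.array([1.0, 1.0]), -1.0, X, y))
```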
1.2.2 Input data
$n$ labeled samples $(x_i, y_i)$, $i = 1, \dots, n$, with $x_i \in \mathbb{R}^d$ and labels $y_i \in \{+1, -1\}$.
1.2.3 Linearly separable support vector machine
In the figure, there are 3 support vectors (the points lying exactly on the margin boundaries).
1. Functional margin and geometric margin:
- Functional margin: $\hat{\gamma}_i = y_i (w^\top x_i + b)$
- Geometric margin: $\gamma_i = \dfrac{y_i (w^\top x_i + b)}{\|w\|}$
2. Maximum-margin separating hyperplane
Objective function:
$$\max_{w, b} \; \gamma \quad \text{s.t.} \quad \frac{y_i (w^\top x_i + b)}{\|w\|} \ge \gamma, \quad i = 1, \dots, n$$
And because the quantity in parentheses in the objective, the functional margin $y_i (w^\top x_i + b)$, can always be rescaled to be greater than or equal to 1 (scaling $w$ and $b$ does not change the hyperplane), that is:
$$y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, n$$
So the new objective function and its constraints can be obtained as follows:
$$\max_{w, b} \; \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, n$$
Equivalently, the new objective function is:
$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, n$$
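Since this is just a quadratic program, it can also be solved directly with a generic convex solver before turning to the Lagrangian machinery. A minimal sketch using cvxpy on hypothetical toy data (the library choice and data are my additions, not part of the lesson):

```python
import cvxpy as cp
import numpy as np

# hypothetical linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# hard-margin primal: min (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1
problem = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(w)),
    [cp.multiply(y, X @ w + b) >= 1],
)
problem.solve()
print(w.value, b.value)
```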
This is a convex optimization problem with inequality constraints, solved using the Lagrange multiplier method.
- The primal problem is a min-max problem:
$$\min_{w, b} \max_{\alpha_i \ge 0} L(w, b, \alpha)$$
- The dual of the primal problem is a max-min problem:
$$\max_{\alpha_i \ge 0} \min_{w, b} L(w, b, \alpha)$$
Lagrange multiplier method:
$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right], \quad \alpha_i \ge 0$$
After minimizing $L$ over $w$ and $b$, the next step is to maximize $L(w, b, \alpha)$ over $\alpha$; the algorithm used for this maximization is SMO.
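Filling in the standard intermediate step (not spelled out in the note, but it is the textbook route from this Lagrangian to the dual): setting the partial derivatives to zero gives
$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0$$
and substituting these back into $L$ yields the dual problem
$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, x_i^\top x_j \quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0$$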
Samples with $\alpha_i = 0$ are not support vectors; $\alpha$ is sparse, since $\alpha_i = 0$ for most samples.
1.3 Linear support vector machine (data not linearly separable)
1. The hyperplane that classifies every sample exactly correctly is not necessarily the best one.
2. The sample data itself may not be linearly separable at all.
Slack variables $\xi_i \ge 0$ need to be added, so the constraints become:
$$y_i (w^\top x_i + b) \ge 1 - \xi_i, \quad i = 1, \dots, n$$
The objective function becomes:
$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$$
The SVM Lagrangian with slack variables:
$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^\top x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \mu_i \xi_i$$
As before, the dual function can be obtained by setting the partial derivatives to zero, converting the problem into finding the extremum of the dual.
The slack variables are eliminated in the dual (leaving only the box constraint $0 \le \alpha_i \le C$); finally the optimal $\alpha^*$ is obtained, and substituting it back recovers $w$ and $b$.
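In practice the soft-margin problem is solved by an off-the-shelf library. A minimal sketch with scikit-learn's SVC (the library and toy data are my additions for illustration); note how few samples end up as support vectors, reflecting the sparsity of $\alpha$:

```python
import numpy as np
from sklearn.svm import SVC

# hypothetical, slightly overlapping toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.4, 0.6],
              [-1.0, -1.0], [-2.0, 0.0], [0.5, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# C controls the slack penalty: small C tolerates margin violations,
# large C approaches the hard-margin solution
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.coef_, clf.intercept_)  # recovered w and b
print(clf.support_)               # indices of the support vectors
print(clf.dual_coef_)             # y_i * alpha_i for the support vectors
```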
Loss function analysis: the hinge loss; a sample whose functional margin $y_i (w^\top x_i + b)$ is at least 1 incurs no penalty.
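A one-line NumPy rendering of the hinge loss just described (the function name is mine):

```python
import numpy as np

def hinge_loss(f, y):
    # zero penalty once the functional margin y * f(x) reaches 1
    return np.maximum(0.0, 1.0 - y * f)
```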
Kernel functions:
A kernel function can be used to map the original input space to a new feature space, so that samples that are not linearly separable in the original space may become separable in the kernel-induced feature space.
The Gaussian kernel (RBF) maps to an infinite-dimensional space (as can be seen from the Taylor expansion of the exponential), but it overfits easily.
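For concreteness, a small sketch of the Gaussian kernel $K(x, z) = \exp(-\gamma \|x - z\|^2)$ as a NumPy function (the $\gamma$ parameterization and helper name are mine); larger $\gamma$ makes the kernel more local and hence more prone to overfitting:

```python
import numpy as np

def rbf_kernel_matrix(X, Z, gamma=1.0):
    # squared Euclidean distances between every row of X and every row of Z
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0], [1.0, 1.0]])
print(rbf_kernel_matrix(X, X, gamma=0.5))
```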
1.4 Solving for the coefficients in the SVM: SMO (select only two multipliers at a time)
1. There are many Lagrange multipliers $\alpha_1, \dots, \alpha_n$.
2. Select only two of them at a time to optimize, treating the other multipliers as constants.
The two-variable optimization problem
Because of the constraint $\sum_i \alpha_i y_i = 0$, once all multipliers except $\alpha_1$ and $\alpha_2$ are fixed, $\alpha_1 y_1 + \alpha_2 y_2$ equals a constant, so the subproblem is effectively one-dimensional.
From the above, the figure below can be obtained: $(\alpha_1, \alpha_2)$ must lie on the line $\alpha_1 y_1 + \alpha_2 y_2 = \zeta$ inside the box $[0, C] \times [0, C]$.
The corresponding bounds $L$ and $H$ for clipping $\alpha_2$:
- if $y_1 \ne y_2$: $L = \max(0, \alpha_2 - \alpha_1)$, $H = \min(C, C + \alpha_2 - \alpha_1)$
- if $y_1 = y_2$: $L = \max(0, \alpha_1 + \alpha_2 - C)$, $H = \min(C, \alpha_1 + \alpha_2)$
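As a sketch of what one SMO step looks like in code (the variable names and helper are mine; the errors $E_i = f(x_i) - y_i$ and the kernel values are assumed precomputed, following Platt's standard analytic update):

```python
def smo_step(a1, a2, y1, y2, E1, E2, K11, K22, K12, C):
    """One analytic two-variable SMO update; returns the new (alpha1, alpha2)."""
    # feasible segment [L, H] for alpha2 inside the [0, C] box
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)

    eta = K11 + K22 - 2.0 * K12            # curvature along the segment
    if eta <= 0 or L >= H:
        return a1, a2                       # skip degenerate steps

    a2_new = a2 + y2 * (E1 - E2) / eta      # unconstrained optimum
    a2_new = min(H, max(L, a2_new))         # clip to [L, H]
    a1_new = a1 + y1 * y2 * (a2 - a2_new)   # keep a1*y1 + a2*y2 constant
    return a1_new, a2_new
```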
The condition for the program to exit is that all multipliers satisfy the KKT conditions (within tolerance), that is, the weights are no longer updated at that point.