Support Vector Machine (SVM): Derivation and Solution Process (Machine Learning)

Tags: svm

Support Vector Machine (SVM) is an original, non-ensemble classification algorithm with a clear and intuitive geometric interpretation, and it achieves high accuracy.

The overall idea of the SVM algorithm: (1) In the simple, linearly separable case, formulate the problem as a convex optimization problem, simplify it with the Lagrange multiplier method, and solve it with existing algorithms. (2) In the complex, linearly inseparable case, map the samples into a high-dimensional space via a kernel function so that they become linearly separable there; the kernel function is what makes the high-dimensional computation tractable.

I. Basic concepts related to SVM

Separating hyperplane

If C and D are two disjoint convex sets, then there exists a hyperplane P that separates C and D.

The distance between two sets is defined as the shortest distance between an element of one set and an element of the other.

Take the shortest line segment connecting set C and set D; the separating hyperplane can be chosen as the perpendicular bisector of this segment.

(Image excerpt from July algorithm)


How, then, should the "optimal" separating hyperplane between two sets be defined? Find several points on the "boundary" of each set, use them to compute the direction (normal) of the hyperplane, and take the average of these boundary points to determine the "intercept" of the hyperplane. These boundary points are called support vectors, and they can be represented as vectors.

(Image taken from the July algorithm)

   

Input data

Suppose a training dataset is given on the feature space,

where x_i is the i-th instance (if n > 1, i.e., x is multidimensional and has multiple attribute features, then each x_i is a vector);

y_i is the class label of x_i: when y_i = +1, x_i is called a positive example; when y_i = -1, a negative example.
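In the usual notation, the training set described above is written as:

T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}, \quad x_i \in \mathbb{R}^n, \quad y_i \in \{+1, -1\}, \quad i = 1, \dots, N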

  

Linearly separable support vector machines

Given a linearly separable training dataset, the separating hyperplane obtained by maximizing the margin, together with the corresponding decision function, is called a linearly separable support vector machine. Here, Φ(x) is a feature-space transformation function whose role is to map x into a (higher-dimensional) space; the simplest and most direct choice is the identity, Φ(x) = x. In fact, solving for the separating hyperplane is equivalent to solving a convex quadratic programming problem.


Notation

Separating hyperplane:

Training set:

Target value:

Classification of new data:
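Under the usual conventions of this derivation, these symbols read:

\text{Separating hyperplane:} \quad w \cdot \Phi(x) + b = 0
\text{Training set:} \quad T = \{(x_i, y_i)\}_{i=1}^{N}
\text{Target values:} \quad y_i \in \{+1, -1\}
\text{Classification of a new point } x: \quad f(x) = \operatorname{sign}\big(w \cdot \Phi(x) + b\big)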


II. The derivation process of SVM

Deriving the objective function

From the problem setup,

we have:

Scaling w and b proportionally scales the value of y(w·Φ(x)+b) by the same factor, so the scale can be chosen such that:
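In the standard derivation, the quantities referred to here are the distance from a sample to the hyperplane and the normalization made possible by rescaling:

\text{distance}(x_i) = \frac{|w \cdot \Phi(x_i) + b|}{\|w\|} = \frac{y_i\,(w \cdot \Phi(x_i) + b)}{\|w\|} \ \ \text{(for correctly classified points)}
\min_i \ y_i\,(w \cdot \Phi(x_i) + b) = 1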


Maximum-margin separating hyperplane

Objective function: the distance from the nearest point to the separating hyperplane should be as large as possible.
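Written in the usual max-min form, this objective is:

\max_{w, b} \ \min_{i} \ \frac{y_i\,(w \cdot \Phi(x_i) + b)}{\|w\|}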


(Image taken from the July algorithm)

Functional margin and geometric margin

Separating hyperplane: (functional margin)

By scaling w (and b) proportionally, the function values of the two classes of points can always be made to satisfy y(w·Φ(x)+b) ≥ 1.
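A standard statement of the two margins is:

\hat{\gamma}_i = y_i\,(w \cdot \Phi(x_i) + b) \ \ \text{(functional margin)}, \qquad \gamma_i = \frac{\hat{\gamma}_i}{\|w\|} \ \ \text{(geometric margin)}

Rescaling (w, b) to (\lambda w, \lambda b) changes the functional margin but leaves the geometric margin unchanged, which is why the normalization above is allowed.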

(Image taken from the July algorithm)


Establish the objective function

1. By scaling w (and b) proportionally, the function values of the two classes of points can always be made to satisfy y_i(w·Φ(x_i)+b) ≥ 1.

2. Constraint conditions:

3. Original objective function:

4. New Objective function:


5. The objective function is transformed into:
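Putting steps 1-5 together, the standard primal problem is:

\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\,(w \cdot \Phi(x_i) + b) \ge 1, \quad i = 1, \dots, N

where maximizing 1/\|w\| has been replaced by the equivalent minimization of \tfrac{1}{2}\|w\|^2.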


6. Lagrange Multiplier method
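In the usual form, the Lagrange function of this problem is:

L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \Big[ y_i\,(w \cdot \Phi(x_i) + b) - 1 \Big], \qquad \alpha_i \ge 0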

7. The original problem is a minimax (min-max) problem.

The dual of the original problem is the corresponding maximin (max-min) problem.

8. Take the partial derivatives of the Lagrange function from step 6 with respect to w and b, and set them to 0:
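Setting these partial derivatives to zero gives, in the standard derivation:

\frac{\partial L}{\partial w} = 0 \ \Rightarrow \ w = \sum_{i=1}^{N} \alpha_i y_i \Phi(x_i), \qquad \frac{\partial L}{\partial b} = 0 \ \Rightarrow \ \sum_{i=1}^{N} \alpha_i y_i = 0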


9. Compute the Lagrange dual function


10. Then maximize the dual function with respect to α


11. Rearrange the objective function: adding a minus sign converts the maximization into a minimization
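After substituting w and the constraint back into L, maximizing over α, and negating (steps 9-11), the standard dual problem reads:

\min_{\alpha} \ \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, \Phi(x_i) \cdot \Phi(x_j) - \sum_{i=1}^{N} \alpha_i
\quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad \alpha_i \ge 0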


12. Linearly separable support vector machine learning algorithm

The resulting solution is as follows:
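In the standard result, the solution obtained from the optimal α* is:

w^* = \sum_{i=1}^{N} \alpha_i^* y_i \Phi(x_i), \qquad b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i \, \Phi(x_i) \cdot \Phi(x_j)

for any index j with \alpha_j^* > 0, i.e., any support vector.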


13. Classification decision function
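Written out in the usual form, the decision function is:

f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i^* y_i \, \Phi(x_i) \cdot \Phi(x) + b^* \right)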


III. Linearly inseparable SVM

1. If the data are not linearly separable, introduce a slack variable for each sample so that the functional margin plus the slack variable is greater than or equal to 1.

The constraint becomes


Objective function: (the added penalty term ensures that the slack variables do not become too large)
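A standard statement of the soft-margin constraint and objective, with penalty parameter C > 0, is:

y_i\,(w \cdot \Phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \qquad \min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i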

2. The convex optimization problem then becomes


3. Lagrange function
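With multipliers α_i ≥ 0 and μ_i ≥ 0 for the two sets of constraints, the Lagrange function in its usual form is:

L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i
- \sum_{i=1}^{N} \alpha_i \Big[ y_i\,(w \cdot \Phi(x_i) + b) - 1 + \xi_i \Big] - \sum_{i=1}^{N} \mu_i \xi_i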


4. Set the partial derivatives of L with respect to w, b, and ξ to zero, substitute the resulting three equations back into L, and obtain


5. After rearranging, the dual optimization problem is obtained:
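The standard form of the resulting soft-margin dual is:

\min_{\alpha} \ \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, \Phi(x_i) \cdot \Phi(x_j) - \sum_{i=1}^{N} \alpha_i
\quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C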


Solving it yields the optimal solution α*.

6. Compute w* and b*


In practice, b* is typically computed for every support vector, and the average of these values is used as b*.
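A standard way to write this computation is:

w^* = \sum_{i=1}^{N} \alpha_i^* y_i \Phi(x_i), \qquad b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i \, \Phi(x_i) \cdot \Phi(x_j)

for any support vector x_j with 0 < \alpha_j^* < C, with b^* averaged over all such support vectors in practice.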

7. Obtain the separating hyperplane

8. The classification decision function is
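These take the same form as in the linearly separable case, with α* now coming from the soft-margin dual:

w^* \cdot \Phi(x) + b^* = 0, \qquad f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i^* y_i \, \Phi(x_i) \cdot \Phi(x) + b^* \right)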



Kernel functions: a kernel function can be used to map the original input space into a new feature space, so that samples that are linearly inseparable in the original space become separable in the kernel space.

Commonly used kernels include the polynomial kernel function,

the Gaussian kernel function (RBF),

and the string kernel function.
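Typical forms of the first two kernels (standard definitions) are:

\kappa(x, z) = (x \cdot z + c)^d \ \ \text{(polynomial)}, \qquad \kappa(x, z) = \exp\left( -\frac{\|x - z\|^2}{2\sigma^2} \right) \ \ \text{(Gaussian / RBF)}

String kernels are defined on sequences (e.g., text) rather than on vectors.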

In practical applications, an effective kernel function is usually selected based on prior domain knowledge or by cross-validation; when no further prior information is available, the Gaussian kernel is used, as sketched below.
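A minimal sketch of this kernel-selection-by-cross-validation workflow, using scikit-learn (the toy dataset and parameter grid below are illustrative assumptions, not taken from this article):

from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Toy two-class data that is not linearly separable in the input space.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validate over the kernel and its hyperparameters (C and gamma),
# with the Gaussian (RBF) kernel as the default candidate.
param_grid = {
    "kernel": ["rbf", "poly"],
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print("best kernel/params:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))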

Kernel function Mapping:

(Image taken from the July algorithm)

(Image taken from the July algorithm)


Gaussian kernel

(Image taken from the July algorithm)

The thick line is the separating "hyperplane", the other lines are contours of y(x), and the circled green points are the support vectors.

The Gaussian kernel corresponds to an infinite-dimensional feature space, because
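A standard way to see this is via the Taylor series of the exponential:

\exp\left( -\frac{\|x - z\|^2}{2\sigma^2} \right)
= \exp\left( -\frac{\|x\|^2}{2\sigma^2} \right) \exp\left( -\frac{\|z\|^2}{2\sigma^2} \right) \exp\left( \frac{x \cdot z}{\sigma^2} \right),
\qquad \exp\left( \frac{x \cdot z}{\sigma^2} \right) = \sum_{n=0}^{\infty} \frac{1}{n!} \left( \frac{x \cdot z}{\sigma^2} \right)^n

so the Gaussian kernel is an inner product of feature maps with infinitely many components.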


Note: comparison of SVM and logistic regression: (1) the classic SVM directly outputs the class label and gives no posterior probability; (2) logistic regression gives the posterior probability of each class; (3) the comparison centers on the similarities and differences between their objective functions.
