The detailed derivation process and annotations of support vector machine (SVM)
Reposted from: http://my.oschina.net/wangguolongnk/blog/111353
The principle of the support vector machine is simple: it rests on VC-dimension theory and structural risk minimization. Yet many of the related papers are vague on the details; even "A Tutorial on Support Vector Machines for Pattern Recognition" passes over the dual transformation of the Lagrangian constrained-extremum problem in a single stroke, which leaves many readers confused. Below I give a detailed derivation of the SVM in the linearly separable case.
As shown in the figure, suppose we have a set of positive and negative training samples $(x_i, y_i)$, $i = 1, \dots, n$, with labels $y_i \in \{+1, -1\}$. Suppose there is a hyperplane H: $w \cdot x + b = 0$ that separates these samples correctly, and two hyperplanes H1 and H2 parallel to H:

H1: $w \cdot x + b = +1$,  H2: $w \cdot x + b = -1$.
The positive and negative samples closest to H fall exactly on H1 and H2 respectively; these samples are the support vectors. All other training samples lie outside the band between H1 and H2, i.e. the following constraints are satisfied:

$w \cdot x_i + b \ge +1$ for $y_i = +1$,
$w \cdot x_i + b \le -1$ for $y_i = -1$.
These two cases can be written as a single formula:

$y_i (w \cdot x_i + b) - 1 \ge 0, \quad i = 1, \dots, n$    (1)
The distance between H1 and H2 is

$d(H_1, H_2) = \dfrac{2}{\|w\|}.$
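One way to see this (a short derivation added here; it is not spelled out in the original) uses the standard distance from a point to a hyperplane:

% The distance from a point x_0 to the hyperplane w . x + b = 0 is
% |w . x_0 + b| / ||w||. A point on H1 satisfies w . x + b = +1 and a point on
% H2 satisfies w . x + b = -1, so each lies at distance 1/||w|| from H, on
% opposite sides, and the gap between H1 and H2 is
\[
d(H_1, H_2) = \frac{1}{\|w\|} + \frac{1}{\|w\|} = \frac{2}{\|w\|}.
\]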
The task of the SVM is to find a hyperplane H that separates the samples into two classes without error while making the distance between H1 and H2 as large as possible. Maximizing the margin $2/\|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$, so we can construct the following constrained extremum problem:
$\min_{w,\,b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i + b) - 1 \ge 0, \ i = 1, \dots, n$    (2)
A constrained extremum problem with inequality constraints can be solved by the Lagrange method: multiply each constraint by a non-negative Lagrange multiplier and subtract the result from the objective function. The Lagrangian obtained in this way is:
$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$    (3)
where

$\alpha_i \ge 0, \quad i = 1, \dots, n.$    (4)
The optimization problem we are dealing with then becomes:
$\min_{w,\,b} \ \max_{\alpha \ge 0} \ L(w, b, \alpha)$    (5)
Formula (5) is the Lagrangian expression of the constrained extremum problem with inequality constraints. Many articles say little about this step of the transformation, or get it slightly wrong, which then confuses the reader in the later parts of the derivation. Here I will work through it step by step.
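Before continuing, here is a standard argument (added here for completeness; it is not spelled out in the original) for why (2) and (5) describe the same problem. Fix w and b and look at the inner maximization over $\alpha$:

\[
\max_{\alpha \ge 0} L(w, b, \alpha) =
\begin{cases}
\frac{1}{2}\|w\|^2, & \text{if } y_i (w \cdot x_i + b) - 1 \ge 0 \text{ for all } i,\\
+\infty, & \text{otherwise.}
\end{cases}
\]
% If every constraint holds, each term alpha_i [ y_i (w . x_i + b) - 1 ] is
% non-negative, so the maximum over alpha >= 0 is attained at alpha = 0 and
% equals (1/2)||w||^2. If some constraint is violated, the corresponding
% alpha_i can grow without bound and L tends to +infinity. Minimizing this
% over (w, b) therefore rules out infeasible points and reproduces (2).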
Formula (5) is a convex programming problem. Its meaning is: first take the partial derivative of L with respect to $\alpha$ and set it to 0 to eliminate $\alpha$, then minimize L over w and b. Solving (5) directly in this way is difficult, because eliminating the Lagrange multipliers and simplifying the resulting equations gets us nowhere on this problem. Fortunately, the problem can be handled through Lagrange duality, for which we make an equivalent transformation of (5):

$\min_{w,\,b} \ \max_{\alpha \ge 0} \ L(w, b, \alpha) \;=\; \max_{\alpha \ge 0} \ \min_{w,\,b} \ L(w, b, \alpha)$
The above is the dual transformation: it turns the convex programming problem into its dual problem:
$\max_{\alpha \ge 0} \ \min_{w,\,b} \ L(w, b, \alpha)$    (6)
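A note added here on why swapping min and max is legitimate (the original takes this step as given). In general only weak duality holds,

\[
\max_{\alpha \ge 0} \ \min_{w,\,b} \ L(w, b, \alpha) \;\le\; \min_{w,\,b} \ \max_{\alpha \ge 0} \ L(w, b, \alpha),
\]
% but here the objective (1/2)||w||^2 is convex, the constraints are affine in
% (w, b), and a separating hyperplane exists by assumption, so strong duality
% holds: the inequality becomes an equality and solving (6) gives the same
% optimum as (5).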
The meaning of (6) is: the original convex programming problem is converted into one where we first take the partial derivatives of L with respect to w and b and set them to 0 to eliminate w and b, and then maximize L over $\alpha$. We now solve formula (6), for which we first compute the partial derivatives with respect to w and b. From formula (3):
$\dfrac{\partial L}{\partial w} = w - \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad \dfrac{\partial L}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i$    (7)
For L to attain its minimum over w and b, both partial derivatives in (7) must be 0, which gives:
$w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0$    (8)
Substituting (8) back into (3), we get:
$L = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)$    (9)
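The intermediate algebra, which is usually skipped, is spelled out here for clarity; it follows from expanding (3) with $w = \sum_i \alpha_i y_i x_i$ and using $\sum_i \alpha_i y_i = 0$:

\[
\begin{aligned}
L &= \tfrac{1}{2}\, w \cdot w \;-\; \sum_{i} \alpha_i y_i \, (w \cdot x_i) \;-\; b \sum_{i} \alpha_i y_i \;+\; \sum_{i} \alpha_i \\
  &= \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)
     \;-\; \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)
     \;-\; 0 \;+\; \sum_{i} \alpha_i \\
  &= \sum_{i} \alpha_i \;-\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j).
\end{aligned}
\]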
Substituting (9) into (6) then gives:
$\max_{\alpha \ge 0} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)$    (10)
Taking the second equation of (8) into account as a constraint, our dual problem becomes:
$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)$
$\text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0, \ i = 1, \dots, n$    (11)
This quadratic programming problem can be solved directly by numerical methods.
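As a minimal sketch (not part of the original article), here is one way to solve the dual problem (11) numerically with SciPy's general-purpose SLSQP solver on a tiny hand-made 2-D dataset; the data and variable names are illustrative only, and in practice a dedicated QP or SMO solver would be used.

import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data: two samples per class (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Q[i, j] = y_i * y_j * (x_i . x_j), the matrix of the quadratic term in (11).
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def neg_dual(alpha):
    # Negative of the dual objective in (11); minimizing it maximizes the dual.
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

constraints = {"type": "eq", "fun": lambda a: a @ y}   # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * n                             # alpha_i >= 0

res = minimize(neg_dual, x0=np.zeros(n), bounds=bounds, constraints=constraints)
alpha = res.x
print("alpha =", np.round(alpha, 4))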
One point to note: transforming the constrained extremum problem (2) into the convex programming problem (5) implies an additional constraint, namely:
$\alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0, \quad i = 1, \dots, n$    (12)
This constraint is derived as follows. If (2) and (5) are equivalent, then at the solution we must have

$\max_{\alpha \ge 0} L(w, b, \alpha) = \frac{1}{2}\|w\|^2.$
Substituting (3) into this equality gives

$\max_{\alpha \ge 0} \left\{ \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] \right\} = \frac{1}{2}\|w\|^2.$
Simplifying, the extra term must vanish at the maximizing $\alpha$:

$\sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0$    (13)
Also, because of constraints (1) and (4), every term in this sum satisfies

$\alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] \ge 0, \quad i = 1, \dots, n.$
Therefore, for (13) to hold, every term must be 0, i.e. $\alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0$ for each i, which is exactly constraint (12). The implication of this constraint is that if a sample is a support vector, its Lagrange coefficient is non-zero; if a sample is not a support vector, its Lagrange coefficient must be 0. The majority of the Lagrange coefficients are therefore 0.
Once we have solved (11) for all the Lagrange coefficients, we can compute the normal vector w of the optimal separating hyperplane H through (8). The threshold b can then be computed from any support vector using constraint (12). With that we have found the optimal H, H1 and H2, and this is the trained SVM.
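Continuing the numerical sketch above (again illustrative, not from the original article), the recovery of w and b from the solved $\alpha$ looks like this:

sv = alpha > 1e-6                   # support vectors: the alpha_i that are (numerically) non-zero
w = (alpha * y) @ X                 # w = sum_i alpha_i y_i x_i, formula (8)
b = np.mean(y[sv] - X[sv] @ w)      # from (12): y_s (w . x_s + b) = 1  =>  b = y_s - w . x_s
print("w =", w, " b =", b)
print("y_i (w . x_i + b) =", y * (X @ w + b))   # support vectors should give exactly 1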