Machine Learning Theory and Practice (5) Support Vector Machine

Anyone who works with machine learning is familiar with the support vector machine (SVM): before deep learning emerged, SVM long played a leading role in the field. Its theory is elegant, and it has many variants, such as latent-SVM and structural-SVM. In this section, let's take a look at SVM theory. (Figure 1) shows a two-class dataset; panels B, C, and D each draw a linear classifier that separates the data. But which one is better?

(Figure 1)

For this particular dataset all three classifiers do the job, but that is not the whole story: this is only the training set, and the samples seen at test time may be scattered in many different ways. To cope with this, we want the linear classifier to be as far as possible from both classes of data, because this reduces the risk of a test sample crossing over to the wrong side of the classifier and thus improves accuracy. Maximizing the distance (the margin) from the dataset to the classifier is the core idea of the support vector machine, and the samples closest to the classifier are called support vectors. Now that we know the goal is to find the maximum margin, how do we locate the support vectors, and how do we implement all of this? The figures below walk through these tasks.

(Figure 2)

Assume that the line in (Figure 2) represents a hyperplane. For ease of drawing, the feature space has one more dimension than the hyperplane: as shown in the figure, the features are two-dimensional and the classifier is a one-dimensional line; if the features were three-dimensional, the classifier would be a plane. Assume the hyperplane's analytical formula is w'x + b = 0, and denote the distance from a point A to the hyperplane by d; the proof of the distance formula is given below:

(Figure 3)

In (Figure 3), the blue diamond represents the hyperplane, x_n is a point in the dataset, and w is the hyperplane's weight (normal) vector; w is perpendicular to the hyperplane. Proving this perpendicularity is simple: take any two points x' and x'' on the hyperplane; then w'x' + b = 0 and w'x'' + b = 0, and subtracting the two gives w'(x' - x'') = 0.

Therefore w is perpendicular to the hyperplane. Knowing this, the distance from x_n to the hyperplane is simply the projection onto w of the vector from any point x on the hyperplane to x_n, as shown in (Figure 4):

(Figure 4)

The projection of (x_n - x) onto w is given by (Formula 1), which completes the distance calculation:

(Formula 1)
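As a quick illustration of (Formula 1), the small NumPy sketch below computes the distance of every sample to the hyperplane and picks out the closest one, i.e. the candidate support vector. The values of w, b, and the data points are made-up toy assumptions of mine, not taken from the article's figures:

import numpy as np

# Toy values for illustration only (not from the article's figures).
w = np.array([2.0, 1.0])           # hyperplane weight (normal) vector
b = -3.0                           # offset; the hyperplane is w'x + b = 0
X = np.array([[1.0, 2.0],          # sample points x_n, one per row
              [3.0, 0.5],
              [0.5, 0.5],
              [2.5, 2.0]])

# Formula 1: distance(x_n) = |w'x_n + b| / ||w||,
# i.e. the projection of (x_n - x) onto w for any point x on the hyperplane.
distances = np.abs(X @ w + b) / np.linalg.norm(w)

margin = distances.min()               # distance to the nearest sample
support_idx = int(distances.argmin())  # index of that nearest sample

print("distances:", distances)
print("margin:", margin, "nearest point:", X[support_idx])

The inner minimum computed here is exactly what the braces in (Formula 2) below are responsible for.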

Note that the middle step uses the hyperplane's analytical formula: since x lies on the hyperplane, w'x = -b, so w'(x_n - x) simplifies to w'x_n + b. With the distance in hand, we can return to the idea stated at the beginning: make the classifier as far as possible from all samples, that is, maximize the margin. But maximizing the margin presupposes that we have found the support vectors, the sample points closest to the classifier. So there are two nested optimization tasks: locate the point closest to the classifier (the support vector), and then maximize the margin, as shown in (Formula 2):

(Formula 2)

The part inside the braces finds the support vector closest to the classification hyperplane, and the part outside the braces pushes the hyperplane as far as possible from that support vector. Optimizing this function directly is quite difficult; there is no efficient method for it as written. However, we can transform it into another problem: if we fix the value of the expression inside the braces, the outer part becomes easy to optimize with existing methods. So let's make an assumption: set the numerator inside the braces equal to 1, so that we only have to optimize over w. The whole optimization can then be written in the form of (Formula 3):

(Formula 3)
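The image for (Formula 3) is not reproduced here; based on the description above, the standard form it should correspond to is roughly the following (my reconstruction, not a copy of the original figure; the next paragraph shows how the absolute value is replaced using the class labels):

max over w, b:   1 / ||w||
s.t.             min_n |w'x_n + b| = 1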

This is now a standard optimization with equality constraints; the constraint is the formula below the objective. There is also a trick hidden in this constraint. Suppose we label each sample x_n with 1 or -1. When x_n lies above the hyperplane (on the right side), plugging it into the hyperplane's analytical formula gives a value greater than 0; multiplying by the label 1 leaves it unchanged, so it still represents the distance to the hyperplane. When x_n lies below the hyperplane (on the left side), the analytical formula gives a value smaller than 0; multiplying by the label -1 makes it positive, so it can again represent the distance. Therefore, if we convert the two class labels from 0 and 1 to -1 and 1, the label information folds perfectly into the equality constraint, as the last line of (Formula 3) also shows. Next we will continue with optimization.
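Before moving on, here is a tiny numeric check of this label trick. The values of w, b, the two sample points (one on each side of the hyperplane), and the original labels are my own illustrative assumptions:

import numpy as np

w, b = np.array([2.0, 1.0]), -3.0        # same toy hyperplane as in the earlier sketch
X = np.array([[3.0, 0.5],                # a point above the hyperplane
              [0.5, 0.5]])               # a point below the hyperplane
labels01 = np.array([1, 0])              # original {0, 1} labels
y = 2 * labels01 - 1                     # convert {0, 1} -> {-1, +1}

signed = X @ w + b                       # > 0 above the hyperplane, < 0 below
print(y * signed)                        # both values are positive: the label absorbs the sign

Multiplying by the converted label simply removes the sign, so the same expression measures the (unnormalized) distance for both classes.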
In optimization, the problems we usually need to solve fall into the following classes:

(i) Unconstrained optimization problems, which can be written as:

min f(x)

(ii) Optimization problems with equality constraints, which can be written as:

min f(x)

s.t. h_i(x) = 0, i = 1, ..., n

(iii) Optimization problems with inequality constraints, which can be written as:

min f(x)

s.t. g_i(x) <= 0, i = 1, ..., n

h_j(x) = 0, j = 1, ..., m

For problems of class (i), the common approach is Fermat's theorem: take the derivative of f(x), set it to zero, and obtain the candidate optimal points, then check these candidates. If f is a convex function, this is guaranteed to yield the optimal solution.

For problems of class (ii), the common approach is the Lagrange multiplier method: each equality constraint h_i(x) is multiplied by a coefficient and added to f(x) to form a single expression, called the Lagrangian; the coefficients are called Lagrange multipliers. Taking the derivative of the Lagrangian with respect to each variable and setting it to zero yields a set of candidate values, which are then checked to obtain the optimum.
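As a minimal, self-contained illustration of the Lagrange multiplier method (on a toy problem of my own, not yet the SVM objective): minimize f(x, y) = x^2 + y^2 subject to x + y = 1.

import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

f = x**2 + y**2                 # objective
h = x + y - 1                   # equality constraint, h(x, y) = 0
L = f + lam * h                 # the Lagrangian

# Set every partial derivative of L to zero and solve the resulting system.
candidates = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(candidates)               # [{x: 1/2, y: 1/2, lam: -1}]

The SVM derivation below follows exactly this recipe, just with w and b as the variables and one multiplier alpha_n per constraint.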

For problems of class (iii), the KKT conditions are commonly used. Similarly, all equality constraints, inequality constraints, and f(x) are written into one expression, again called the Lagrangian, and the coefficients are again called Lagrange multipliers. Under certain regularity conditions, the KKT conditions are necessary conditions for a point to be optimal.
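For completeness, the standard textbook form of this construction (stated here as background, not copied from the article's figures) is: for class (iii), define the Lagrangian

L(x, lambda, mu) = f(x) + sum_i lambda_i g_i(x) + sum_j mu_j h_j(x)

and the KKT conditions at a candidate optimum x* read:

grad_x L(x*, lambda, mu) = 0          (stationarity)
g_i(x*) <= 0,  h_j(x*) = 0            (primal feasibility)
lambda_i >= 0                         (dual feasibility)
lambda_i g_i(x*) = 0                  (complementary slackness)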

(Formula 3) clearly fits the second class of optimization problem, so we can solve it with the Lagrange multiplier method. Before doing so, we first apply a simple transformation: maximizing 1/||w|| is equivalent to minimizing ||w||, or equivalently w'w, as shown in (Formula 4):

(Formula 4)
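The image for (Formula 4) is likewise not reproduced here; the standard form it should correspond to (my reconstruction) is:

min over w, b:   (1/2) w'w
s.t.             y_n (w'x_n + b) >= 1,  n = 1, ..., N

with equality holding exactly at the support vectors.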

Applying the Lagrange multiplier method gives the expression shown in (Formula 5):

(Formula 5)
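For reference, the Lagrangian that (Formula 5) should contain, with one multiplier alpha_n >= 0 attached to each constraint (again my reconstruction of the standard form), is:

L(w, b, alpha) = (1/2) w'w - sum_n alpha_n [ y_n (w'x_n + b) - 1 ]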

In (Formula 5), we take the derivatives of the Lagrangian with respect to w and b. To find the extremum, set these derivatives to zero, which gives w = sum_n alpha_n y_n x_n and sum_n alpha_n y_n = 0.

Substituting these back into the Lagrangian yields (Formula 6):

(Formula 6)

The last two rows of (Formula 6) form the optimization problem that remains to be solved; now we only need to run a quadratic program to obtain the alpha values. The quadratic programming formulation is shown in (Formula 7):

(Formula 7)
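Written out in the standard form (the original images are not reproduced here), the resulting dual problem over alpha is:

max over alpha:  sum_n alpha_n - (1/2) sum_n sum_m alpha_n alpha_m y_n y_m x_n'x_m
s.t.             sum_n alpha_n y_n = 0,   alpha_n >= 0,  n = 1, ..., N

which is a quadratic program in the vector alpha.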

Once alpha is obtained from (Formula 7), w can be recovered using the first row of (Formula 6). At this point the derivation of the SVM formulas is essentially complete. As you can see, the mathematics is rigorous and elegant; some readers may find it dry, but it is still worth working through from the beginning. The real difficulty lies in the optimization: solving the quadratic program directly requires a large amount of computation, so in practice the SMO (sequential minimal optimization) algorithm is commonly used. The SMO algorithm, together with code, is planned for the next section.
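To close the loop before the SMO section, here is a minimal numerical sketch of solving the dual. The toy data, the 1e-6 threshold for identifying support vectors, and the use of SciPy's general-purpose SLSQP solver (standing in for a dedicated quadratic programming routine or SMO) are all my own assumptions for illustration:

import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable 2-D data (illustrative values only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [1.0, 0.5], [0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])          # labels already in {-1, +1}

# Q[n, m] = y_n y_m x_n'x_m, the matrix of the quadratic term in the dual.
Q = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # Minimize the negative of the dual objective (equivalent to maximizing it).
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

constraints = [{"type": "eq", "fun": lambda a: a @ y}]   # sum_n alpha_n y_n = 0
bounds = [(0.0, None)] * len(y)                          # alpha_n >= 0

res = minimize(neg_dual, x0=np.zeros(len(y)), method="SLSQP",
               bounds=bounds, constraints=constraints)
alpha = res.x

# Recover w from the first row of (Formula 6): w = sum_n alpha_n y_n x_n.
w = ((alpha * y)[:, None] * X).sum(axis=0)
sv = alpha > 1e-6                              # support vectors have alpha_n > 0
b = float(np.mean(y[sv] - X[sv] @ w))          # from y_n (w'x_n + b) = 1 on support vectors

print("alpha:", np.round(alpha, 4))
print("w:", w, "b:", b)

On separable toy data like this, only the points nearest the separating line end up with non-zero alpha, matching the support vector picture described above.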


Please indicate the source when reprinting: http://blog.csdn.net/cuoqu/article/details/9286099
