Classification algorithm: SVM (Support Vector Machine)



The main idea of the support vector machine (SVM) is to construct an optimal decision hyperplane that maximizes the distance between it and the samples of the two classes lying closest to it, which gives the classifier good generalization ability. For a multidimensional sample set, imagine a hyperplane that is moved around until the training samples of the two classes lie entirely on opposite sides of it. Many hyperplanes can satisfy this condition; SVM, while guaranteeing classification accuracy, looks for the one that maximizes the blank region on both sides of the hyperplane, and in this way achieves the optimal separation of linearly separable samples.


The "support vectors" in a support vector machine are particular training points in the sample set: the points that are hardest to classify, i.e. those closest to the classification decision surface. The optimality criterion of SVM is that the distance from these points to the classification hyperplane is maximized. "Machine" is a generic term for algorithms in the machine-learning field, which are often regarded as a machine, or a learning function. SVM is a supervised learning method aimed mainly at learning from, classifying, and predicting small-sample data; other sample-based learning methods include decision-tree induction algorithms.
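As an illustration (not from the original article; scikit-learn is assumed), the sketch below trains a linear SVM on a tiny two-dimensional toy set and reads back the support vectors, i.e. the training points closest to the separating hyperplane:

```python
# Minimal sketch: fit a linear SVM and inspect its support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0],   # class 0
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)     # the hardest-to-classify points
print("w =", clf.coef_[0], "b =", clf.intercept_[0])  # hyperplane w·x + b = 0
```

Only the points returned in `support_vectors_` determine the decision surface; the remaining samples could be removed without changing the fitted hyperplane.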


The advantages of SVM:
1. It does not need a large number of samples. This does not mean the absolute number of training samples can be arbitrarily small, but rather that, for a problem of the same complexity, SVM requires relatively few samples compared with other classification algorithms. And because SVM introduces kernel functions, it can handle high-dimensional samples with ease.
2. Structural risk minimization. The risk here refers to the accumulated error between the classifier's approximation of the problem's true model and the problem's true solution (a common formal statement is sketched below).
3. Non-linearity: SVM is good at handling samples that are not linearly separable, mainly through slack variables (also called penalty variables) and the kernel-function technique; this part is the essence of SVM.
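For reference, and with notation added here rather than taken from the article, structural risk minimization is often stated through a bound of roughly the following form, due to Vapnik: with probability at least 1 − η, the true risk is bounded by the empirical risk plus a capacity term that grows with the VC dimension h and shrinks with the sample size n.

```latex
% One common form of the structural-risk bound:
% with probability at least 1 - \eta over n training samples,
R(f) \;\le\; R_{\mathrm{emp}}(f)
      \;+\; \sqrt{\frac{h\left(\ln\tfrac{2n}{h} + 1\right) - \ln\tfrac{\eta}{4}}{n}}
```

Informally, maximizing the margin is how SVM keeps the capacity term small, which is why it can generalize well even from few samples.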


First, linear classification


Consider the simplest case: in a two-dimensional space we need to separate the white dots from the black dots shown in the following illustration. Clearly the line in the diagram satisfies our requirement, and such a line is not unique.




The job of SVM is to find the most suitable position for this decision line. Other workable lines could be, for example:




So which line is the best? It is the one for which the points nearest to it on either side are, taken together, as far from the line as possible; in other words, the larger the gap the better, because the classifier is then more accurate and has more room for error. This process is called margin maximization (maximum margin) in SVM. The gap between the red and blue lines below is the margin to be maximized, and obviously in this case a dividing line placed in the middle makes that margin largest.
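In symbols (notation added here; the article presents this only pictorially), the maximum-margin line is the solution of the standard hard-margin problem, and the geometric margin between the two supporting hyperplanes is 2/‖w‖:

```latex
% Hard-margin SVM for linearly separable data (x_i, y_i), y_i \in \{-1, +1\}:
% minimizing \|w\|^2 is equivalent to maximizing the margin 2 / \|w\|.
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\qquad \text{s.t.}\qquad y_i\,(w^{\top} x_i + b) \;\ge\; 1, \quad i = 1,\dots,n
```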





Second, the linearly non-separable case


In reality, the linearly separable situation described above is not representative; more often the sample data is distributed randomly, and in that case a straight dividing line cannot separate the classes exactly. In the following figure, black dots are mixed in among the white dots and white dots among the black dots:



For this non-linear situation, one approach is to use a curve to separate the sample set perfectly, as shown in the following diagram:




Extending from two-dimensional space to many dimensions, a non-linear mapping can be used to transform the original space into a higher-dimensional one, and in that high-dimensional space a hyperplane is used to divide the samples; this effectively increases the separation between samples of different classes and provides more conditions for distinguishing them. The kernel function plays a vital role in this process: its job is to turn a problem that is completely inseparable into one that is separable, or approximately so, without increasing the complexity of the algorithm.
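A hedged sketch of this idea (scikit-learn assumed; the data set is my choice, not the article's): the two concentric rings produced by `make_circles` are not linearly separable in 2-D, yet an RBF-kernel SVM separates them by implicitly working in a higher-dimensional feature space.

```python
# Non-linearly separable rings: a linear kernel fails, an RBF kernel succeeds.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))  # roughly chance level
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))     # close to 1.0
```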




In the left panel of the figure above, the red and green dots lie in a two-dimensional space with the green dots surrounded by the red ones, so they are not linearly separable. Extended into three-dimensional (multidimensional) space, however, the red and green points show a clear difference in their distance along the z-direction, while points of the same class share the common feature of lying roughly in one plane. Exploiting this difference, a hyperplane such as the yellow plane in the figure can be used to classify the two kinds of samples.
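The figure's exact mapping is not given; as one hypothetical illustration (numpy and scikit-learn assumed), lifting each 2-D point (x1, x2) to (x1, x2, x1² + x2²) pushes the inner and outer rings to different heights along z, so a flat plane can separate them:

```python
# Hypothetical explicit feature map for the "lift to 3-D" picture above.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import LinearSVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# z = x1^2 + x2^2 is the squared distance from the origin, so the two rings
# end up at clearly different heights in the third dimension.
X_lifted = np.column_stack([X[:, 0], X[:, 1], X[:, 0]**2 + X[:, 1]**2])

plane = LinearSVC(C=1.0, max_iter=10_000).fit(X_lifted, y)  # a flat hyperplane in 3-D
print("accuracy on the lifted data:", plane.score(X_lifted, y))  # close to 1.0
```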

Mapping linearly non-separable data into a high-dimensional space can lead to very high dimensionality, in extreme cases even infinitely many dimensions, which would make the computation prohibitively complex and expensive. In SVM, however, the kernel function keeps the computation in the low-dimensional space and avoids the cost of operating explicitly in the high-dimensional one.
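As a small numerical check (numpy assumed; this example is mine, not the article's), the degree-2 polynomial kernel (x·z + 1)² in two dimensions equals the dot product of explicit 6-dimensional features, so the inner product in the higher space is obtained without ever constructing that space:

```python
# Kernel trick sanity check: K(x, z) = (x·z + 1)^2 equals phi(x)·phi(z).
import numpy as np

def phi(v):
    """Explicit feature map matching the kernel (x·z + 1)^2 for 2-D inputs."""
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

kernel_value = (x @ z + 1.0) ** 2      # computed entirely in 2-D
explicit_value = phi(x) @ phi(z)       # computed in the 6-D feature space

print(kernel_value, explicit_value)    # both print 4.0
```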

Another clever part of SVM is the addition of slack variables to handle possible noise in the sample data, as shown in the following illustration:




SVM allows the data points to deviate from the hyperplane to some extent; this permitted deviation is an outlier tolerance that can be set in the SVM algorithm, corresponding in the figure to the length of the black solid line. Adding slack variables means SVM does not chase a locally optimal fit for individual points but starts from the overall distribution of the sample data, which is what is meant by keeping the big picture in view.
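As a hedged illustration of that trade-off (scikit-learn assumed; the data set and C values are my choices), a small penalty C tolerates more margin violations and ignores isolated outliers, while a large C bends the boundary to respect them:

```python
# Effect of the slack penalty C: small C -> soft, outlier-tolerant margin;
# large C -> the boundary is forced to respect (noisy) individual points.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so a few points inevitably sit on the wrong side.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:<6}  support vectors: {len(clf.support_vectors_):3d}  "
          f"training accuracy: {clf.score(X, y):.2f}")

# Smaller C typically keeps more support vectors (a wider, softer margin);
# larger C shrinks the margin in pursuit of fewer training errors.
```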

