Support Vector Machine


Training a support vector machine is an optimization problem: we are attempting to find the hyperplane that divides the two classes with the largest margin (of width 2/||w||). The support vectors are the points that fall on the edge of that margin (the dashed lines). It's easiest to understand if you build it up from simple to more complex.
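
As a point of reference (my own sketch, not part of the original text, using the standard notation where y_i is the class label of point x_i), the hard-margin problem can be written as

\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{ for all } i

Maximizing the margin width 2/||w|| is equivalent to minimizing ||w||, and the support vectors are exactly the points for which the constraint holds with equality.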

Hard Margin Linear SVM (linearly separable training samples)

In a training set where the data is linearly separable, and you are using a hard margin (no slack allowed, so no point may stray into the region between the two dashed lines), the support vectors are the points that lie along the supporting hyperplanes: the hyperplanes parallel to the separating hyperplane (the solid line) at the edges of the margin (the dashed lines). These points on the edges, drawn in bold, are the support vectors.

All of the support vectors lie exactly on the margin. Regardless of the number of dimensions or the size of the data set, the number of support vectors is generally small and can be as few as 2. Only the support vectors play a role in determining the separating hyperplane; the remaining points have no effect, so the SVM is determined entirely by these few, very "important" training points.
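
As a rough illustration (my own sketch, not part of the original text), scikit-learn's SVC with a very large C behaves approximately like a hard-margin linear SVM; the tiny data set below is made up for the example:

import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable 2-D data set (hypothetical example data).
X = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.2],
              [3.0, 3.0], [3.5, 2.5], [4.0, 3.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e10)   # huge C ~ essentially no slack allowed
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)   # only the points on the margin
print("per class:", clf.n_support_)                 # can be as few as one per class
print("margin width 2/||w||:", 2.0 / np.linalg.norm(clf.coef_[0]))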

Soft-Margin Linear SVM (linearly non-separable training samples)

But what if our data set isn't linearly separable? We introduce the soft-margin SVM. We no longer require that every data point lie outside the margin; we allow some of them to stray over the line into the margin. We use the slack (penalty) parameter C to control this (nu in nu-SVM). This gives us a wider margin and greater error on the training set, but it improves generalization and/or allows us to find a linear separation of data that is not linearly separable.

Now, the number of support vectors depends on how much slack we allow and on the distribution of the data. If we allow a large amount of slack, we will have a large number of support vectors; if we allow very little slack, we will have very few. Accuracy depends on finding the right level of slack for the data being analyzed. For some data it may not be possible to obtain high accuracy; we must simply find the best fit we can.
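
A small sketch (mine, not from the original text) of this effect, using scikit-learn on two deliberately overlapping blobs; in SVC a smaller C means more slack is tolerated:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so a hard margin is impossible (illustrative data).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (100.0, 1.0, 0.01):                      # little slack ... lots of slack
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print("C =", C, "->", len(clf.support_vectors_), "support vectors")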

Non-linear SVM

This brings us to the non-linear SVM. We are still trying to divide the data linearly, but we now try to do it in a higher-dimensional space. This is done via a kernel function, which of course has its own set of parameters. When we translate the result back to the original feature space, the decision boundary is non-linear.
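
To make the "higher-dimensional space" idea concrete, here is a tiny numeric check of my own (not from the original text), using the degree-2 polynomial kernel as the example: the kernel value computed in the original 2-D space equals an ordinary dot product in a 6-D feature space.

import numpy as np

def phi(p):
    # Explicit feature map for the kernel K(x, z) = (x.z + 1)**2 in two dimensions.
    x1, x2 = p
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

kernel_value = (x @ z + 1) ** 2     # computed in the original 2-D space
feature_dot = phi(x) @ phi(z)       # dot product in the 6-D feature space
print(kernel_value, feature_dot)    # both print 4.0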

Now, the number of support vectors still depends on how much slack we allow, but it also depends on the complexity of our model. Each twist and turn of the final decision boundary in the input space requires one or more support vectors to define. Ultimately, the output of an SVM is the set of support vectors and an alpha for each, which in essence defines how much influence that specific support vector has on the final decision.
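
A minimal sketch (mine, not from the original text) of a non-linear SVM in scikit-learn, inspecting the support vectors and their weights; dual_coef_ stores y_i * alpha_i for each support vector:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Moon-shaped classes that no straight line can separate (illustrative data).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)

print("number of support vectors:", len(clf.support_vectors_))
print("dual coefficients (y_i * alpha_i), shape:", clf.dual_coef_.shape)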

Here, accuracy depends on the trade-off between a high-complexity model, which may over-fit the data, and a large margin, which will incorrectly classify some of the training data in the interest of better generalization. The number of support vectors can range from very few to every single data point if you completely over-fit your data. This trade-off is controlled via C and through the choice of kernel and kernel parameters.
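
One common way to search for that balance (my own sketch, not from the original text) is a cross-validated grid search over C and the RBF kernel parameter gamma:

from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)   # illustrative data

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
print("support vectors in best model:", len(search.best_estimator_.support_vectors_))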

I assume that by performance you were referring to accuracy, but I thought I would also speak to performance in terms of computational complexity. In order to classify a test point with an SVM model, you need to compute the dot product (or kernel value) of each support vector with the test point. The computational complexity of the model is therefore linear in the number of support vectors: fewer support vectors means faster classification of test points.
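
To see this concretely, here is a small sketch of my own (not from the original text) that recomputes scikit-learn's decision function by hand: one dot product per support vector, so the prediction cost grows with the number of support vectors.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)   # illustrative data
clf = SVC(kernel="linear", C=1.0).fit(X, y)

x_test = X[0]
# One dot product per support vector, weighted by dual_coef_ (= y_i * alpha_i).
manual = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_test) + clf.intercept_[0]
print(manual, clf.decision_function([x_test])[0])   # the two values agree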
