SVM (Support Vector Machine)

Support vectors: the training points that are hardest to classify, namely those lying closest to the classification decision surface.
"Machine": simply an algorithm. In machine learning, an algorithm is often regarded as a machine (learning machine, prediction function, learning function, etc.).

SVM is a supervised learning method: the categories of the training points are known, as is the correspondence between training points and categories.
SVM is mainly used for learning, classification, and prediction (regression) on small-sample data; comparable sample-based learning methods include case-based reasoning and decision-tree induction.

1.1 Theoretical basis of support vector machines
1.1.1 Empirical risk minimization (ERM)
1.1.2 Key theorem and VC dimension
1.1.3 Structural risk minimization (SRM)
1.2 Mathematical derivation of SVM
1.2.1 Maximum-margin hyperplane
1.2.2 Lagrange multiplier method
1.2.3 KKT conditions and the dual transformation
1.2.4 Classifier function
1.2.5 Mapping to a high-dimensional space
1.2.6 Kernel function method
1.2.7 Slack variables for outliers
1.3 SMO algorithm

1.1 Theoretical basis of support vector machines

Support vector machines are grounded in statistical learning theory and address the structure-selection and local-minimum problems (overfitting and underfitting) that plague second-generation neural networks.

Statistical learning theory characterizes, from a statistical standpoint, the gap between the empirical risk and the real risk on finite samples; it introduced the concept of the confidence interval for this gap, established the theory of structural risk minimization, and provides a unified evaluation framework for all kinds of machine learning algorithms.

1.1.1 Empirical risk minimization (ERM)

We start from the concepts of population and sample. The population can be understood as the objective thing (the system) itself; in statistics it corresponds to a probability distribution. Since this true distribution is difficult, in practice impossible, to know, we draw representative objects from the population, and these constitute a sample.

We use the distribution of the sample as an approximate model, an empirical model, of the thing as a whole. Risk denotes the error between the empirical model and the real model.
If a classifier takes the empirical distribution as the distribution of the whole population, we call the error of this classifier the empirical risk.
For convenience, we measure it with a loss function.
Suppose the samples are $(x_1,y_1),\dots,(x_n,y_n) \in \mathbb{R}^N \times \mathbb{R}$. For discrete samples, the empirical risk functional can be written:

$$R_{emp}(\alpha) = \frac{1}{n}\sum_{i=1}^{n} L\big(y_i, f(x_i,\alpha)\big)$$

where $R_{emp}(\alpha)$ is the empirical risk, $f(x_i,\alpha)$ is the objective function whose empirical risk is to be minimized, and $L\big(y_i, f(x_i,\alpha)\big)$ is the loss function, representing, for each $x_i$, the deviation between the label $y_i$ and the value of the objective function.
Examples include the global error function in a BP neural network, or the changes to slope and intercept in gradient descent, and so on.
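A minimal sketch of the formula above: the empirical risk is just the average loss of a candidate function $f(x,\alpha)$ over the $n$ training samples. The linear model, squared loss, and data below are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def empirical_risk(f, loss, X, y, alpha):
    """R_emp(alpha) = (1/n) * sum_i loss(y_i, f(x_i, alpha))."""
    return np.mean([loss(yi, f(xi, alpha)) for xi, yi in zip(X, y)])

# Illustrative choices: a linear model and the squared loss.
f = lambda x, alpha: alpha[0] * x + alpha[1]
squared_loss = lambda y_true, y_pred: (y_true - y_pred) ** 2

X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.2, 2.8])

# ERM picks the alpha with the smaller empirical risk.
print(empirical_risk(f, squared_loss, X, y, alpha=(1.0, 0.0)))
print(empirical_risk(f, squared_loss, X, y, alpha=(0.5, 0.5)))
```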

Second-generation neural networks use the minimum empirical risk as a stand-in for the real risk; this is the principle of empirical risk minimization (ERM), and such networks take minimizing the empirical risk as the main measure of an algorithm's accuracy. In practice this does not achieve a globally optimal classifier and easily causes overfitting, for two reasons: first, the sample may not be representative; second, the learning theory is incomplete, that is, the way the real risk is estimated is incomplete.

– Overfitting/overlearning: the training error is driven so low that generalization ability drops, i.e., the real risk increases (see the sketch after these definitions).
– Generalization ability: the accuracy of predictions on unknown samples.
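A minimal sketch of the overfitting phenomenon just described, using NumPy only; the sine data, noise level, and polynomial degrees are invented for illustration. Driving the training error toward zero (degree 9 interpolates the 10 points) increases the error on unseen samples, i.e., the real risk.

```python
import numpy as np

rng = np.random.RandomState(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.randn(10)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Expect: degree 9 has near-zero training error but much larger
    # test error than degree 3.
    print(f"degree {degree}: train {train_err:.4f}, test {test_err:.4f}")
```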

Statistical learning theory tackles these problems through the question of learning consistency.

1.1.2 Key theorem and VC dimension

Learning consistency addresses how the empirical risk relates to the real risk.
First definition of statistical learning / definition of learning consistency
If an ERM algorithm over a function set $Q(x,\alpha)$ is such that both the real risk and the empirical risk converge to the infimum of the real risk $R(\alpha)$, then the ERM algorithm satisfies learning consistency.
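In symbols, a standard formulation from statistical learning theory (with $\alpha_n$ denoting the function chosen by ERM from $n$ samples, a notation introduced here for clarity):

$$R(\alpha_n) \;\xrightarrow{P}\; \inf_{\alpha} R(\alpha) \quad \text{and} \quad R_{emp}(\alpha_n) \;\xrightarrow{P}\; \inf_{\alpha} R(\alpha) \qquad \text{as } n \to \infty.$$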

The key theorem of learning theory

The key theorem transforms the problem of learning consistency into a convergence problem: the function that minimizes the empirical risk approximates the function that minimizes the real (expected) risk. That is, the ERM principle satisfies learning consistency precisely when the convergence holds uniformly over the function set $Q(x,\alpha)$, including its worst-performing function.
The capacity measure used for this is the VC dimension.
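In Vapnik and Chervonenkis's formulation, this uniform, worst-case requirement reads:

$$\lim_{n\to\infty} P\Big\{ \sup_{\alpha} \big( R(\alpha) - R_{emp}(\alpha) \big) > \varepsilon \Big\} = 0 \qquad \text{for all } \varepsilon > 0,$$

i.e., the one-sided convergence of the empirical risk to the real risk must hold uniformly over the entire function set $Q(x,\alpha)$.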

Definition of the VC dimension: a function set shatters a set of $h$ samples if its functions can separate them according to all $2^h$ possible labelings; the VC dimension of the function set is the largest such $h$.
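A minimal sketch of shattering, assuming scikit-learn is available: lines in the plane can realize all $2^3$ labelings of 3 points in general position, consistent with linear classifiers in $\mathbb{R}^2$ having VC dimension 3. The particular points and the hard-margin-like $C$ are illustrative choices.

```python
from itertools import product

import numpy as np
from sklearn.svm import SVC

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # general position

shattered = True
for labels in product([0, 1], repeat=len(points)):
    if len(set(labels)) == 1:
        continue  # a single-class labeling is trivially separable
    clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
    clf.fit(points, labels)
    if not np.array_equal(clf.predict(points), labels):
        shattered = False
print("3 points shattered by lines:", shattered)  # expected: True
```

Repeating the check with 4 points would fail for some labeling (e.g., the XOR pattern), which is why the VC dimension of lines in the plane is exactly 3.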
