Understanding SVM through structural risk minimization and the VC dimension

Source: Internet
Author: User
Tags: svm

The support vector machine (SVM) method is based on the VC dimension theory of statistical learning and on the structural risk minimization principle.

Confidence risk: the error a classifier incurs when classifying unknown (unseen) samples. Empirical risk: the error a trained classifier incurs when re-classifying its own training samples, i.e., the sample error. Structural risk: confidence risk + empirical risk.

Structural risk minimization is a strategy proposed to prevent overfitting, and maximum a posteriori (MAP) estimation in Bayesian estimation is an example of it: when the model's loss function is the logarithmic loss on its conditional probability distribution and the model complexity is represented by the model's prior probability, structural risk minimization is equivalent to MAP estimation. The supervised learning problem thereby becomes the optimization of the empirical risk or structural risk function, which then serves as the objective function.
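As a reference, the SRM objective and its MAP reading can be written out explicitly. This is a sketch of the standard formulation; the symbols λ, J, and θ are notation introduced here, not from the original text:

```latex
% Structural risk = empirical risk + a complexity penalty:
R_{\mathrm{srm}}(f) = \frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr) + \lambda\, J(f)

% With the log loss L = -\log P(y \mid x, \theta) and the complexity
% penalty J(f) = -\log P(\theta) (model complexity as a prior, with
% \lambda = 1), minimizing the structural risk is exactly MAP estimation:
\arg\min_{\theta} \sum_{i=1}^{N} -\log P(y_i \mid x_i, \theta) - \log P(\theta)
  \;=\; \arg\max_{\theta}\; P(\theta) \prod_{i=1}^{N} P(y_i \mid x_i, \theta)
```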

SVM can get much better results than other algorithms on small training sets. The support vector machine is one of the most commonly used and best-performing classifiers, owing to its excellent generalization ability: its optimization goal is to minimize the structural risk rather than the empirical risk, so through the concept of the margin it obtains a structured description of the data distribution, which lowers the requirements on data size and data distribution. Still, SVM is not better than other algorithms in every scenario; it is best to try several algorithms on each application and then evaluate the results. For example, in message classification SVM may not work as well as logistic regression, KNN, or naive Bayes.
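A minimal sketch of that advice using scikit-learn; the synthetic dataset, the specific models, and their settings are illustrative assumptions, not from the original post:

```python
# Compare several classifiers on the same data with cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Small synthetic problem standing in for a real application.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

Which model wins depends on the data; the point is to measure rather than assume.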
VC dimension: classify N points into two classes; there are 2^N possible labelings, which can be understood as 2^N learning problems. If a hypothesis class H can correctly realize all 2^N labelings (i.e., shatter the N points), then that number N is the VC dimension of H. The definition is stiff enough that it helps to memorize an example first: for linear separators on the plane, the VC dimension is 3, since 3 points in general position can be split in all 2^3 = 8 ways. It is not 4, because no 4 sample points admit all 2^4 = 16 partitions: the labeling that puts the two diagonal pairs of points in opposite classes cannot be separated linearly. More generally, in d-dimensional space, the VC dimension of linear decision surfaces is d + 1.

The factors influencing the confidence risk are the number of training samples and the VC dimension of the classification function. The more training samples, the smaller the confidence risk can be; the larger the VC dimension, the more candidate solutions there are, the worse the generalization ability, and the greater the confidence risk. Therefore, increasing the number of samples and reducing the VC dimension both reduce the confidence risk. But a typical classification function needs to raise the VC dimension, i.e., enrich the sample's feature representation, in order to reduce the empirical risk, as a polynomial classification function does. As a result, the confidence risk grows and the structural risk grows correspondingly. Overfitting (excessive learning) is exactly this situation of an inflated confidence risk.
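The classical VC generalization bound (Vapnik's) makes both factors explicit; it is stated here for reference rather than taken from the original text:

```latex
% For a function class of VC dimension h and N training samples,
% with probability at least 1 - \eta over the draw of the sample:
R(f) \;\le\; R_{\mathrm{emp}}(f) +
  \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}}
% The square-root term is the confidence risk: it shrinks as N grows
% and grows with the VC dimension h.
```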
Structural risk minimization (SRM) considers empirical risk and confidence risk together, and thus obtains a better classification effect in the small-sample case: while guaranteeing classification accuracy (low empirical risk), it reduces the VC dimension of the learning machine.
When the training samples are given, the larger the classification margin, the smaller the VC dimension of the corresponding set of separating hyperplanes (this is how the requirement on the classification margin affects the VC dimension).
According to the structural risk minimization principle, the former ensures that the empirical risk is minimal (the empirical and expected risk depend on the choice of the learning machine's function family), while the latter maximizes the classification margin, which yields the smallest VC dimension; in effect, it minimizes the confidence interval of the generalization bound and thereby achieves the smallest true risk.
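One standard quantitative form of this margin-to-VC link is Vapnik's bound for margin ("gap-tolerant") hyperplanes; the exact constants vary between textbooks, so treat this as a sketch:

```latex
% For separating hyperplanes achieving margin \gamma on data contained
% in a ball of radius R in \mathbb{R}^d, the VC dimension h satisfies
h \;\le\; \min\!\left(\left\lceil \frac{R^2}{\gamma^2} \right\rceil,\; d\right) + 1
% Maximizing the margin \gamma therefore caps the VC dimension,
% independently of the (possibly large) ambient dimension d.
```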
When the training samples are linearly separable, all samples can be classified correctly; this is exactly the well-known constraint y_i (w · x_i + b) ≥ 1, so the empirical risk R_emp is 0. By maximizing the classification margin, i.e., minimizing φ(w) = (1/2)||w||² = (1/2)(w · w), the classifier achieves the best generalization performance.
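A minimal sketch of the hard-margin case in scikit-learn; the synthetic data and the use of a very large C to approximate a hard margin are assumptions for illustration:

```python
# On linearly separable data, a very large C makes sklearn's soft-margin
# SVC behave like a hard-margin SVM: R_emp = 0 and the margin 2/||w||
# is maximized.
import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters: linearly separable by construction.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # C -> infinity: hard margin

w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)  # geometric margin width = 2 / ||w||
print(f"empirical risk: {1.0 - clf.score(X, y):.3f}")  # 0 when separable
print(f"margin width:   {margin:.3f}")
```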

For the linearly non-separable case, some misclassification can be allowed; that is, the classification margin is relaxed for outliers. The farther an outlier lies from the original separating surface, the more serious it is, and this distance is expressed by a slack variable; only the outliers have non-zero slack variables. Of course, this value must be limited, so a penalty term with a tunable coefficient C is added to the minimized objective. When C is infinitely large, the problem degenerates to the hard-margin case: outliers are not allowed, and the problem may have no solution. When C approaches 0, outliers are essentially ignored. In practice, C often has to be tried several times to find a good value.
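A minimal sketch of "trying C several times" as a cross-validated grid search; the grid, dataset, and library choice are illustrative assumptions:

```python
# Tune the penalty coefficient C of a soft-margin SVM by grid search
# with cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# flip_y injects a few mislabeled points to stand in for outliers.
X, y = make_classification(n_samples=300, n_features=10, flip_y=0.05,
                           random_state=0)

search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.01, 0.1, 1, 10, 100, 1000]},
    cv=5,
)
search.fit(X, y)
print(f"best C: {search.best_params_['C']}, "
      f"cv accuracy: {search.best_score_:.3f}")
```

Small C tolerates outliers at the cost of a wider, sloppier margin; large C pushes back toward the hard-margin behavior.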



