Original: http://blog.csdn.net/keith0812/article/details/8901113
The support vector machine (SVM) method is built on the VC dimension theory of statistical learning theory and the principle of structural risk minimization.
Structural risk
Structural risk = empirical risk + confidence risk
Empirical risk = the classifier's error on the given training samples
Confidence risk = the classifier's error when classifying unknown text (i.e., unseen data)
Factors affecting the confidence risk:
The number of samples: the larger the number of training samples, the more likely the learned result is correct, and so the smaller the confidence risk;
The VC dimension of the classification function: clearly, the larger the VC dimension, the weaker the generalization ability and the larger the confidence risk.
Increasing the number of samples and reducing the VC dimension both reduce the confidence risk, as the bound sketched below makes explicit.
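The original post states the decomposition but not the formula. The standard VC generalization bound from statistical learning theory (Vapnik) writes the two terms out explicitly; here n is the number of training samples, h the VC dimension, and 1 - eta the confidence level:

```latex
% With probability at least 1 - \eta, simultaneously for every f in a
% function class of VC dimension h trained on n samples:
R(f) \;\le\;
  \underbrace{R_{\mathrm{emp}}(f)}_{\text{empirical risk}}
  \;+\;
  \underbrace{\sqrt{\frac{h\left(\ln\frac{2n}{h}+1\right)-\ln\frac{\eta}{4}}{n}}}_{\text{confidence risk}}
```

The confidence term grows with h and shrinks with n, which is exactly the pair of factors listed above.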
Previously, the goal of machine learning was to minimize the empirical risk. But to reduce the empirical risk, the classification function must be made more complex, which raises its VC dimension; a high VC dimension means a high confidence risk, and therefore a high structural risk as well. ----This is where SVM has an advantage over other machine learning methods.
SVM can keep the VC dimension down, mainly through the introduction of the kernel function.
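As a concrete illustration (a minimal sketch of my own using scikit-learn, which the original post never names; the dataset and parameters are purely illustrative), the kernel lets an SVM separate data that no hyperplane in the input space can separate:

```python
# Sketch: comparing a linear kernel with an RBF kernel on data that is
# not linearly separable in the input space. Assumes scikit-learn.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: no straight line separates the classes.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    print(f"{kernel}: test accuracy = {clf.score(X_test, y_test):.2f}")
```

On this kind of data the linear kernel typically scores near chance while the RBF kernel separates the rings almost perfectly; the kernel supplies the nonlinearity without the optimizer explicitly searching a more complex function class.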
The material above comes from other people's blogs that I read while learning SVM. At the time I did not really understand the VC dimension; however many times I read about it, it stayed foggy. In later study I found that this concept appears again and again, and without it many algorithms cannot be properly understood, so today I summoned up the courage to learn the concept of the VC dimension once more. My notes are organized as follows:
Example: a linear binary classification function can shatter a set containing only three elements, so the VC dimension of linear binary classification functions is 3.
In the abstract: if a function set can shatter a set of h elements, and h is the largest number for which this is possible, the VC dimension of the function set is h.
Put that way, the notion of shattering may still be unclear, so let us again take the binary classification function as the example.
Suppose there is a set of three elements; these three elements can be assigned 2^3 = 8 distinct labelings, as follows:
(+,+,+), (+,+,-), (+,-,+), (+,-,-), (-,+,+), (-,+,-), (-,-,+), (-,-,-)
A linear binary classification function can realize all of these labelings (taking three points that are not collinear), so the VC dimension of linear binary classification functions is 3.
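To convince yourself, you can check all eight labelings mechanically. The following sketch (my own illustration, not in the original post) randomly searches for a separating line sign(w.x + b) for each labeling of three non-collinear points:

```python
# Brute-force check that linear classifiers shatter three non-collinear
# points in the plane: every one of the 2^3 labelings is realizable.
import itertools
import numpy as np

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # non-collinear

def linearly_realizable(labels, trials=20000, seed=0):
    """Random search for a line sign(w.x + b) that reproduces `labels`."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        w, b = rng.normal(size=2), rng.normal()
        if np.array_equal(np.sign(points @ w + b), labels):
            return True
    return False

for labels in itertools.product([-1.0, 1.0], repeat=3):
    print(labels, linearly_realizable(np.array(labels)))
```

All eight labelings come back realizable. With four points the same search fails for at least one labeling (the XOR pattern), which is why the VC dimension of lines in the plane is exactly 3 and not 4.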
Likewise, for a set with h elements: if a function set can realize all 2^h separations (that is, shatter the set), we say the VC dimension of that function set is h.
If the functions can shatter any number of samples, the VC dimension of the function set is infinite; that is, the function set can shatter a set containing any number of elements.
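Infinite VC dimension does not require a visibly huge function class. A classic example from statistical learning theory (sketched below in my own code, not in the original post) is the one-parameter family f(x) = sign(sin(w*x)): for the points x_i = 10^-i there is an explicit choice of w realizing any labeling, so this family shatters arbitrarily many points:

```python
# Sketch of the classic infinite-VC-dimension example: sign(sin(w*x))
# shatters the points x_i = 10^-i for any n via an explicit choice of w.
import itertools
import math

n = 4
xs = [10.0 ** -(i + 1) for i in range(n)]

def omega_for(labels):
    """Known construction: w = pi * (1 + sum_i (1 - y_i)/2 * 10^(i+1))."""
    return math.pi * (1 + sum((1 - y) // 2 * 10 ** (i + 1)
                              for i, y in enumerate(labels)))

for labels in itertools.product([-1, 1], repeat=n):
    w = omega_for(labels)
    preds = tuple(1 if math.sin(w * x) > 0 else -1 for x in xs)
    assert preds == labels
print(f"all 2^{n} = {2 ** n} labelings realized by a single parameter w")
```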
Applying the definition of the VC dimension
Researchers concluded from their analysis that the condition for the empirical risk minimization learning process to be consistent is that the VC dimension of the function set be finite; this also gives the fastest rate of convergence.
My personal understanding: if the VC dimension is infinite, the function set can shatter a set containing any number of elements. A function set must be very complex to satisfy this condition, and when a function is too complex its generalization ability declines, the confidence risk rises, and convergence slows down.
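This intuition is easy to reproduce experimentally. In the sketch below (my own illustration using scikit-learn; the dataset and gamma values are arbitrary), an RBF-kernel SVM with an extreme gamma behaves like a very high-capacity function class: it drives the training error toward zero while the test accuracy typically drops.

```python
# Complexity vs. generalization: an overly flexible RBF-SVM memorizes the
# training set (low empirical risk) but generalizes worse (high confidence risk).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for gamma in (0.5, 1000.0):  # moderate vs. extreme flexibility
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    print(f"gamma={gamma}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")
```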
Tags: SVM, empirical risk minimization, VC dimension