In a recent image recognition project, we needed to score the classification results. Since libsvm 3.12 was being used, we initially decided to return the dec_values from the svm_predict_values function directly as the score; after studying the matter further, I found it quite interesting.
First, some background on multiclass classification in SVMs. One-against-all and one-against-one are currently the two popular SVM multiclass strategies. libsvm adopts one-against-one, while other open-source libraries such as SVM-light use one-against-all. For N-class data, one-against-all builds N classifiers, but it has some disadvantages, such as dataset skew, overlapping regions, and unclassifiable regions (see: http://www.blogjava.net/zhenandaci/archive/2009/03/26/262113.html ). One-against-one builds N*(N-1)/2 classifiers; each classifier casts a vote on the sample, and the class with the most votes is returned as the recognition result. The basis on which each classifier decides how to vote is its decision value, i.e. dec_values in libsvm.
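The one-against-one voting described above can be sketched as follows. This is an illustrative reimplementation, not libsvm's actual code; it assumes dec_values are ordered pairwise as (0 vs 1), (0 vs 2), ..., which matches libsvm's convention, with a positive value voting for the first class of the pair.

```python
from itertools import combinations

def one_vs_one_predict(dec_values, n_classes):
    """Tally votes from the N*(N-1)/2 pairwise decision values and
    return the winning class index (ties resolved by lowest index)."""
    votes = [0] * n_classes
    for k, (i, j) in enumerate(combinations(range(n_classes), 2)):
        if dec_values[k] > 0:
            votes[i] += 1   # positive decision value: vote for class i
        else:
            votes[j] += 1   # otherwise: vote for class j
    return votes.index(max(votes))
```

For 3 classes there are 3 pairwise classifiers; e.g. dec_values = [1.0, 0.5, -2.0] gives class 0 two votes and class 2 one vote, so class 0 wins.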
In what follows, consider the case of a linear kernel.
Intuitively, the decision value is the distance from the sample point to the optimal hyperplane. Actually, it is not. The analysis is as follows:
Assume the computed decision surface f(x) = w·x + b completely separates the two classes (p, n), where w is the normal vector of the optimal hyperplane, given by w = Σ_j α_j·y_j·x_j; the coefficients α_j·y_j correspond to sv_coef in the svm_model struct in libsvm.
The classification margin is 2/||w||, so the distance from a support vector to the optimal hyperplane is 1/||w||. That is to say, for any other sample point x with decision value f(x), the distance to the optimal hyperplane is |f(x)|/||w||, while for support vectors |f(x)| = 1. libsvm only computes the decision value itself and does not divide by ||w||; to further obtain the distance to the hyperplane, refer to: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f1_1.
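The relation between the raw decision value and the geometric distance can be sketched as below. The function and its toy inputs are illustrative (not taken from a real trained model); it assumes sv_coef holds the products α_j·y_j, as svm_model.sv_coef does for a linear kernel.

```python
import numpy as np

def decision_value_and_distance(support_vectors, sv_coef, b, x):
    """Recover w from the support vectors of a linear-kernel model,
    then return the raw decision value f(x) = w.x + b (what libsvm
    puts in dec_values) and the geometric distance |f(x)| / ||w||."""
    w = np.dot(sv_coef, support_vectors)   # w = sum_j alpha_j * y_j * x_j
    f = np.dot(w, x) + b                   # raw decision value
    return f, abs(f) / np.linalg.norm(w)   # distance needs division by ||w||
```

With support vectors [1,0] and [0,1], coefficients [1,-1], and b = 0, the point [2,0] gets decision value 2 but geometric distance 2/√2 ≈ 1.414, illustrating that the two quantities differ whenever ||w|| ≠ 1.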
In addition, the decision value is also unrelated to probability (it does indicate a probability-like quantity in regression, but not in classification). From what I have read, the range of the decision value should be the entire real line (I saw this in a foreign master's thesis; see Active Learning with Support Vector Machines, Andreas Vlachos, 2004). However, I do not think this holds in practice: if we scale the sample feature data, all sample points are confined to a bounded region, and since the optimal hyperplane is also determined within this region, then as long as all test sample points are scaled as well, there must be an upper bound on the distance to the optimal hyperplane. How to derive the specific upper bound, I do not yet know.
One small problem with the one-against-one strategy for multiclass classification is that the decision values produced by the different binary classifiers are actually not comparable with one another; one-against-all does not have this problem. Why? Because: 1) every one-against-all classifier is trained with the same parameters on the same dataset, only with the labels changed, so comparing decision values across classifiers is valid; 2) each one-against-one classifier is trained on a different subset of the data, so decision values cannot be compared directly across classifiers. As I understand it, the former comparison takes place in the same space, while the latter does not.
According to opinions on domestic and foreign forums, the decision value cannot even be given any meaning beyond the fact that, for a linear-kernel SVM, the distance can be computed from it (see http://stackoverflow.com/questions/11030253/decision-values-in-libsvm, though I do not think foreigners are always right). Still, the decision value can be used to gauge the confidence of a classification result.
Back to the problem of scoring the classification results. Searching online, I found that someone has done similar work; see: http://blog.csdn.net/zhzhl202/article/details/7438313. In that method (the formula is given in the linked post), k denotes the number of classifiers that voted for the winning class, N the total number of classes, and Si the decision values of those supporting classifiers. I think this method takes both the vote count and the decision-value information into account, which is better than simply summing the dec_values and taking the average. However, there seems to be no solid theoretical basis for the formula. Still, in the absence of a better method, I used it in the project for the time being.
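The simple baseline mentioned above (averaging decision values rather than using the linked post's weighted formula) can be sketched like this. The function name and tie-breaking rule are my own assumptions; the idea is just to average the decision values of the pairwise classifiers that voted for the winning class.

```python
from itertools import combinations

def average_score_for_winner(dec_values, n_classes):
    """Baseline confidence score: average the decision values of the
    pairwise classifiers that voted for the winning class. This is a
    sketch of the simple averaging baseline, NOT the formula from the
    linked post (which also weights by the vote count k / N)."""
    votes = [0] * n_classes
    supporting = [[] for _ in range(n_classes)]
    for k, (i, j) in enumerate(combinations(range(n_classes), 2)):
        if dec_values[k] > 0:
            votes[i] += 1
            supporting[i].append(dec_values[k])    # magnitude favours i
        else:
            votes[j] += 1
            supporting[j].append(-dec_values[k])   # magnitude favours j
    winner = votes.index(max(votes))
    return winner, sum(supporting[winner]) / len(supporting[winner])
```

For example, with dec_values = [1.0, 0.5, -2.0] over 3 classes, class 0 wins with two supporting classifiers, and its averaged score is (1.0 + 0.5) / 2 = 0.75.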