LIBSVM parameter Selection

Source: Internet
Author: User
Tags: svm, rbf, kernel

LIBSVM parameter selection. In the LIBSVM MATLAB interface, prediction is done with: [predicted_label, accuracy, decision_values] = svmpredict(test_label, test_data, model);
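As a minimal sketch of the whole train/predict cycle with the LIBSVM MATLAB interface (the variable names train_data, train_label, test_data and test_label are placeholders, and the parameter values are only illustrative):

    % C-SVC (-s 0) with an RBF kernel (-t 2); c and g would normally come from a grid search
    model = svmtrain(train_label, train_data, '-s 0 -t 2 -c 1 -g 0.5');
    [predicted_label, accuracy, decision_values] = svmpredict(test_label, test_data, model);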

Original reference: http://blog.csdn.net/carson2005/article/details/6539192

A summary of selecting the SVM parameters c and g [MATLAB-LIBSVM]: http://www.ilovematlab.cn/thread-47819-1-1.html (the original text is reproduced below).

Recall that LIBSVM supports multi-class classification: for a k-class problem, LIBSVM builds k*(k-1)/2 binary classification models, i.e. it uses the one-vs-one approach to build a multi-class classifier, pairing the classes as follows:

1 vs 2, 1 vs 3, ..., 1 vs k, 2 vs 3, ..., 2 vs k, ..., (k-1) vs k.

-s: the SVM type; the default is 0 (C-SVC).

C-SVC and nu-SVC actually use the same underlying model, but the ranges of their parameters differ: for C-SVC the parameter C ranges from 0 to positive infinity, while for nu-SVC the parameter nu lies in the interval (0, 1].

-g: sets gamma in the kernel function. The default value is 1/k, where k here refers to the number of attributes (features) in the input data.
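For example (a sketch; train_data and train_label are placeholders, with samples in rows and attributes in columns), passing the default value of g explicitly would look like this:

    g_default = 1 / size(train_data, 2);          % 1 / (number of attributes)
    model = svmtrain(train_label, train_data, ['-t 2 -c 1 -g ', num2str(g_default)]);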

The key choice in an SVM is the type of kernel function: mainly the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel, and the sigmoid kernel.
The most widely used of these is the RBF kernel. Whether the sample is small or large, and whether the dimensionality is high or low, the RBF kernel is applicable, and it has several advantages over the other kernels:
1) The RBF kernel maps a sample into a higher-dimensional space, and the linear kernel is a special case of it; that is, if you are considering the RBF kernel, there is no need to consider the linear kernel separately.
2) Compared with the polynomial kernel, the RBF kernel has fewer parameters to determine, and the number of kernel parameters directly affects the complexity of model selection. In addition, when the order of the polynomial is high, the entries of the kernel matrix tend towards infinity or towards zero, whereas the RBF values stay in (0, 1], which reduces the numerical difficulties.
3) For some parameter settings, the RBF and sigmoid kernels have similar performance.

Why choose the RBF kernel

In general, the RBF kernel is a reasonable first choice. This kernel maps samples into a higher-dimensional space and, unlike the linear kernel, can handle the case in which the relation between class labels and attributes is nonlinear. Moreover, the linear kernel is a special case of the RBF kernel (Keerthi and Lin, 2003): a linear kernel with penalty factor C has the same performance as the RBF kernel with some parameters (C, gamma). The sigmoid kernel likewise behaves like the RBF kernel for certain parameters (Lin and Lin, 2003).
The second reason is that the number of hyperparameters affects the complexity of model selection (the parameters can only be found by trying them out!). The polynomial kernel has more hyperparameters than the RBF kernel.
Finally, the RBF kernel has fewer numerical difficulties. A key point is that its values satisfy 0 < K_ij <= 1, whereas polynomial kernel values can go to infinity (when gamma*x_i'*x_j + r > 1) or to zero (when gamma*x_i'*x_j + r < 1) once the degree is high. Furthermore, it must be pointed out that the sigmoid kernel is not valid under some parameters (that is, it is not the inner product of two vectors) (Vapnik, 1995).
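For reference, the standard LIBSVM definitions of the four kernels discussed here are (gamma, r and d correspond to the -g, -r and -d options):

    \text{linear: } K(x_i, x_j) = x_i^{T} x_j
    \text{polynomial: } K(x_i, x_j) = (\gamma\, x_i^{T} x_j + r)^{d}, \quad \gamma > 0
    \text{RBF: } K(x_i, x_j) = \exp(-\gamma\, \lVert x_i - x_j \rVert^{2}), \quad \gamma > 0
    \text{sigmoid: } K(x_i, x_j) = \tanh(\gamma\, x_i^{T} x_j + r)

So the RBF entries always satisfy 0 < K(x_i, x_j) <= 1, while for a large degree d the polynomial entries overflow towards infinity when gamma*x_i'*x_j + r > 1 and underflow towards zero when gamma*x_i'*x_j + r < 1.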

Of course, there are situations in which the RBF kernel is not suitable. In particular, when the number of features is very large, the linear kernel is usually the choice to consider, and may be the only one that is practical.

nr_weight, weight_label, and weight are three parameters used to change the penalty factor for certain classes. When the input data are unbalanced, or the risk (cost) of misclassification is asymmetric, these three parameters play a very important role in training.

nr_weight is the number of elements in weight_label and weight, i.e. their common length (dimension). weight[i] and weight_label[i] correspond one to one: weight[i] is the coefficient by which the penalty factor of class weight_label[i] is multiplied. If you do not want to set any class-specific penalty factor, simply set nr_weight to 0.
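From the MATLAB or command-line interface, the same effect is obtained with the -wi option, which multiplies C for class i. A small sketch (the labels +1/-1 and the weight values are illustrative, not taken from the original text):

    % penalize mistakes on the rare class (label -1) ten times more than on class +1
    model = svmtrain(train_label, train_data, '-s 0 -t 2 -c 1 -w1 1 -w-1 10');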

To guard against incorrect parameter settings, you can also call the interface function svm_check_parameter() provided by LIBSVM to check the input parameters.

http://blog.csdn.net/heyijia0327/article/details/38090229
A radial basis function is a function whose value depends only on the distance of its argument x from the origin, i.e. phi(x) = phi(||x||), or from some other center point c, i.e. phi(x, c) = phi(||x - c||) (referenced from Wikipedia). In other words, a radial basis function can be chosen as a kernel function; in SVMs the Gaussian radial basis function is the usual choice, but the kernel function does not have to be a radial basis function.

The next step is to discuss why a kernel function can map into a higher-dimensional space, and why the radial basis kernel in particular can map into an infinite-dimensional space.
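A quick way to see the infinite-dimensional claim is to expand the exponential in a Taylor series:

    \exp(-\gamma \lVert x - y \rVert^{2})
      = e^{-\gamma \lVert x \rVert^{2}} \, e^{-\gamma \lVert y \rVert^{2}} \, e^{2\gamma\, x^{T} y}
      = e^{-\gamma \lVert x \rVert^{2}} \, e^{-\gamma \lVert y \rVert^{2}} \sum_{n=0}^{\infty} \frac{(2\gamma)^{n}}{n!} (x^{T} y)^{n}

So the RBF kernel is a weighted sum of polynomial kernels of every degree n, and the implicit feature map therefore has infinitely many components.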

http://www.cnblogs.com/LeftNotEasy/archive/2011/05/02/basic-of-svm.html gives a general overview of SVMs and is easy to understand.

Note the position of the penalty factor C in equation (1), and recall its role: it characterizes how much you care about outliers; the larger C is, the more you care and the less willing you are to give them up. The formula is written this way simply because the people who built SVMs have always written it so; nothing says that all slack variables must share the same penalty factor. We can give each outlier its own C, which means we value each sample differently: some samples may be lost or misclassified without much harm, and they get a relatively small C, while others are important and must not be misclassified, and they get a very large C.
One way to deal with a skewed data set is indeed to work on the penalty factors: as you might have guessed, give the class with fewer samples (here the negative class) a larger penalty factor, signalling that we attach importance to this part of the samples (they are already few, and if we discard some of them as well, the negative class has little left). The part of the objective function contributed by the slack variables then becomes

$$ C_{+}\sum_{i=1}^{p}\xi_i \;+\; C_{-}\sum_{j=p+1}^{p+q}\xi_j $$

where i = 1, ..., p are the positive samples and j = p+1, ..., p+q are the negative samples. This is exactly how the LIBSVM package handles the skew problem.

How are C+ and C- determined? Their absolute sizes are found by trial (parameter tuning), but their ratio can be fixed in advance in some way. A very straightforward way to determine the ratio is to use the ratio of the two classes' sample counts: in the example just cited, if C+ is set to 5, then C- can be set to 500, because 10,000 : 100 = 100 : 1.

But this is still not good enough. Looking back at the earlier figure, you will find that the reason the positive class can "bully" the negative class is not really that there are fewer negative samples; the real reason is that the negative samples are not spread widely enough (they do not extend over the region the negative class should occupy). A concrete example: suppose we want to classify articles as political or sports, there are many political articles, and the only sports articles provided are a few about basketball. The classifier will then obviously be biased toward the political class. If we add more sports samples but they are still all about basketball (no football, volleyball, racing, swimming, and so on), what happens? Even if the number of sports articles matches the political class, they are too concentrated, and the result will still favor the political class. So a better way to set the ratio of C+ to C- is to measure how widely each class is distributed. For example, you can estimate how much space each class occupies: find a hypersphere in the high-dimensional feature space that contains all the negative samples, find another for the positive samples, and compare the two radii; this gives a rough picture of the distributions. The class with the larger radius is the more widely spread one and gets the smaller penalty factor.
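A rough MATLAB sketch of this idea (an approximation that the text only outlines: the class centroid and the farthest sample from it stand in for the true minimal enclosing hypersphere, and pos_data/neg_data are placeholder matrices with one sample per row):

    % approximate each class's spread by the farthest distance from its centroid
    r_pos = max(sqrt(sum(bsxfun(@minus, pos_data, mean(pos_data, 1)).^2, 2)));
    r_neg = max(sqrt(sum(bsxfun(@minus, neg_data, mean(neg_data, 1)).^2, 2)));
    % the more widely spread class gets the smaller penalty factor
    c_pos = 1;
    c_neg = c_pos * (r_pos / r_neg);   % e.g. passed on via the -wi options mentioned earlier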

But even this is not good enough, because some classes of samples really are concentrated: this is not a question of how many samples were provided but a characteristic of the class itself (some topics are simply narrow; computer-related articles are clearly not as "free-ranging" as articles about culture). In that case, even if the radii of the two spheres differ greatly, the two classes should not necessarily be given different penalty factors.

http://blog.csdn.net/zhzhl202/article/details/7583464
References and further reading:
1. http://www.blogjava.net/zhenandaci/archive/2009/03/15/259786.html explains the reasons for and benefits of introducing slack variables in SVM.
2. http://www.blogjava.net/zhenandaci/archive/2009/03/06/258288.html covers the characteristics of kernel functions and their role in SVM.
3. http://blog.pluskid.org/?tag=support-vector-machine contains pluskid's classic articles on SVM.
  • http://blog.csdn.net/liulina603/article/details/8552424
  • svm_type

    Specifies the type of SVM; possible values are:

    • CvSVM::C_SVC: C-Support Vector Classification. n-class classification (n >= 2); allows imperfect separation of classes with an outlier penalty factor C.
    • CvSVM::NU_SVC: nu-Support Vector Classification. n-class classification with possible imperfect separation. The parameter nu is used instead of C (its value lies in the interval (0, 1]; the larger nu, the smoother the decision boundary).
    • CvSVM::ONE_CLASS: one-class classifier. All the training data come from the same class, and the SVM builds a boundary that separates the region of feature space occupied by that class from the rest of the feature space.
    • CvSVM::EPS_SVR: epsilon-Support Vector Regression. The distance between the feature vectors of the training set and the fitted hyperplane must be less than p. The outlier penalty factor C is used.
    • CvSVM::NU_SVR: nu-Support Vector Regression. nu is used instead of p.
  • kernel_type

    The kernel type of the SVM; possible values are:

    • CvSVM::LINEAR: linear kernel. No mapping to a higher-dimensional space is done; linear discrimination (or regression) is carried out in the original feature space. This is the fastest option.
    • CvSVM::POLY: polynomial kernel.
    • CvSVM::RBF: radial basis function kernel; a good choice in most situations.
    • CvSVM::SIGMOID: sigmoid kernel.
  • degree – the parameter degree of the kernel function (POLY).
  • gamma – the parameter gamma of the kernel function (POLY/RBF/SIGMOID).
  • coef0 – the parameter coef0 of the kernel function (POLY/SIGMOID).
  • Cvalue – the parameter C of the SVM type (C_SVC/EPS_SVR/NU_SVR).
  • nu – the parameter nu of the SVM type (NU_SVC/ONE_CLASS/NU_SVR).
  • p – the parameter p (epsilon) of the SVM type (EPS_SVR).
  • class_weights – optional weights in C_SVC, assigned to particular classes and multiplied by C. These weights therefore affect the misclassification penalties of the different classes: the larger the weight, the larger the penalty for misclassifying data of that class.
  • term_crit – termination criteria of the iterative SVM training procedure, which solves a partial case of a constrained quadratic optimization problem. You can specify the tolerance and/or the maximum number of iterations.
  A summary of the selection of the SVM parameters c and g [MATLAB-LIBSVM]
I wrote a program to select the best values of the parameters c and g in an SVM.
[The purpose is to make it convenient to find the best c and g directly with this small program, without having to write anything else.]
In fact, the original C-language version of LIBSVM already comes with a subroutine (used together with Python and its plotting) that can find the best c and g; I wrote a MATLAB version of it, which fills this gap in the MATLAB version of LIBSVM.

The test data is the wine data set also used in my video.

The idea for finding the best c and g is still to let c and g vary over some range (for example c = 2^(-5), 2^(-4), ..., 2^(5) and g = 2^(-5), 2^(-4), ..., 2^(5)) and to use cross validation to find the c and g with the highest accuracy. Here I made one small change (purely a bit of personal experience): since several different (c, g) pairs may give the same highest accuracy, I take the pair with the smallest c as the best c and g. The penalty parameter should not be set too high: a very high penalty parameter can raise the accuracy on the validation data, but it leads to an over-fitted (over-learned) state. In my experience with SVMs so far, an overly large penalty parameter C is often the reason the final test-set accuracy is not ideal.

There is also a small trick in using this program: you can first search a wide range coarsely to find a roughly ideal c and g, and then narrow the range to find a more precise c and g.

For example, first let c = 2^(-5), 2^(-4), ..., 2^(5) and g = 2^(-5), 2^(-4), ..., 2^(5), and find a good c and g in this range:
[contour plot of the cross-validation accuracy over this coarse grid]
At this point bestc = 0.5, bestg = 1, and bestacc = 98.8764% (cross-validation accuracy).

Final test set accuracy = 96.6292% (86/89) (classification).
At this point you can narrow the range of c and g, and also reduce the step size (the program has parameters for this that can be adjusted; there are default values if you prefer not to adjust them).
Let c = 2^(-2), 2^(-1.5), ..., 2^(4) and g = 2^(-4), 2^(-3.5), ..., 2^(4), and find a more precise c and g in this range:
[contour plot of the cross-validation accuracy over the finer grid]
At this point bestc = 0.3536, bestg = 0.7017, and bestacc = 98.8764% (cross-validation accuracy).
Final test set accuracy = 96.6292% (86/89) (classification).
The code for the second test above:
    load wine_SVM;                               % wine data and labels (wine, wine_labels)
    % training set: samples 1:30, 60:95, 131:153; test set: the remaining samples
    train_wine = [wine(1:30,:); wine(60:95,:); wine(131:153,:)];
    train_wine_labels = [wine_labels(1:30); wine_labels(60:95); wine_labels(131:153)];
    test_wine = [wine(31:59,:); wine(96:130,:); wine(154:178,:)];
    test_wine_labels = [wine_labels(31:59); wine_labels(96:130); wine_labels(154:178)];
    % scale each attribute to [0,1]
    [train_wine,pstrain] = mapminmax(train_wine');
    pstrain.ymin = 0;
    pstrain.ymax = 1;
    [train_wine,pstrain] = mapminmax(train_wine,pstrain);
    [test_wine,pstest] = mapminmax(test_wine');
    pstest.ymin = 0;
    pstest.ymax = 1;
    [test_wine,pstest] = mapminmax(test_wine,pstest);
    train_wine = train_wine';
    test_wine = test_wine';
    % grid-search c and g, then train and test with the best pair
    [bestacc,bestc,bestg] = SVMcg(train_wine_labels,train_wine,-2,4,-4,4,3,0.5,0.5,0.9);
    cmd = ['-c ',num2str(bestc),' -g ',num2str(bestg)];
    model = svmtrain(train_wine_labels,train_wine,cmd);
    [pre,acc] = svmpredict(test_wine_labels,test_wine,model);
The code of SVMcg.m, my program for selecting the best values of the parameters c and g in an SVM:
    function [bestacc,bestc,bestg] = SVMcg(train_label,train,cmin,cmax,gmin,gmax,v,cstep,gstep,accstep)
    % SVMcg cross validation by faruto
    % email: [email protected]  QQ: 516667408  http://blog.sina.com.cn/faruto  BNU
    % last modified 2009.8.23
    % Super Moderator @ www.ilovematlab.cn
    %% about the parameters of SVMcg
    if nargin < 10
        accstep = 1.5;
    end
    if nargin < 8
        accstep = 1.5;
        cstep = 1;
        gstep = 1;
    end
    if nargin < 7
        accstep = 1.5;
        v = 3;
        cstep = 1;
        gstep = 1;
    end
    if nargin < 6
        accstep = 1.5;
        v = 3;
        cstep = 1;
        gstep = 1;
        gmax = 5;
    end
    if nargin < 5
        accstep = 1.5;
        v = 3;
        cstep = 1;
        gstep = 1;
        gmax = 5;
        gmin = -5;
    end
    if nargin < 4
        accstep = 1.5;
        v = 3;
        cstep = 1;
        gstep = 1;
        gmax = 5;
        gmin = -5;
        cmax = 5;
    end
    if nargin < 3
        accstep = 1.5;
        v = 3;
        cstep = 1;
        gstep = 1;
        gmax = 5;
        gmin = -5;
        cmax = 5;
        cmin = -5;
    end
    %% X: c, Y: g, cg: acc
    [X,Y] = meshgrid(cmin:cstep:cmax, gmin:gstep:gmax);
    [m,n] = size(X);
    cg = zeros(m,n);
    %% record acc with different c & g, and find the bestacc with the smallest c
    bestc = 0;
    bestg = 0;
    bestacc = 0;
    basenum = 2;
    for i = 1:m
        for j = 1:n
            cmd = ['-v ',num2str(v),' -c ',num2str(basenum^X(i,j)),' -g ',num2str(basenum^Y(i,j))];
            cg(i,j) = svmtrain(train_label, train, cmd);   % v-fold cross-validation accuracy
            if cg(i,j) > bestacc
                bestacc = cg(i,j);
                bestc = basenum^X(i,j);
                bestg = basenum^Y(i,j);
            end
            if (cg(i,j) == bestacc && bestc > basenum^X(i,j))
                bestacc = cg(i,j);
                bestc = basenum^X(i,j);
                bestg = basenum^Y(i,j);
            end
        end
    end
    %% draw the acc with different c & g
    [C,h] = contour(X,Y,cg, 60:accstep:100);
    clabel(C,h,'FontSize',10,'Color','r');
    xlabel('log2c','FontSize',10);
    ylabel('log2g','FontSize',10);
    grid on;


This gives the LIBSVM-MATLAB toolbox an upgraded version of my own; you can add this SVMcg.m to the ...
libsvm-mat-2.89-3[faruto version].rar (76.46 KB, downloads: 1853)
Instructions for using SVMcg.m:
[bestacc,bestc,bestg] = SVMcg(train_label,train,cmin,cmax,gmin,gmax,v,cstep,gstep,accstep)

train_label: labels of the training set, in the format required by the LIBSVM toolbox.
train: training set, in the format required by the LIBSVM toolbox.
cmin: minimum of the search range of the penalty parameter c, as a base-2 logarithm, i.e. c_min = 2^(cmin). Default: -5.
cmax: maximum of the search range of the penalty parameter c, as a base-2 logarithm, i.e. c_max = 2^(cmax). Default: 5.
gmin: minimum of the search range of the kernel parameter g, as a base-2 logarithm, i.e. g_min = 2^(gmin). Default: -5.
gmax: maximum of the search range of the kernel parameter g, as a base-2 logarithm, i.e. g_max = 2^(gmax). Default: 5.

v: cross-validation parameter, i.e. the number of folds the training set is split into for cross validation. Default: 3.
cstep: step size for the parameter c (in log2 units). Default: 1.
gstep: step size for the parameter g (in log2 units). Default: 1.
accstep: step size used when drawing the accuracy contour plot. Default: 1.5.
[These parameters can be changed to obtain the best results, or you can simply use the default values.]
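Putting the coarse-then-fine trick together, the two searches described above might be run like this (a sketch, assuming the wine data have been loaded and scaled exactly as in the earlier code; the first call simply spells out the default ranges and steps):

    % coarse grid: c, g in 2^(-5) .. 2^(5), step 1 in log2 units
    [coarse_acc, coarse_c, coarse_g] = SVMcg(train_wine_labels, train_wine, -5, 5, -5, 5, 3, 1, 1, 1.5);
    % finer grid and smaller steps around the coarse optimum
    [best_acc, best_c, best_g] = SVMcg(train_wine_labels, train_wine, -2, 4, -4, 4, 3, 0.5, 0.5, 0.9);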
