The support vector machine is inherently a binary (two-class) classifier, but it can also be applied to multi-class problems. The main approach is to combine several binary classifiers.

**1. Multi-class classification**

1. One-versus-all (OVA)

Given m classes, train m binary classifiers. Classifier i treats the data of class i as class 1 (the positive class) and the data of all other m-1 classes as class 2 (the negative class), so each class gets its own binary classifier and we end up with m classifiers in total. To classify a sample x, voting is used. Classifier i predicts x: if the result is positive, meaning x belongs to class i, then class i receives one vote; if the result is negative, meaning x belongs to some class other than i, then every class except i receives one vote. The class with the most votes at the end is taken as the class of x.
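The OVA voting procedure above can be sketched in a few lines. This is an illustrative Python sketch: the classifier functions are hypothetical stand-ins for trained binary SVMs, not LIBSVM calls.

```python
def ova_vote(x, classifiers):
    """One-versus-all voting.

    classifiers: list of m functions; classifiers[i](x) returns +1 if
    the i-th binary SVM predicts the positive class (class i), else -1.
    Returns the index of the class with the most votes.
    """
    m = len(classifiers)
    votes = [0] * m
    for i, clf in enumerate(classifiers):
        if clf(x) > 0:
            votes[i] += 1               # positive: one vote for class i
        else:
            for j in range(m):          # negative: one vote for every other class
                if j != i:
                    votes[j] += 1
    return votes.index(max(votes))
```

For example, with toy classifiers that fire only on their own class index, `ova_vote` recovers that index.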

2. All-versus-all (AVA, also called one-versus-one)

Given m classes, a binary classifier is trained for every pair of classes, giving m(m-1)/2 binary classifiers in total. For example, with three classes 1, 2, 3, three classifiers are needed: one for classes 1 and 2, one for classes 1 and 3, and one for classes 2 and 3. To classify a sample x, every classifier predicts it, and voting again determines the final class of x. However, this method needs more classifiers than one-versus-all, and because several classes may tie in the vote, x can appear to belong to multiple categories, which can hurt classification accuracy.
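The pairwise voting scheme can be sketched similarly. Again an illustrative Python sketch with hypothetical pairwise classifier functions standing in for trained binary SVMs:

```python
def ava_vote(x, pairwise):
    """All-versus-all (one-versus-one) voting.

    pairwise: dict mapping a pair (i, j) with i < j to a function that
    returns the winning class label (i or j) for sample x.
    Returns the class with the most votes.
    """
    classes = sorted({c for pair in pairwise for c in pair})
    votes = {c: 0 for c in classes}
    for clf in pairwise.values():
        votes[clf(x)] += 1              # each pairwise classifier casts one vote
    return max(votes, key=votes.get)
```

With three classes there are 3*2/2 = 3 pairwise classifiers, matching the example in the text.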

As for implementing multi-class classification in MATLAB: MATLAB's built-in SVM classification function only handles binary classification, so multi-class problems cannot be solved with it directly; you would have to implement one of the schemes above yourself. Instead, we can use the LIBSVM toolbox. LIBSVM uses the second, one-versus-one approach to implement multi-class classification directly. It solves classification problems (C-SVC, nu-SVC), regression problems (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM), and provides four common kernel functions to choose from: linear, polynomial, radial basis (RBF), and sigmoid.

**2. Multi-class classification with LIBSVM in MATLAB (training function svmtrain + prediction function svmpredict)**

The LIBSVM training function **svmtrain** is called as model = svmtrain(training_labels, training_data, 'options'), where the option string takes a form like '-c 2 -g 0.02'. The main options are:

(1) -s: the SVM type, from the list above (default 0):

0 -- C-SVC

1 -- nu-SVC

2 -- one-class SVM

3 -- epsilon-SVR

4 -- nu-SVR

(2) -t: the kernel function type (default 2):

0 -- linear: u'*v

1 -- polynomial: (gamma*u'*v + coef0)^degree

2 -- RBF: exp(-gamma*|u-v|^2)

3 -- sigmoid: tanh(gamma*u'*v + coef0)

(3) Kernel parameter options

-d: the degree in the kernel function (for the polynomial kernel) (default 3)

-g: the gamma in the kernel function (for the polynomial/RBF/sigmoid kernels) (default 1/k, where k is the number of features)

-r: the coef0 in the kernel function (for the polynomial/sigmoid kernels) (default 0)

-c: the cost parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)

-n: the nu parameter of nu-SVC, one-class SVM, and nu-SVR (default 0.5)

-p: the epsilon in the loss function of epsilon-SVR (default 0.1)

-m: the cache memory size in megabytes (default 40)

-e: the tolerance of the termination criterion (default 0.001)

-h: whether to use the shrinking heuristics, 0 or 1 (default 1)

-b: whether to train an SVC or SVR model with probability estimates, 0 or 1 (default 0)

-wi: set the parameter C of class i to weight*C (for C-SVC) (default 1)

-v: n-fold cross-validation mode, where n is the number of folds and must be at least 2 (when the -v option is used, svmtrain does not return a model; it returns the cross-validation classification accuracy, or the mean squared error for regression)

After training with svmtrain, a struct is returned containing the following fields:

(1) Parameters (a 5x1 array):

First element: -s, the SVM type (int, default 0)

Second element: -t, the kernel function type (default 2)

Third element: -d, the degree in the kernel function (for the polynomial kernel) (default 3)

Fourth element: -g, the gamma in the kernel function (for the polynomial/RBF/sigmoid kernels) (default 1/k, where k is the number of features)

Fifth element: -r, the coef0 in the kernel function (for the polynomial/sigmoid kernels) (default 0)

(2) nr_class: the number of classes in the dataset (int)

(3) totalSV: the total number of support vectors (int)

(4) rho: the negative of the constant offset b in the decision function w'x + b (i.e. -b).

(5) Label: the class labels appearing in the dataset

(6) ProbA: values used for probability estimation when the -b option is set; empty otherwise.

ProbB: values used for probability estimation when the -b option is set; empty otherwise.

(7) nSV: the number of support vectors in each class, in the order of the labels in Label.

(8) sv_coef: the coefficients of the support vectors in the decision functions.

(9) SVs: all the support vectors; if the features are n-dimensional and there are m support vectors, this is an m x n sparse matrix.

(10) nu: the value of the -n option

(11) iter: the number of iterations

(12) obj: the minimum value of the objective function reached by the SVM solver


The **svmpredict** function classifies the test data and computes the classification accuracy from the test labels, the test data, and the model returned by svmtrain: [predict_label, accuracy, dec_values] = svmpredict(test_labels, test_data, model). The return values are:

(1) predict_label: the predicted class label for each test sample.

(2) accuracy: a 3x1 array containing, in order: the classification accuracy, the mean squared error of the regression, and the squared correlation coefficient of the regression.

(3) dec_values/prob_estimates: for m test samples and k classes, if the '-b 1' option is specified, this is an m x k matrix in which each row gives the probability of that sample falling into each class; if '-b 1' is not specified, it is an m x k(k-1)/2 matrix in which each row holds the decision values of the k(k-1)/2 pairwise binary SVM classifiers.
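As a quick sanity check on the width of dec_values, the number of pairwise classifiers trained by the one-versus-one scheme for k classes is k(k-1)/2 (a small illustrative snippet, not LIBSVM code):

```python
def n_pairwise(k):
    # number of one-versus-one binary classifiers for k classes
    return k * (k - 1) // 2
```

So the 3-class wine problem below yields 3 pairwise classifiers, and dec_values has 3 columns per test sample.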

**3. Steps for classification with LIBSVM**

(1) Normalize the data (simple scaling)

Apply simple scaling to the data. The most important advantage of scaling is that it prevents attributes with large numeric ranges from dominating those with small numeric ranges. Another advantage is that it avoids numerical difficulties during the computation. (In this experiment, normalizing the training data and the test data separately produced higher classification accuracy than normalizing them together as a whole.)
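The effect of simple scaling can be illustrated with a minimal column-wise min-max rescaler. This is a Python sketch analogous to what MATLAB's mapminmax does, not the code used in the experiment:

```python
def minmax_scale(X, lo=0.0, hi=1.0):
    """Linearly rescale each column of a list-of-rows matrix into [lo, hi]."""
    cols = list(zip(*X))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [[lo + (hi - lo) * (v - mn) / (mx - mn) if mx > mn else lo
             for v, mn, mx in zip(row, mins, maxs)]
            for row in X]
```

After scaling, a feature ranging over [10, 30] and a feature ranging over [1, 3] both occupy [0, 1], so neither dominates the kernel distances.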

(2) Use the RBF kernel

(3) Choose the best C and g

C is the penalty factor, a number that must be set before training the model. It expresses how much we care about the outlying points in each class: the larger C is, the more we care about them and the less willing we are to give them up. g is the gamma parameter of the kernel function (for the polynomial/RBF/sigmoid kernels; default 1/k, where k is the number of features). The choice of C and g has a large effect on the classification accuracy; in this experiment it is done with the function svmcgforclass.
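The search that svmcgforclass performs can be sketched as a grid search over powers of two. In this illustrative Python sketch, `score` is a hypothetical stand-in for running svmtrain with the '-v' cross-validation option and reading back the accuracy:

```python
def grid_search(score, c_exps, g_exps):
    """Try C = 2^i, gamma = 2^j over the given exponent ranges and keep
    the (C, gamma) pair with the best score.

    score: function (C, gamma) -> cross-validation accuracy (stand-in
    for svmtrain with '-v'). Returns (best_score, best_C, best_gamma).
    """
    best = (float('-inf'), None, None)
    for i in c_exps:
        for j in g_exps:
            acc = score(2.0 ** i, 2.0 ** j)
            if acc > best[0]:
                best = (acc, 2.0 ** i, 2.0 ** j)
    return best
```

A coarse pass over a wide exponent range followed by a fine pass around the coarse optimum, as in the experiment below, is the usual usage pattern.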

(4) Train the classification model with the optimal C and g obtained

Use the svmtrain function in LIBSVM to train the multi-class model.

(5) Test

Use the svmpredict function in LIBSVM to test the classification accuracy.

**4. The experiment**

The multi-class experiment uses the wine data provided with LIBSVM, which describes types of wine. The download includes three parts: the number of classes (3), the 13 attribute values for each of the 178 wine samples, and the class label for each of the 178 samples. The code is as follows:

function test_wine()
% Multi-class SVM classification of the wine data with LIBSVM
load chapter12_wine.mat;  % load the data (wine, wine_labels)
% select the training data and labels
train = [wine(1:30,:); wine(60:95,:); wine(131:153,:)];
train_group = [wine_labels(1:30); wine_labels(60:95); wine_labels(131:153)];
% select the test data and labels
test = [wine(31:59,:); wine(96:130,:); wine(154:178,:)];
test_group = [wine_labels(31:59); wine_labels(96:130); wine_labels(154:178)];
% preprocessing: normalize the training and test sets to [0,1] with mapminmax
[train,pstrain] = mapminmax(train');
% set the range parameters of the mapping to 0 and 1
pstrain.ymin = 0;
pstrain.ymax = 1;
% normalize the training set to [0,1]
[train,pstrain] = mapminmax(train,pstrain);
% test data
[test,pstest] = mapminmax(test');
% set the range parameters of the mapping to 0 and 1
pstest.ymin = 0;
pstest.ymax = 1;
% normalize the test set to [0,1]
[test,pstest] = mapminmax(test,pstest);
% transpose back so the data matches the LIBSVM toolbox format
train = train';
test = test';
% search for the optimal c and g
% coarse search: c and g both range over 2^(-10), 2^(-9), ..., 2^(10)
[bestacc,bestc,bestg] = svmcgforclass(train_group,train,-10,10,-10,10);
% fine search: c ranges over 2^(-2), 2^(-1.5), ..., 2^(4),
% g ranges over 2^(-4), 2^(-3.5), ..., 2^(4)
[bestacc,bestc,bestg] = svmcgforclass(train_group,train,-2,4,-4,4,3,0.5,0.5,0.9);
% train the model
cmd = ['-c ', num2str(bestc), ' -g ', num2str(bestg)];
model = svmtrain(train_group,train,cmd);
disp(cmd);
% test the classification
[predict_label, accuracy, dec_values] = svmpredict(test_group,test,model);
% plot the test classification results
figure;
hold on;
plot(test_group, 'o');
plot(predict_label, 'r*');
legend('actual test set classes', 'predicted test set classes');
title('Actual vs. predicted classification of the test set', 'fontsize', 10);
end

**5. Experimental results**

**6. Questions**

Question (1)

The experiment above was written following the example that ships with LIBSVM. Starting from the example's approach, I first normalized the training data and the test data together, and the accuracy was only 61%; after switching to the example's scheme of normalizing the training data and the test data separately, the accuracy is now 88%. I do not understand the cause of this difference; if any reader knows the reason, please advise. The two versions of the code are shown in (1) and (2) below.

1. Code normalizing the training data and the test data separately to [0,1] (classification accuracy 88%):

% training data
[train,pstrain] = mapminmax(train');
% set the range parameters of the mapping to 0 and 1
pstrain.ymin = 0;
pstrain.ymax = 1;
% normalize the training set to [0,1]
[train,pstrain] = mapminmax(train,pstrain);
% test data
[test,pstest] = mapminmax(test');
% set the range parameters of the mapping to 0 and 1
pstest.ymin = 0;
pstest.ymax = 1;
% normalize the test set to [0,1]
[test,pstest] = mapminmax(test,pstest);
% transpose back so the data matches the LIBSVM toolbox format
train = train';
test = test';

2. Code normalizing the training data and the test data together to [0,1] (classification accuracy 61%):

[mtrain,ntrain] = size(train);
[mtest,ntest] = size(test);
dataset = [train; test];
[dataset_scale,ps] = mapminmax(dataset', 0, 1);
dataset_scale = dataset_scale';
train = dataset_scale(1:mtrain,:);
test = dataset_scale((mtrain+1):(mtrain+mtest),:);
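One observable difference between the two schemes is that they feed the SVM different feature values whenever the train and test ranges differ. A one-feature toy example in Python illustrates this (an illustration of the mechanism only, not an explanation of the accuracy gap):

```python
def scale01(xs):
    # rescale a single feature column into [0, 1]
    lo, hi = min(xs), max(xs)
    return [(v - lo) / (hi - lo) for v in xs]

train_col = [1.0, 2.0, 3.0]
test_col = [2.0, 3.0, 4.0]

separate = scale01(train_col) + scale01(test_col)   # each set on its own
joint = scale01(train_col + test_col)               # both sets together
# The same raw value 3.0 maps to 1.0 under separate scaling of the
# training set, but to 2/3 under joint scaling, so the SVM sees
# genuinely different inputs in the two schemes.
```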

Question (2)

Also: there are functions that tune C and g automatically, but when normalizing, if I change the normalization range of the training data and test data from [0,1] to [0,4], the classification accuracy rises from 88% to 96%. How should this normalization range be chosen?
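One possible piece of the puzzle (my observation, not from the original text): for the RBF kernel, widening the range from [0,1] to [0,4] multiplies every feature by 4, which is mathematically equivalent to multiplying gamma by 16. The normalization range and the gamma search range therefore interact, and a wider range effectively shifts the gamma grid being explored. The identity can be checked numerically:

```python
import math

def rbf(u, v, gamma):
    # RBF kernel: exp(-gamma * ||u - v||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

u, v = [0.1, 0.5], [0.3, 0.2]
gamma = 0.8
scaled_u = [4 * a for a in u]
scaled_v = [4 * b for b in v]
# scaling every feature by s is equivalent to scaling gamma by s^2
```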

Code normalizing the training data and the test data to [0,4] (classification accuracy 96%):

% training data
[train,pstrain] = mapminmax(train');
% set the range parameters of the mapping to 0 and 4
pstrain.ymin = 0;
pstrain.ymax = 4;
% normalize the training set to [0,4]
[train,pstrain] = mapminmax(train,pstrain);
% test data
[test,pstest] = mapminmax(test');
% set the range parameters of the mapping to 0 and 4
pstest.ymin = 0;
pstest.ymax = 4;
% normalize the test set to [0,4]
[test,pstest] = mapminmax(test,pstest);
% transpose back so the data matches the LIBSVM toolbox format
train = train';
test = test';