A nonlinear multi-class classification experiment with the supervised algorithms BP, SVM, and AdaBoost

Before we begin:

Earlier articles covered simple algorithms such as decision trees and the Bayesian classifier, and then more sophisticated machine learning algorithms: the neural network (BP), the support vector machine (SVM), AdaBoost, and others (interested readers can browse back through the blog). Every algorithm has its strengths and weaknesses, and basically all of them can handle both linear and nonlinear sample sets. Looking over these algorithms, my personal feeling is that for classifying data, whether linear or nonlinear, the best performers are BP, SVM, and the AdaBoost meta-algorithm. However, the earlier introductions to these algorithms, both the principles and the worked examples, dealt only with binary classification; multi-class classification was never touched, even though real-world data sets are very often multi-class. This section therefore compares and experiments with these three well-regarded algorithms, BP, SVM, and AdaBoost, on a simple nonlinear multi-class sample set.

One: Understanding and Analysis

Since this is a multi-class experiment, we first need to understand the samples. A sample set is called multi-class when it contains more than two classes, that is, at least three. For example, below is a two-dimensional nonlinear multi-class sample set (it is also the sample set used in the experiments later):

Each color represents one class; you can see there are 5 classes in total, and clearly they are not linearly separable. We give the five classes the class labels 1 through 5.
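The original data_test.mat used below is not attached to this article. If you want to reproduce the experiments, here is a minimal sketch of a hypothetical stand-in with the same shape: 500 samples, two features plus the class label in the last column, classes 1-4 around the edges and class 5 in the middle. The centers and spread here are my own assumptions, not the author's data:

% hypothetical stand-in for data_test.mat (NOT the author's original data)
rng(1);                                       % reproducible
centers = [-2 2; 2 2; -2 -2; 2 -2; 0 0];      % classes 1-4 around, class 5 in the middle
data = [];
for c = 1:5
    pts = 0.6*randn(100, 2) + repmat(centers(c,:), 100, 1);
    data = [data; pts, c*ones(100, 1)];       % columns: [x, y, label]
end
save('data_test.mat', 'data');                % 500 samples, 3 columns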

In the earlier single-algorithm introductions, the experiments were all binary classification (that is, using only two of the 5 classes above). Binary classification is simple: if a sample is not one class, it is the other. But how do we go from binary to multi-class? If a sample is not me, it is not necessarily you either; it could be him or her. What do we do then?

The usual approach is to turn the multi-class problem into binary problems, because many of the algorithms introduced earlier assume two classes throughout their derivation. SVM is typical: the entire derivation and its conclusions are for two classes, with multi-class never considered, so you cannot apply SVM directly to a multi-class problem unless you rework the theory for multiple classes, obtain a general formulation, and implement that in code.

How is a multi-class problem turned into binary problems? It is quite simple: divide and conquer, combined with a voting mechanism. There are two conversion schemes. A classification problem ultimately requires training classifiers from the training samples; a binary problem produces a single classifier, while a multi-class problem produces not one classifier but several.

The first scheme: pick one class out of the training set and treat all the other classes together as a second class. With the five classes above, say we take the middle class as the first class and relabel it 1, and treat the four surrounding classes as the second class, relabeled -1. Is this now a binary problem? Yes, and binary problems are easy; any of the earlier algorithms can handle it. That gives one classifier, built by singling out the middle class. In the same way, each of the surrounding classes can be singled out in turn, with the rest lumped together, until we have built one classifier per class: 5 classifiers for the 5 classes above. So how do we decide which class a test sample belongs to? Remember that test samples are assumed to have unknown labels. We feed the sample into all 5 classifiers and see what each one says. For example, suppose a test sample actually belongs to the middle class (call it class 5). Feeding it to the class-5-vs-rest classifier, we find it belongs to class 5, so we record one vote for 5. Feeding it to the upper-left (say class 1) classifier, we find it is not class 1 but belongs to the merged group {2,3,4,5}; which of 2, 3, 4, 5 it is we cannot tell, so we record one vote for each of 2, 3, 4, 5. The class-2 classifier likewise records votes for {1,3,4,5}, and so on through all 5 classifiers. Then we count the votes: class 1 appeared 3 times, classes 2, 3, 4 also appeared 3 times each, and class 5 appeared 5 times, so we have good reason to conclude the sample belongs to class 5. That is how the multi-class problem is solved. The process is illustrated by a figure borrowed from another expert's blog:

You can see that the black lines are the ideal classification boundaries we want, while this scheme actually produces the boundaries with the shaded regions. What happens inside a shaded region? Suppose a sample falls in the shadow, like the purple point I drew. Counting votes as above, we might find it belongs to the triangle class 2 times, the square class 2 times, and the circle class 1 time. What then? There is no good answer; we can only pick one of the two classes with the most votes. With luck we pick the triangle and get it right; with bad luck we pick the square and get it wrong. So the shaded region is ambiguous, and in it we can only guess between the tied classes.
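To make the one-vs-rest voting concrete, here is a minimal sketch of the vote counting as a helper I'll call ovr_vote (the name and the decision vector d are my own illustration; d(i) = +1 means classifier i says "belongs to class i", -1 means "belongs to the rest"):

function votes = ovr_vote(d)
% d: 1-by-n vector of one-vs-rest decisions, d(i) in {+1, -1}
n = length(d);
votes = zeros(1, n);
for i = 1:n
    if d(i) > 0
        votes(i) = votes(i) + 1;   % one vote for class i
    else
        votes = votes + 1;         % one vote for every class except i
        votes(i) = votes(i) - 1;
    end
end

For a class-5 sample with all five classifiers correct, d = [-1 -1 -1 -1 1] gives votes = [3 3 3 3 5], and class 5 wins, exactly the count worked out above.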

That was the first scheme. The second scheme has a similar idea, again reducing to binary problems, but the implementation differs. In the first scheme we picked one class and lumped all the rest together; here we also pick one class, but instead of lumping the rest, we pick one more class from the remainder. That is, we pick two classes at a time from the training set and train a classifier on just that pair. For the 5 classes above, we first pick the class-1 and class-2 training samples and train a 1-vs-2 classifier, then pick 1 and 3 and train another, then 1 and 4, then 1 and 5, then 2 and 3, and so on (note that 2-vs-1 is the same as 1-vs-2, so it is omitted). How many classifiers does this need for n classes? n(n-1)/2; here 5*4/2 = 10 classifiers, 5 more than the 5 of the first scheme, and the gap grows with n. Once the classifiers are built, the remaining work is again a voting mechanism: take a sample to the 1-vs-2 classifier, and if it is found to belong to class 1, increment the class-1 accumulator; take it to the 1-vs-3 classifier, and if it again belongs to class 1, increment again; and so on. Finally, the class whose accumulator is largest wins. A question arises: can two or more accumulators end up with the same value, as in the first scheme? Yes, but the probability of a tie is much smaller than before (the vote is now over 10 classifiers rather than 5). The corresponding figure:

You can see that the ambiguous overlap is only a small patch in the middle, much smaller than with the first scheme.
Comparing the two schemes carefully, each has pros and cons. The first builds fewer classifiers (the second needs n(n-1)/2 - n more, a gap that grows with n), so it runs faster. The second is slower but more accurate, and today's computers are fast enough to make up for that disadvantage, so I personally lean toward the second scheme.
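Again as a minimal sketch, the one-vs-one voting over the n(n-1)/2 pairwise decisions looks like this (ovo_vote and the decision vector d are my own illustration; d is ordered (1,2),(1,3),...,(1,n),(2,3),..., with +1 voting for the first class of the pair and -1 for the second):

function votes = ovo_vote(d, n)
% d: 1-by-n(n-1)/2 pairwise decisions, n: number of classes
votes = zeros(1, n);
num = 0;
for i = 1:n-1
    for j = i+1:n
        num = num + 1;
        if d(num) > 0
            votes(i) = votes(i) + 1;   % vote for the first class of the pair
        else
            votes(j) = votes(j) + 1;   % vote for the second
        end
    end
end

For 5 classes this loops over the 10 pairs; the same double loop reappears in the LIBSVM and AdaBoost experiments below.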
All right, that finishes the theory; time to put it into practice. Practice is the sole criterion for testing truth.

Two: Multi-class experiment with the BP Pattern Recognition Toolbox

First we experiment with the neural network algorithm. For the sake of both speed and accuracy, we use the MATLAB Neural Network Toolbox; for how to use the toolbox, see:

Machine learning in practice: the MATLAB Neural Network Toolbox

To get better results, here we directly use the BP pattern recognition toolbox built into MATLAB (nprtool). The toolbox can be driven either through its GUI or from the command line. One thing needs explaining: the input format of the data, in particular how the class labels are set. In this toolbox a class label is no longer a plain number but a vector: with five classes, class 1 is represented as [1,0,0,0,0] and class 3 as [0,0,1,0,0]. If each row of the input is a sample, then each row of the label matrix is that sample's label vector; if each column is a sample, the labels are column vectors instead.
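As a small sketch, converting numeric labels to this format only takes a loop (the example labels here are made up; the script below uses the same loop):

labels = [1 3 2 5 4];                  % example numeric labels (made up)
onehot = zeros(5, length(labels));     % 5 classes, one column per sample
for i = 1:length(labels)
    onehot(labels(i), i) = 1;
end
% equivalent toolbox one-liner: onehot = full(ind2vec(labels));

In the experiment below, each column represents one sample: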

%% Classification design with the MATLAB Pattern Recognition Toolbox
%  Multi-class nonlinear classification
clc; clear; close all;
%% load data -- data pre-processing
data = load('data_test.mat');
data = data.data;
num_train = 200;                         % 500 samples in total
choose = randperm(length(data));         % random selection sequence
train_data = data(choose(1:num_train), :);
label_temp = train_data(:, end);
label_train = zeros(length(train_data), 5);
% convert the class labels to the format required by the toolbox
for i = 1:length(train_data)
    label_train(i, label_temp(i)) = 1;
end
train_data = train_data(:, 1:end-1)';
label_train = label_train';
test_data = data(choose(num_train+1:end), :);
label_temp = test_data(:, end);
label_test = zeros(length(test_data), 5);
for i = 1:length(test_data)
    label_test(i, label_temp(i)) = 1;
end
test_data = test_data(:, 1:end-1)';
label_test = label_test';
%% create a pattern recognition network
hiddenLayerSize = 10;
net = patternnet(hiddenLayerSize);
% split off training, validation and test subsets inside the toolbox
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio   = 15/100;
net.divideParam.testRatio  = 15/100;
[net, tr] = train(net, train_data, label_train);   % train the network
predict = net(test_data);                          % test the network
[~, predict] = max(predict);
%% show the result -- testing
figure;
gscatter(test_data(1,:), test_data(2,:), predict);
[~, label_test] = max(label_test);
accuracy = length(find(predict == label_test)) / length(test_data);
title(['predict the testing data and the accuracy is: ', num2str(accuracy)]);

As you can see, much of the program at the start is devoted to selecting the training and test samples and converting the class labels. A neural network for pattern recognition is then built, and finally the test set is classified with this network. The results are as follows:

This is the network structure produced along the way:

As you can see, this is the result for 300 test samples with 200 training samples, and the toolbox's accuracy is quite high.
In fact, the toolbox can also be driven entirely through its GUI, without writing all this code, although the input data must still be converted to the right format beforehand. Typing nprtool at the command line opens the toolbox GUI; the details are left for self-study.

Three: Multi-class experiment with SVM via LIBSVM

Next let's classify the same data with the SVM method. Since the BP section above used the toolbox directly and did not involve either of the two binary-to-multi-class schemes, for SVM we will demonstrate both. Here I use the LIBSVM toolbox; for how to use it, see:

Demystifying the SVM series (5): simple use of LIBSVM under MATLAB
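As a quick reminder of the interface, a minimal sketch (the option values are illustrative, not tuned, and the variable names stand for your own data): svmtrain takes a label vector, a feature matrix, and an option string, where '-t 2' selects the RBF kernel used throughout this article; svmpredict returns the predicted labels.

% minimal LIBSVM usage sketch (illustrative parameters, hypothetical variables)
model = svmtrain(label_train, feature_train, '-t 2 -c 1 -g 0.5');  % RBF kernel
[pred, acc, dec] = svmpredict(label_test, feature_test, model);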

The first scheme (one-vs-rest):

%% LIBSVM toolbox experiment
%  Multi-class nonlinear classification -- the first scheme (one-vs-rest)
clc; clear; close all;
%% load data -- pre-processing
data = load('data_test.mat');
data = data.data;
num_train = 200;                         % number of training samples
choose = randperm(length(data));         % random selection sequence
train_data = data(choose(1:num_train), :);
gscatter(train_data(:,1), train_data(:,2), train_data(:,3));
label_train = train_data(:, end);
test_data = data(choose(num_train+1:end), :);
label_test = test_data(:, end);
%% construct and train the SVMs
for i = 1:5                              % relabel each of the 5 classes against the rest
    label_temp = label_train;
    index1 = find(label_train == i);
    index2 = find(label_train ~= i);
    label_temp(index1) = 1;
    label_temp(index2) = -1;
    % train the model
    model{i} = svmtrain(label_temp, train_data(:, 1:end-1), '-t 2');
end
%% use the models to predict the classes of the test set
predict = zeros(length(test_data), 1);
for i = 1:length(test_data)
    data_test = test_data(i, :);
    addnum = zeros(1, 5);
    for j = 1:5
        temp = svmpredict(1, data_test(:, 1:end-1), model{j});
        if temp > 0
            addnum(j) = addnum(j) + 1;   % one vote for class j
        else                             % one vote for every class except j
            addnum = addnum + 1;
            addnum(j) = addnum(j) - 1;
        end
    end
    [~, predict(i)] = max(addnum);
end
%% show the result -- testing
figure;
gscatter(test_data(:,1), test_data(:,2), predict);
accuracy = length(find(predict == label_test)) / length(test_data);
title(['predict the testing data and the accuracy is: ', num2str(accuracy)]);

The results are as follows:

The second scheme (one-vs-one):

%% LIBSVM toolbox experiment
%  Multi-class nonlinear classification -- the second scheme (one-vs-one)
clc; clear; close all;
%% load data -- data pre-processing
data = load('data_test.mat');
data = data.data;
num_train = 200;                         % number of training samples
choose = randperm(length(data));         % random selection sequence
train_data = data(choose(1:num_train), :);
gscatter(train_data(:,1), train_data(:,2), train_data(:,3));
label_train = train_data(:, end);
test_data = data(choose(num_train+1:end), :);
label_test = test_data(:, end);
%% construct and train the SVMs
num = 0;
for i = 1:5-1                            % 5 classes, pairwise
    for j = i+1:5
        num = num + 1;
        % relabel: SVM needs the class labels set to 1 and -1
        index1 = find(label_train == i);
        index2 = find(label_train == j);
        label_temp = zeros(length(index1) + length(index2), 1);
        label_temp(1:length(index1)) = 1;
        label_temp(length(index1)+1:end) = -1;
        train_temp = [train_data(index1, :); train_data(index2, :)];
        model{num} = svmtrain(label_temp, train_temp(:, 1:end-1), '-t 2');
    end
end
%% use the models to predict the classes of the test set
predict = zeros(length(test_data), 1);
for i = 1:length(test_data)
    data_test = test_data(i, :);
    num = 0;
    addnum = zeros(1, 5);
    for j = 1:5-1
        for k = j+1:5
            num = num + 1;
            temp = svmpredict(1, data_test(:, 1:end-1), model{num});
            if temp > 0
                addnum(j) = addnum(j) + 1;   % vote for the first class of the pair
            else
                addnum(k) = addnum(k) + 1;   % vote for the second
            end
        end
    end
    [~, predict(i)] = max(addnum);
end
%% show the result -- testing
figure; gscatter(test_data(:,1), test_data(:,2), predict);
accuracy = length(find(predict == label_test)) / length(test_data);
title(['predict the testing data and the accuracy is: ', num2str(accuracy)]);

The results are as follows:

As can be seen, both schemes give very good results with high accuracy; since the training samples are chosen at random, the results differ slightly from run to run. As for which is better: when the sample set is large, the speed still meets your requirements, and the classes may overlap, I think the second scheme is the better choice.

Four: Multi-class experiment with the AdaBoost meta-algorithm

For the detailed principle and implementation of the AdaBoost meta-algorithm, see:

Machine learning in plain language and in practice: the AdaBoost meta-algorithm

Considering that the AdaBoost meta-algorithm has no ready-made toolbox, we implement it with our own functions here. The blog post above includes the two sub-functions used below, buildsimplestump and adaboosttrainds; for reasons of space they are not reproduced here, so copy them over from that post if you need them.
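For readers who do not want to fetch them, here is a minimal sketch of what adaboosttrainds might look like, assuming decision stumps as weak learners and matching the signature and direction convention used below. This is my own reconstruction; the blog's implementation may differ in details such as the threshold grid:

function [dim, direction, thresh, alpha] = adaboosttrainds(data, label, iter)
% AdaBoost with decision stumps; label must be a column of +1/-1 (a sketch)
[m, n] = size(data);
D = ones(m, 1) / m;                            % sample weights
dim = zeros(iter, 1); direction = zeros(iter, 1);
thresh = zeros(iter, 1); alpha = zeros(iter, 1);
for t = 1:iter
    bestErr = inf;
    for d = 1:n                                % try every feature
        for s = linspace(min(data(:,d)), max(data(:,d)), 10)
            for dir = [-1 1]                   % stump orientation
                h = ones(m, 1);
                if dir == -1
                    h(data(:,d) <= s) = -1;    % below threshold -> -1
                else
                    h(data(:,d) > s) = -1;     % above threshold -> -1
                end
                err = sum(D(h ~= label));      % weighted error
                if err < bestErr
                    bestErr = err; bestH = h;
                    dim(t) = d; thresh(t) = s; direction(t) = dir;
                end
            end
        end
    end
    alpha(t) = 0.5 * log((1 - bestErr) / max(bestErr, eps));
    D = D .* exp(-alpha(t) * (label .* bestH));    % reweight the samples
    D = D / sum(D);
end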

Based on those two sub-functions, we now write two more: an AdaBoost training function and an AdaBoost prediction function, as follows:
Training function:

function model = adaboost_train(label, data, iter)
[model.dim, model.direction, model.thresh, model.alpha] = ...
    adaboosttrainds(data, label, iter);
model.iter = iter;

Prediction function:

function predict = adaboost_predict(data, model)
h = zeros(model.iter, 1);
for j = 1:model.iter
    if model.direction(j) == -1
        if data(model.dim(j)) <= model.thresh(j)
            h(j) = -1;
        else
            h(j) = 1;
        end
    elseif model.direction(j) == 1
        if data(model.dim(j)) <= model.thresh(j)
            h(j) = 1;
        else
            h(j) = -1;
        end
    end
end
predict = sign(model.alpha' * h);
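A minimal usage sketch of the pair on made-up binary data (labels must be +1/-1, and the sub-functions above must be on your path):

X = [randn(20,2)+1; randn(20,2)-1];   % made-up 2-D features, two blobs
y = [ones(20,1); -ones(20,1)];        % +1 / -1 labels
model = adaboost_train(y, X, 30);     % 30 weak classifiers
p = adaboost_predict(X(1,:), model);  % predict one sample: returns +1 or -1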

With these two functions we can run the experiment. Here we only take the second (one-vs-one) scheme as an example; the main script is much like the SVM one above, with the model-training and prediction calls swapped for ours:

%% AdaBoost
%  Multi-class nonlinear classification -- the second scheme (one-vs-one)
clc; clear; close all;
%% load data -- data pre-processing
data = load('data_test.mat');
data = data.data;
num_train = 200;                         % number of training samples
choose = randperm(length(data));         % random selection sequence
train_data = data(choose(1:num_train), :);
gscatter(train_data(:,1), train_data(:,2), train_data(:,3));
label_train = train_data(:, end);
test_data = data(choose(num_train+1:end), :);
label_test = test_data(:, end);
%% construct and train the AdaBoost models
num = 0;
iter = 30;                               % number of weak classifiers
for i = 1:5-1                            % 5 classes, pairwise
    for j = i+1:5
        num = num + 1;
        % relabel: the class labels need to be set to 1 and -1
        index1 = find(label_train == i);
        index2 = find(label_train == j);
        label_temp = zeros(length(index1) + length(index2), 1);
        label_temp(1:length(index1)) = 1;
        label_temp(length(index1)+1:end) = -1;
        train_temp = [train_data(index1, :); train_data(index2, :)];
        % train on the feature columns only (the last column is the label)
        model{num} = adaboost_train(label_temp, train_temp(:, 1:end-1), iter);
    end
end
%% use the models to predict the classes of the test set
predict = zeros(length(test_data), 1);
for i = 1:length(test_data)
    data_test = test_data(i, :);
    num = 0;
    addnum = zeros(1, 5);
    for j = 1:5-1
        for k = j+1:5
            num = num + 1;
            temp = adaboost_predict(data_test(:, 1:end-1), model{num});
            if temp > 0
                addnum(j) = addnum(j) + 1;
            else
                addnum(k) = addnum(k) + 1;
            end
        end
    end
    [~, predict(i)] = max(addnum);
end
%% show the result -- testing
figure; gscatter(test_data(:,1), test_data(:,2), predict);
accuracy = length(find(predict == label_test)) / length(test_data);
title(['predict the testing data and the accuracy is: ', num2str(accuracy)]);

This is again the result for 300 test samples with 200 training samples:

As you can see, with only iter = 30 weak classifiers per pairwise model, the accuracy is already remarkably high.

At this point all three methods have been introduced, and all three handle supervised multi-class problems very well. As long as you tune the parameters to suit your samples, you can generally get good results. That was a long write-up; if you liked it, give it a thumbs-up ~_~! Exchanges and discussion are also welcome.
