A summary of classifier combination algorithms for improving accuracy


Improving classifier accuracy is mainly done by combining the results of multiple classifiers to determine the final classification.

There are three main combination methods: bagging, boosting, and random forests.

Steps for the bagging and boosting methods:

1. Generate several training sets from the learning data set.

2. Use the training sets to build several classifiers.

3. Each classifier makes a prediction; the final result is determined by simple voting (bagging) or weighted voting (boosting), as sketched below.

As shown in the figure, from data set D we draw subsets D1~Dk, train k different classifiers M1~Mk on them, and then use each classifier to predict on the test set to obtain k predicted results.

Finally, these k results are combined on the majority-rules principle. For example, if 99 classifiers produce 55 results of 1 and 44 results of 0, the final result is determined to be 1.
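To make the voting procedure concrete, here is a minimal sketch in R of bagging with majority voting. The choices below (rpart trees as the base classifiers, k = 25 bootstrap samples, the iris data set) are illustrative assumptions, not part of the original text:

library(rpart)

k = 25                                   # number of bootstrap samples (assumed)
n = nrow(iris)
models = vector("list", k)

for (i in 1:k) {
  idx = sample(n, n, replace = TRUE)     # bootstrap sample Di drawn from D
  models[[i]] = rpart(Species ~ ., data = iris[idx, ], method = "class")
}

# Each classifier predicts; a simple majority vote decides the final class.
votes = sapply(models, function(m) as.character(predict(m, iris, type = "class")))
final = apply(votes, 1, function(v) names(which.max(table(v))))
table(final, iris$Species)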

The boosting algorithm can be regarded as an improvement on bagging, and can be understood as weighted voting. Here we specifically introduce the adaptive boosting (AdaBoost) algorithm.

The algorithm is basically the same as bagging; what is new is the concept of weights. In step (1), initialization, each tuple is given the weight 1/d, i.e., all tuples start with the same weight. In steps (9)~(11), the weights are continually updated: the weight of each correctly classified tuple is multiplied by a number less than 1, so correctly classified tuples become less likely to be selected into the next training set Di.

As a result, the classifier focuses on data that is "hard to classify". This rests on the belief that "some classifiers may work particularly well on a particular kind of data".
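As a sketch of the weight-update step only (not the full AdaBoost loop), the hypothetical helper update_weights below multiplies the weights of correctly classified tuples by err / (1 - err) and renormalizes, which is what steps (9)~(11) describe:

# 'w' are the current tuple weights, 'correct' marks correctly classified tuples,
# 'err' is the weighted error rate of classifier Mi in this round.
update_weights = function(w, correct, err) {
  w[correct] = w[correct] * err / (1 - err)   # a factor less than 1 when err < 0.5
  w / sum(w)                                  # normalize so the weights sum to 1
}

# Example: d = 5 tuples, initial weight 1/d each, error rate 0.2.
w = rep(1/5, 5)
update_weights(w, correct = c(TRUE, TRUE, FALSE, TRUE, TRUE), err = 0.2)
# The misclassified tuple keeps a relatively larger weight after normalization.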

Note on the tuple concept: a tuple is the smallest data unit; for example, a person is a tuple, with attributes such as height and weight.

After the data is trained, the result is a combined (ensemble) classifier.

Here we see that a second kind of weight appears: the voting weight of each classifier. This weight is determined by the classifier's accuracy (the lower the error rate, the higher the weight).
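In the AdaBoost scheme described in Han's book, the voting weight of classifier Mi is log((1 - error(Mi)) / error(Mi)). A one-line illustration (classifier_weight is a name introduced here only for illustration):

classifier_weight = function(err) log((1 - err) / err)   # lower error -> larger vote
classifier_weight(0.40)   # about 0.41: a weak classifier gets a small vote
classifier_weight(0.05)   # about 2.94: an accurate classifier gets a large vote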

Next we introduce a decision-tree-based ensemble algorithm: the random forest.

The random forest is actually quite intuitive: using the random bagging method mentioned above, a decision tree is built for each Di, here using the CART algorithm (which only needs to compute the Gini index), without pruning.

Then all the trees in the forest vote on the final prediction.

An example of a random forest in R:

If the randomForest package is not installed, first run install.packages("randomForest").

library(randomForest)
model.forest = randomForest(Species ~ ., data = iris)
pre.forest = predict(model.forest, iris)
table(pre.forest, iris$Species)

The accuracy reaches 100%.

Compare this with a single decision tree:

library(rpart)
model.tree = rpart(Species ~ ., data = iris, method = "class")
pre.tree = predict(model.tree, iris, type = "class")
table(pre.tree, iris$Species)

We find that some of the data are misclassified.

PS: The ensemble classifier algorithms are excerpted from Jiawei Han, Data Mining: Concepts and Techniques.

