Ensemble methods improve classification accuracy by combining the results of multiple classifiers and letting the combination determine the final class. There are three main combination methods: bagging, boosting, and random forests.
Steps shared by bagging and boosting:
1. Generate several training sets from the original data set.
2. Train a classifier on each training set.
3. Let each classifier make a prediction, then combine the predictions by simple majority vote (bagging) or weighted vote (boosting) to determine the final result.
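The three steps above can be sketched in a few lines of R. This is a minimal illustration, not a production implementation: the function name `bagging_predict` and the choice of `rpart` trees on the iris data are my assumptions for the example.

```r
# Minimal bagging sketch: k trees, each trained on a bootstrap sample,
# combined by simple majority vote. `bagging_predict` is a made-up helper
# name for this example; rpart is used as the base classifier.
library(rpart)

bagging_predict <- function(data, newdata, k = 25) {
  # Steps 1-2: draw k bootstrap training sets and fit a tree on each
  models <- lapply(seq_len(k), function(i) {
    idx <- sample(nrow(data), replace = TRUE)  # bootstrap sample of the rows
    rpart(Species ~ ., data = data[idx, ], method = "class")
  })
  # Step 3: each tree votes; a simple majority decides the final class
  votes <- sapply(models, function(m) {
    as.character(predict(m, newdata, type = "class"))
  })
  apply(votes, 1, function(v) names(which.max(table(v))))
}

pred <- bagging_predict(iris, iris)
mean(pred == iris$Species)  # training accuracy of the bagged ensemble
```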
As shown in the figure, from data set D we draw subsets D1~Dk and train k classifiers M1~Mk on them; each classifier then produces a prediction on the test set. Finally, the k results are combined on the "minority obeys majority" principle. For example, with k = 99 classifiers, if 55 results are 1 and 44 results are 0, the final result is determined to be 1.
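The majority-vote example above is a one-liner in R:

```r
# 99 classifier outputs: 55 vote "1", 44 vote "0"
votes <- c(rep(1, 55), rep(0, 44))
# table() counts the votes; which.max() picks the class with the most
final <- as.numeric(names(which.max(table(votes))))
final  # 1
```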
Boosting can be regarded as an improvement of bagging, and can be understood as weighted voting. The adaptive boosting (AdaBoost) algorithm is introduced here specifically.
The algorithm is basically the same as bagging; what is new is the concept of weights. In step (1), initialization, each weight is set to 1/d, i.e., all d tuples have the same weight. In steps (9)~(11), the weights are updated after each round: the weight of every correctly classified tuple is multiplied by a number less than 1, so correctly classified tuples become less likely to be selected into the next training set Di. The classifier therefore focuses on "hard to classify" data. This rests on the belief that some classifiers may perform well on a particular kind of data.
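The weight update in steps (9)~(11) can be sketched as follows. This is an illustration assuming the AdaBoost update from Han's book, where the weights of correctly classified tuples are multiplied by err/(1 - err) and then all weights are renormalized; the helper name `update_weights` is my own.

```r
# AdaBoost tuple-weight update (steps (9)-(11)): `w` is the vector of tuple
# weights, `correct` marks the tuples classifier Mi got right, and `err` is
# Mi's weighted error rate. `update_weights` is an illustrative helper name.
update_weights <- function(w, correct, err) {
  w[correct] <- w[correct] * (err / (1 - err))  # shrink "easy" tuples' weights
  w / sum(w)                                    # renormalize so weights sum to 1
}

d <- 5
w <- rep(1 / d, d)  # step (1): every tuple starts with weight 1/d
w <- update_weights(w, correct = c(TRUE, TRUE, TRUE, TRUE, FALSE), err = 0.2)
w  # the one misclassified tuple now carries more weight than the others
```

After the update, the misclassified tuple's weight (0.5) dominates the correctly classified ones (0.125 each), which is exactly why the next training set Di favors "hard" tuples.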
Note on the tuple concept: a tuple is the smallest data unit. For example, a person is a tuple, with height, weight, and other attributes.
After training, the result is a combination (ensemble) classifier. Note that a second kind of weight appears here: the voting weight of each classifier, which is determined by that classifier's accuracy (the lower the error rate, the higher the weight).
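In AdaBoost this voting weight is computed as log((1 - err)/err), so an accurate classifier gets a large say in the final vote. A small sketch (the function name `vote_weight` is my own):

```r
# AdaBoost voting weight of classifier Mi: log((1 - err) / err).
# The lower the error rate, the higher the weight.
vote_weight <- function(err) log((1 - err) / err)

vote_weight(0.1)  # accurate classifier: large positive weight
vote_weight(0.4)  # weak classifier: weight close to zero
```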
Next we introduce a decision-tree ensemble algorithm: the random forest.
The random forest is actually very intuitive: using the bagging method described above, a decision tree is built for each subset Di with the CART algorithm (only the Gini index needs to be computed), without pruning.
All the trees in the forest then vote.
An example of a random forest in R:
If the randomForest package is not installed, first run install.packages("randomForest").

library(randomForest)
model.forest <- randomForest(Species ~ ., data = iris)
pre.forest <- predict(model.forest, iris)
table(pre.forest, iris$Species)

The accuracy (on the training data) reaches 100%.
Compare with a single decision tree:

library(rpart)
model.tree <- rpart(Species ~ ., data = iris, method = "class")
pre.tree <- predict(model.tree, iris, type = "class")
table(pre.tree, iris$Species)

Here we find that some of the data are misclassified.
PS: The ensemble-classifier algorithms above are excerpted from Jiawei Han, "Data Mining: Concepts and Techniques".