Source code analysis of WEKA algorithm Classifier-meta-AdaBoostM1 (I)



In simple terms, the common ways to combine multiple classifiers are voting, bagging, and boosting. In terms of performance, boosting has a slight edge, and AdaBoostM1 is regarded as the "classic" boosting algorithm.

The idea behind voting is to combine several classifiers and decide the final classification by majority rule. The disadvantage is that majority rule usually only avoids the worst outcome; it rarely achieves the best one.

The idea behind bagging is to train multiple classifiers on randomly resampled subsets of the data and then combine them by voting. A classic example is RandomForest (analyzed in an earlier post on this blog). The disadvantage is that the base classifiers are homogeneous; in terms of accuracy, bagging is less an algorithm that markedly improves accuracy than one that prevents overfitting.

Boosting trains the classifiers in a cascade, so that each classifier pays more attention to the data misclassified by the classifier before it, and finally combines the classifiers' results by weighted voting for decision-making. The disadvantage is that the cascaded training makes the algorithm hard to parallelize. Typical algorithms include AdaBoostM1 and GBDT.
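To make the contrast concrete, here is a minimal sketch of plain majority voting versus the weighted voting that boosting uses. It is plain Java with hypothetical names, not WEKA code.

import java.util.Arrays;

// Minimal sketch contrasting majority voting with the weighted voting
// used by boosting. All names here are hypothetical, not WEKA's API.
public final class VotingDemo {

    // Majority vote: every base classifier contributes equally.
    static int majorityVote(int[] predictions, int numClasses) {
        double[] tally = new double[numClasses];
        for (int p : predictions) {
            tally[p] += 1.0;
        }
        return argMax(tally);
    }

    // Weighted vote: each base classifier contributes its weight,
    // e.g. log(r / (1 - r)) in AdaBoostM1, where r is its accuracy.
    static int weightedVote(int[] predictions, double[] weights, int numClasses) {
        double[] tally = new double[numClasses];
        for (int i = 0; i < predictions.length; i++) {
            tally[predictions[i]] += weights[i];
        }
        return argMax(tally);
    }

    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] predictions = {0, 1, 1};      // votes from three base classifiers
        double[] weights = {2.2, 0.4, 0.4}; // weights from training accuracy
        System.out.println(majorityVote(predictions, 2));          // prints 1
        System.out.println(weightedVote(predictions, weights, 2)); // prints 0
    }
}

In the example, the single high-weight classifier overrules the two low-weight ones: majority voting answers 1, while weighted voting answers 0.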


I. Algorithm

This article does not give a detailed formal description of the algorithm or a proof of the correctness of the underlying theory; for those, see the Wikipedia reference:

http://zh.wikipedia.org/wiki/AdaBoost

Algorithm process:

(1) Initialize the weight of each instance in the training set to 1/K (assuming K instances in total).

(2) For i = 1; i <= m; i++ (assuming m base classifiers in total):

(3) Resample the training set according to the instance weights to obtain a new training set.

(4) Train base classifier i on the new training set.

(5) Evaluate the accuracy r of base classifier i.

(6) If the accuracy r is below 50%, exit directly; training has failed (this is for binary classification; the multi-class case extends similarly).

(7) Set the weight of base classifier i to log2(r / (1 - r)).

(8) Let W = r / (1 - r) (since r > 0.5, W > 1). For every misclassified instance, multiply its weight in the training set by W (that is, increase its weight).

(9) Normalize the training-set weights (scale all weights by a constant so that they sum to 1).

(10) Return to step (2).

We can see that, according to this algorithm, each base classifier is obtained by the training in step (4) and receives its weight in step (7); at the end, weighted voting over the results of the base classifiers yields the final result. A compact sketch of this training loop follows.
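The sketch below restates steps (1) through (10) in Java. BaseLearner and its train/predict methods are hypothetical stand-ins for a real base classifier API, and the sketch takes the weight-sensitive route (handing the weights to the learner) rather than the explicit resampling of step (3). It illustrates the algorithm; it is not WEKA's implementation.

import java.util.Arrays;
import java.util.function.Supplier;

// Sketch of the AdaBoostM1 training loop from steps (1)-(10). BaseLearner,
// train(...) and predict(...) are hypothetical stand-ins, not WEKA's API.
interface BaseLearner {
    void train(double[][] x, int[] y, double[] weights); // weight-sensitive training
    int predict(double[] xi);
}

final class AdaBoostSketch {
    final BaseLearner[] members;
    final double[] betas; // classifier weights from step (7)

    AdaBoostSketch(Supplier<BaseLearner> factory, int m, double[][] x, int[] y) {
        int k = x.length;
        double[] w = new double[k];
        Arrays.fill(w, 1.0 / k);                       // step (1)
        members = new BaseLearner[m];
        betas = new double[m];
        for (int i = 0; i < m; i++) {                  // step (2)
            BaseLearner c = factory.get();
            c.train(x, y, w);                          // steps (3)-(4), via weights
            double r = weightedAccuracy(c, x, y, w);   // step (5)
            if (r <= 0.5) {                            // step (6): no better than chance
                throw new IllegalStateException("training failed: accuracy <= 50%");
            }
            double bigW = r / (1.0 - r);               // step (8): > 1 because r > 0.5
            betas[i] = Math.log(bigW);                 // step (7); the log base only rescales all betas
            double sum = 0.0;
            for (int j = 0; j < k; j++) {
                if (c.predict(x[j]) != y[j]) {
                    w[j] *= bigW;                      // boost misclassified instances
                }
                sum += w[j];
            }
            for (int j = 0; j < k; j++) {
                w[j] /= sum;                           // step (9): renormalize to sum 1
            }
            members[i] = c;
        }
    }

    // Weighted accuracy; the weights always sum to 1, so no division is needed.
    static double weightedAccuracy(BaseLearner c, double[][] x, int[] y, double[] w) {
        double correct = 0.0;
        for (int j = 0; j < x.length; j++) {
            if (c.predict(x[j]) == y[j]) {
                correct += w[j];
            }
        }
        return correct;
    }
}

At prediction time, the ensemble would add each member's beta to the tally of the class that member predicts and return the class with the largest total, which is exactly the weighted vote sketched in the introduction.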


II. Implementation

When analyzing a classifier, we always start with buildClassifier, and this time is no exception.

public void buildClassifier(Instances data) throws Exception {

    // A WEKA utility in the superclass creates the classifiers by deep copy
    // from a prototype classifier. Since AdaBoostM1 combines multiple
    // classifiers, the deep copy must be made once per base classifier;
    // this is implemented in the superclass.
    super.buildClassifier(data);

    // Check whether the data can be handled by AdaBoostM1. From the code,
    // only nominal (enumerated) class values can be processed, but I think
    // this is really determined by the capabilities of the base classifier.
    getCapabilities().testWithFail(data);

    // Pre-process: copy the data and drop instances with a missing class.
    data = new Instances(data);
    data.deleteWithMissingClass();

    // If there is only one attribute column, i.e. the class attribute itself,
    // no elaborate classification method can be used; fall back to the ZeroR
    // model, which simply returns the most frequent of the class values.
    if (data.numAttributes() == 1) {
        System.err.println("Cannot build model (only class attribute present in data!), "
            + "using ZeroR model instead!");
        m_ZeroR = new weka.classifiers.rules.ZeroR();
        m_ZeroR.buildClassifier(data);
        return;
    } else {
        m_ZeroR = null;
    }

    m_NumClasses = data.numClasses();

    if ((!m_UseResampling) && (m_Classifier instanceof WeightedInstancesHandler)) {
        // If the base classifier is weight-sensitive, train it with instance
        // weights directly. Many common classifiers implement
        // WeightedInstancesHandler, e.g. J48, RandomTree, RandomForest, Bagging.
        buildClassifierWithWeights(data);
    } else {
        // Otherwise, fall back to the resampling method.
        buildClassifierUsingResampling(data);
    }
}
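For completeness, here is a typical way to drive this method through the WEKA API (a sketch assuming a WEKA 3.x classpath; the ARFF path is a placeholder). Because J48 implements WeightedInstancesHandler and resampling is switched off, this takes the buildClassifierWithWeights branch analyzed above:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Builds and cross-validates an AdaBoostM1 ensemble over J48 trees.
public class AdaBoostM1Demo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("/path/to/dataset.arff"); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        AdaBoostM1 booster = new AdaBoostM1();
        booster.setClassifier(new J48());  // base classifier
        booster.setNumIterations(10);      // m base classifiers
        booster.setUseResampling(false);   // take the weighted branch

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(booster, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}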

(To be continued)
