Weka algorithm Classifier-tree-RandomForest source code analysis (2): code implementation

Last Update:2014-09-11 Source: Internet

Author: User


The implementation of RandomForest is exceptionally simple, far simpler than the blogger expected. Weka implements it by combining Bagging and RandomTree.

I. RandomForest Training

The code for building RandomForest is as follows:

  public void buildClassifier(Instances data) throws Exception {

    // can classifier handle the data?
    getCapabilities().testWithFail(data);

    // remove instances with missing class
    data = new Instances(data);
    data.deleteWithMissingClass();

    m_bagger = new Bagging();
    RandomTree rTree = new RandomTree();

    // set up the random tree options
    m_KValue = m_numFeatures;
    if (m_KValue < 1)
      m_KValue = (int) Utils.log2(data.numAttributes()) + 1;
    rTree.setKValue(m_KValue);
    rTree.setMaxDepth(getMaxDepth());

    // set up the bagger and build the forest
    m_bagger.setClassifier(rTree);
    m_bagger.setSeed(m_randomSeed);
    m_bagger.setNumIterations(m_numTrees);
    m_bagger.setCalcOutOfBag(true);
    m_bagger.buildClassifier(data);
  }

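The code is easy to follow. It first checks that the classifier can handle the data and removes instances whose class value is missing. It then creates a Bagging object and a RandomTree, and sets the number of attributes each tree samples (KValue) and the maximum depth on the tree. Next, the RandomTree is passed to Bagging as the base classifier, and the seed, the number of iterations (one per tree), and out-of-bag calculation are configured. Finally, Bagging's buildClassifier method is called to do the training.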
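The fallback value for K can be reproduced in isolation. The following is a minimal sketch in plain Java with no Weka on the classpath. The class KValueSketch and the helpers log2 and defaultK are hypothetical names, and log2 is assumed to behave like Utils.log2 (natural logarithm divided by ln 2).

```java
// Sketch only: KValueSketch, log2 and defaultK are illustrative names, not Weka API.
class KValueSketch {

  // Assumed stand-in for Utils.log2: base-2 logarithm via natural logs.
  static double log2(double a) {
    return Math.log(a) / Math.log(2);
  }

  // Mirrors the K selection in buildClassifier: a requested value below 1
  // falls back to (int) log2(numAttributes) + 1. The cast applies to the
  // logarithm only, so the result is truncated before 1 is added.
  static int defaultK(int numFeatures, int numAttributes) {
    int k = numFeatures;
    if (k < 1) {
      k = (int) log2(numAttributes) + 1;
    }
    return k;
  }

  public static void main(String[] args) {
    for (int m : new int[] {5, 10, 100}) {
      System.out.println("numAttributes=" + m + " -> K=" + defaultK(0, m));
    }
  }
}
```

Under these assumptions, a dataset with 10 attributes yields K = 4, and an explicit setting of 1 or more is used unchanged.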

II. RandomForest Classification

Having covered training, we can look at the classification process, that is, the classifyInstance function. Note that RandomForest inherits from Classifier but does not override classifyInstance. It uses the base class implementation and overrides only distributionForInstance. The distributionForInstance function is called by Classifier's classifyInstance and returns the probability of the instance belonging to each class. The code is as follows:

  public double[] distributionForInstance(Instance instance) throws Exception {
    return m_bagger.distributionForInstance(instance);
  }

As we can see, computing the distribution of the given instance over the classes is delegated to the bagger (really lazy), so no detailed analysis is given here. That is left for the analysis of Bagging.
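Although the bagger's internals are outside the scope of this post, the idea behind the delegation can be sketched. The following is a minimal illustration in plain Java, assuming the bagger sums the class distributions predicted by the individual trees and normalizes the result. The source does not show that code, so the class BaggedVoteSketch and the method combine are hypothetical and should not be read as Weka's implementation.

```java
import java.util.Arrays;

// Sketch only: not Weka's Bagging code. Assumes per-tree distributions are
// summed and then normalized so the combined probabilities add up to 1.
class BaggedVoteSketch {

  static double[] combine(double[][] perTreeDistributions, int numClasses) {
    double[] sums = new double[numClasses];
    for (double[] dist : perTreeDistributions) {
      for (int j = 0; j < numClasses; j++) {
        sums[j] += dist[j];
      }
    }
    double total = 0;
    for (double s : sums) {
      total += s;
    }
    if (total == 0) {
      return sums; // no tree voted for any class; leave all zeros
    }
    for (int j = 0; j < numClasses; j++) {
      sums[j] /= total;
    }
    return sums;
  }

  public static void main(String[] args) {
    // Four hypothetical trees voting over two classes.
    double[][] votes = {{1, 0}, {0, 1}, {1, 0}, {1, 0}};
    System.out.println(Arrays.toString(combine(votes, 2)));
  }
}
```

With three of four trees voting for the first class, the combined distribution is 0.75 for the first class and 0.25 for the second.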

Next, let's look at how the base class Classifier uses the distribution to produce the classification result.

  public double classifyInstance(Instance instance) throws Exception {

    double[] dist = distributionForInstance(instance);
    if (dist == null) {
      throw new Exception("Null distribution predicted");
    }
    switch (instance.classAttribute().type()) {
    case Attribute.NOMINAL:
      double max = 0;
      int maxIndex = 0;

      for (int i = 0; i < dist.length; i++) {
        if (dist[i] > max) {
          maxIndex = i;
          max = dist[i];
        }
      }
      if (max > 0) {
        return maxIndex;
      } else {
        return Instance.missingValue();
      }
    case Attribute.NUMERIC:
    case Attribute.DATE:
      return dist[0];
    default:
      return Instance.missingValue();
    }
  }

We can see that for classification (a nominal class), the index of the class with the highest probability is returned, or a missing value if no probability is greater than zero. For regression (that is, the attribute at classIndex is numeric or a date), dist[0] is returned. This relies on a convention: the first element holds the regression value.
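The selection rule in the NOMINAL branch can be tried on its own. The following is a minimal sketch in plain Java with no Weka dependency. The class ArgMaxSketch and the method classifyNominal are hypothetical names, and Double.NaN is used here as an assumed stand-in for Instance.missingValue().

```java
// Sketch only: ArgMaxSketch and classifyNominal are illustrative names, not Weka API.
class ArgMaxSketch {

  // Same rule as the NOMINAL branch of classifyInstance: the strict '>'
  // keeps the first index on ties, and a distribution with no positive
  // entry produces the missing-value marker (assumed to be NaN here).
  static double classifyNominal(double[] dist) {
    double max = 0;
    int maxIndex = 0;
    for (int i = 0; i < dist.length; i++) {
      if (dist[i] > max) {
        maxIndex = i;
        max = dist[i];
      }
    }
    return max > 0 ? maxIndex : Double.NaN;
  }

  public static void main(String[] args) {
    System.out.println(classifyNominal(new double[] {0.2, 0.5, 0.3}));
    System.out.println(classifyNominal(new double[] {0.5, 0.5}));
    System.out.println(classifyNominal(new double[] {0.0, 0.0, 0.0}));
  }
}
```

The returned value is the class index encoded as a double, which is why a tie resolves to the lower index and an all-zero distribution cannot be mapped to any class.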

III. Summary

That nearly completes the RandomForest code analysis. There is little substantive content here, because the main work of the algorithm is done by Bagging and RandomTree. It is worth noting that when the number of sampled attributes is not specified, Weka uses (int) log2(M) + 1 as the default, where M is data.numAttributes().

The next post will analyze Weka's RandomTree, followed by Bagging, to fill in the rest of the RandomForest picture.


This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
