Weka Algorithm Classifier-meta-AdditiveRegression Source Code Analysis



The blogger has recently been hooked on Monster Hunter, so this article was put off for quite a while before I finally sat down to write it.


I. The Algorithm

AdditiveRegression is better known as GBDT (Gradient Boosted Decision Tree) or GBRT (Gradient Boosted Regression Tree). It is a multi-classifier combination algorithm, and more specifically, a boosting algorithm.


Speaking of boosting algorithms, AdaBoost has to be mentioned. As described in my earlier blog post, the core of AdaBoost is cascading classifiers so that later classifiers "focus" more on the data that is harder to classify correctly, i.e., later-stage classifiers are trained more heavily on the error-prone part of the data set.


GBDT, as a boosting algorithm, likewise trains multiple classifiers in cascade, but each later classifier focuses on the residuals between the actual values and the combined predictions of all previous classifiers. A new classifier is trained on these residuals, and the final prediction adds the residual predictions of the cascade together.
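In formula form: after m stages the combined model is F_m(x) = F_{m-1}(x) + h_m(x), where the new base learner h_m is fitted to the residuals y - F_{m-1}(x). For squared-error loss these residuals are exactly the negative gradient of the loss with respect to the current predictions, which is where the "gradient" in the name comes from.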


The derivation of the formulas behind GBDT can be found at:

http://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting

http://www.360doc.com/content/12/0428/15/5874309_207282768.shtml


Having said all that, here is a brief description of the training process (a self-contained sketch of both the training and the prediction process is given after the prediction steps below).

(1) Input the training set and the number of base classifiers N.

(2) Train the 1st base classifier on the training set.

(3) for (i = 2; i <= N; i++)

(4) Predict with the first i-1 classifiers and compute the residuals between the predictions and the training targets.

(5) If the residuals are smaller than a certain threshold, exit the loop.

(6) Train the i-th classifier on these residuals.

(7) Go to (3).


Prediction process:

(1) Compute the predictions of all N classifiers for the input data.

(2) Add the predictions together and return the sum.
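
To make the training and prediction processes above concrete, here is a minimal, self-contained Java sketch. It is not Weka's code: the one-feature data set, the regression-stump base learner, and the shrinkage value are all made up for illustration.

public class GradientBoostingSketch {

  // A regression stump: predicts leftMean if x <= threshold, otherwise rightMean.
  static class Stump {
    double threshold, leftMean, rightMean;

    // Fit the stump to (x, target) by trying every x value as a split point
    // and keeping the split with the smallest squared error.
    void fit(double[] x, double[] target) {
      double bestErr = Double.MAX_VALUE;
      for (double cand : x) {
        double leftSum = 0, rightSum = 0;
        int leftN = 0, rightN = 0;
        for (int i = 0; i < x.length; i++) {
          if (x[i] <= cand) { leftSum += target[i]; leftN++; }
          else              { rightSum += target[i]; rightN++; }
        }
        double lm = leftN > 0 ? leftSum / leftN : 0;
        double rm = rightN > 0 ? rightSum / rightN : 0;
        double err = 0;
        for (int i = 0; i < x.length; i++) {
          double p = x[i] <= cand ? lm : rm;
          err += (target[i] - p) * (target[i] - p);
        }
        if (err < bestErr) {
          bestErr = err; threshold = cand; leftMean = lm; rightMean = rm;
        }
      }
    }

    double predict(double x) { return x <= threshold ? leftMean : rightMean; }
  }

  public static void main(String[] args) {
    double[] x = {1, 2, 3, 4, 5, 6};                 // one numeric feature
    double[] y = {1.2, 1.9, 3.1, 6.8, 7.2, 8.1};     // numeric targets
    int numIterations = 5;                           // number of base classifiers N
    double shrinkage = 0.5;                          // shrinkage factor

    // Step (2): the "ZeroR" stage, i.e. predict the mean of the training targets
    double mean = 0;
    for (double v : y) mean += v;
    mean /= y.length;

    // Residuals after the ZeroR stage
    double[] residual = new double[y.length];
    for (int i = 0; i < y.length; i++) residual[i] = y[i] - mean;

    // Steps (3)-(7): each stump is trained on the current residuals, and its
    // shrunken predictions are subtracted to obtain the residuals for the next stage
    Stump[] stumps = new Stump[numIterations];
    for (int m = 0; m < numIterations; m++) {
      stumps[m] = new Stump();
      stumps[m].fit(x, residual);
      for (int i = 0; i < x.length; i++) {
        residual[i] -= shrinkage * stumps[m].predict(x[i]);
      }
    }

    // Prediction: the mean plus the shrunken contributions of all stumps
    double query = 3.5;
    double prediction = mean;
    for (Stump s : stumps) prediction += shrinkage * s.predict(query);
    System.out.println("Prediction for x = 3.5: " + prediction);
  }
}

Running it prints a prediction that is the mean of the training targets plus the shrunken contributions of the five stumps, which is exactly step (2) of the prediction process.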


As you can see, GBDT is not complicated in principle. The notion of "residual" is where the gradient comes in, and with this clue in mind the derivation in the wiki article is not hard to follow. What is complicated is proving why it works, which is well beyond the scope of this article.


II. Source Code Implementation

As with all the classifiers discussed before, the analysis starts with buildClassifier.

(1) buildClassifier

public void buildClassifier(Instances data) throws Exception {

    super.buildClassifier(data);

    // AdditiveRegression only supports numeric class attributes
    getCapabilities().testWithFail(data);

    // Remove training instances whose class value is missing
    Instances newData = new Instances(data);
    newData.deleteWithMissingClass();

    double sum = 0;
    double temp_sum = 0;

    // The first stage uses ZeroR, i.e. the prediction is the mean of the training
    // class values; the base classifier is not used for this stage (the default base
    // classifier is weka.classifiers.trees.DecisionStump, a one-level decision tree,
    // i.e. a decision stump)
    m_zeroR = new ZeroR();
    m_zeroR.buildClassifier(newData);

    // If only the class attribute is present, no model can be trained
    if (newData.numAttributes() == 1) {
      System.err.println("Cannot build model (only class attribute present in data!), "
          + "using ZeroR model instead!");
      m_SuitableData = false;
      return;
    } else {
      m_SuitableData = true;
    }

    // residualReplace classifies the data set with the given classifier and replaces
    // the class column with the residuals; it is analyzed below
    newData = residualReplace(newData, m_zeroR, false);
    for (int i = 0; i < newData.numInstances(); i++) {
      // weighted sum of squared residuals
      sum += newData.instance(i).weight()
          * newData.instance(i).classValue() * newData.instance(i).classValue();
    }
    if (m_Debug) {
      System.err.println("Sum of squared residuals "
          + "(predicting the mean): " + sum);
    }

    m_NumIterationsPerformed = 0;
    do {
      temp_sum = sum;

      // Train the next classifier on the new data set; note that its class column
      // has already been replaced by residuals, which is the gradient boosting idea
      m_Classifiers[m_NumIterationsPerformed].buildClassifier(newData);

      // Replace the class column with the new residuals (shrinkage is applied here)
      newData = residualReplace(newData, m_Classifiers[m_NumIterationsPerformed], true);

      // Recompute the weighted sum of squared residuals
      sum = 0;
      for (int i = 0; i < newData.numInstances(); i++) {
        sum += newData.instance(i).weight()
            * newData.instance(i).classValue() * newData.instance(i).classValue();
      }
      if (m_Debug) {
        System.err.println("Sum of squared residuals: " + sum);
      }
      m_NumIterationsPerformed++;

      // Two exit conditions: the sum of squared residuals no longer decreases
      // significantly between iterations, or all classifiers have been trained
    } while (((temp_sum - sum) > Utils.SMALL)
        && (m_NumIterationsPerformed < m_Classifiers.length));
  }
The idea of the algorithm is simple, and the code is correspondingly intuitive.

The residualReplace function is analyzed next.


(2) residualReplace

private Instances residualReplace(Instances data, Classifier c,
                                  boolean useShrinkage) throws Exception {

    double pred, residual;
    Instances newInst = new Instances(data);

    for (int i = 0; i < newInst.numInstances(); i++) {
      // Predict the current instance
      pred = c.classifyInstance(newInst.instance(i));
      if (useShrinkage) {
        // Apply shrinkage to guard against overfitting
        pred *= getShrinkage();
      }
      // Compute the residual and write it back as the instance's new class value
      residual = newInst.instance(i).classValue() - pred;
      newInst.instance(i).setClassValue(residual);
    }
    // System.err.print(newInst);
    return newInst;
  }

What is shrinkage?

The idea of shrinkage is that taking many small steps toward the result is less prone to overfitting than approaching the result quickly in a few large steps. In other words, no single tree is fully trusted: each tree is assumed to learn only a small part of the truth, only a small amount is accumulated at each step, and the remaining deficiency is made up by learning more trees. (Adapted from http://blog.csdn.net/w28971023/article/details/8240756)

As you can see, the residual itself can be understood as "the vector along which we want the classifier's result to move", which is exactly the meaning of a gradient: it has a direction (which way the classifier should adjust) and a length (how much to adjust). Shrinkage scales this length down by a fixed ratio, say 10%, so that each step only moves 10% of the way along that vector, in order to prevent overfitting.
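A hypothetical numeric step makes this concrete: if the true value is 10.0, the current cumulative prediction is 4.0, and the new tree predicts the residual 6.0 exactly, then with a shrinkage of 0.1 only 0.1 × 6.0 = 0.6 is added; the residual handed to the next tree becomes 10.0 - 4.6 = 5.4, and the remaining error is left for later trees to correct.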

Why can shrinkage prevent overfitting? That is another seemingly complex question, left for another time.


(3) classifyInstance

public double classifyInstance(Instance inst) throws Exception {

    // Start from the ZeroR prediction (the mean of the training class values)
    double prediction = m_zeroR.classifyInstance(inst);
    if (!m_SuitableData) {
      return prediction;
    }

    // Add the shrunken prediction of each subsequent classifier in order
    for (int i = 0; i < m_NumIterationsPerformed; i++) {
      double toAdd = m_Classifiers[i].classifyInstance(inst);
      toAdd *= getShrinkage();
      prediction += toAdd;
    }
    return prediction;
  }

Each classifier's residual prediction is added to the final result in classifier order.
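
For completeness, here is a small usage sketch of the class analyzed in this article, written against the standard Weka API; the ARFF file name cpu.arff is only a placeholder for any data set with a numeric class attribute.

import weka.classifiers.meta.AdditiveRegression;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AdditiveRegressionDemo {
  public static void main(String[] args) throws Exception {
    // Load a data set with a numeric class attribute (the file name is a placeholder)
    Instances data = DataSource.read("cpu.arff");
    data.setClassIndex(data.numAttributes() - 1);

    AdditiveRegression ar = new AdditiveRegression();
    ar.setClassifier(new DecisionStump()); // the default base classifier
    ar.setNumIterations(10);               // number of base classifiers N
    ar.setShrinkage(0.5);                  // shrinkage factor
    ar.buildClassifier(data);

    // classifyInstance returns the ZeroR mean plus the shrunken residual predictions
    System.out.println("Prediction: " + ar.classifyInstance(data.instance(0)));
  }
}

setNumIterations corresponds to the number N of base classifiers in the training process above, and setShrinkage is the factor applied in residualReplace and classifyInstance.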


III. Summary

If I have to write a summary, I would make the following points:

(1) The idea behind GBDT is simple, the implementation is simple, and the results are quite good.

(2) Weka's AdditiveRegression is a simple implementation of GBRT and can only handle numeric class data.

(3) The core of its implementation is to replace the class column of the data set with the residuals.

(4) Shrinkage can optionally be used to prevent overfitting.

