[Data Mining] ensemble methods and boosting

Document directory
  • 1 What?
  • 2 How?
  • 2.1 Boosting

Classification is an important application area of data mining. It has been studied widely and has produced a large number of algorithms, such as the well-known k-nearest neighbors, SVM, decision trees, neural networks, and logistic regression.

1 What?

In the 1990s, Hansen and Salamon published a study in PAMI [1]. They found that for neural networks trained with backpropagation (BP-NN), a combination of BP-NNs achieves a lower error rate than a single BP-NN, whether compared against the average or against the best individual network. For example, as the noise grows stronger, the combined BP-NNs keep a lower error rate than a single one.

Why? By analogy, a single learner is like a dictator, while a combined learner (ensemble methods) is like a decision committee. The former is one person deciding an event based on experience (for regression problems, the analogy is predicting a number); the latter is a group of members deciding together. Assume the experience (training data) available to both is the same. The individual ability of the dictator may be stronger than that of any single committee member, but the dictator also has blind spots. The committee members may not be individually outstanding, but each has their own strengths and weaknesses. For a given event, each member judges according to their own strengths, and the final decision is made by voting, possibly weighted voting (for regression problems, the analogy is averaging, possibly a weighted average of, their predictions). Viewed through the development of human society, the latter is more like a democracy and the former a dictatorship. The latter clearly has the advantage, especially when the experience contains many unreliable parts or the memory is fuzzy. For such a decision committee, we hope its members have diverse and complementary strengths, so that its decisions are more reliable; if the members are all very similar, the value of the committee drops sharply. We may also want the weight of each member's vote or prediction to reflect their past performance, so that the stronger members have a greater influence. The committee corresponds to ensemble methods, and a single member is a weak learner. In ensembles, the decision stump [2] is often used as the weak learner. A decision stump is a decision tree with only one level; it sounds quite weak, but the combination becomes much less weak.

2 How?

Ensemble methods generally come in three frameworks: Bagging, Boosting, and Stacking. They are generally better than an individual learner and are common models for both classification and numeric prediction. These three frameworks have advanced greatly over the last 10 or 20 years, and their results are striking: people are often surprised at how well they work, and then surprised again when newer variants work even better. Returning to the committee example, a committee with some members who merely "go along with the crowd" contains noise, and by human intuition it is hard to believe such a committee makes good decisions or numeric predictions; yet for Bagging, adding a learner with a random component can improve the overall result. Among the three frameworks, Boosting usually performs best; it is closely related to the statistical technique of additive models. A common drawback of all of these methods is that they are very hard to interpret: they combine dozens or even hundreds of individual models, and you cannot tell which of them contribute to a particular decision. The next section describes Boosting; first, the sketch below makes the (weighted) voting committee concrete.
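A minimal sketch, in plain Java, of the committee idea described above: each member casts a vote for a class label, optionally weighted by how much we trust that member. The class labels and weights are made up for illustration; real ensemble implementations (e.g. in Weka) handle this internally.

```java
import java.util.HashMap;
import java.util.Map;

public class WeightedVoteDemo {

    // Returns the label with the largest total (weighted) vote.
    static String weightedVote(String[] memberPredictions, double[] memberWeights) {
        Map<String, Double> tally = new HashMap<>();
        for (int i = 0; i < memberPredictions.length; i++) {
            tally.merge(memberPredictions[i], memberWeights[i], Double::sum);
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : tally.entrySet()) {
            if (e.getValue() > bestScore) {
                bestScore = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three hypothetical committee members classify the same instance.
        String[] predictions = { "spam", "ham", "spam" };
        // Equal weights give plain majority voting; unequal weights let the
        // more reliable members (the "elites" in the text) count for more.
        double[] weights = { 1.0, 1.0, 1.0 };
        System.out.println(weightedVote(predictions, weights)); // prints "spam"
    }
}
```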
2.1 Boosting

R. Schapire, who together with Y. Freund won the 2003 Gödel Prize, proposed the first practical boosting algorithm, AdaBoost [3]. Boosting is not quite the same as the committee example above; Bagging is the one that resembles the committee, where each member makes its own decision and a (weighted) vote is taken. Boosting is an iterative algorithm. In Weka, the corresponding class is AdaBoostM1. The model is built as follows:
1) Assign each instance the same weight (1/n, so that the weights sum to 1).
2) Run the base learner on the weighted instances and save the resulting model.
3) Compute the error rate e of this model on the weighted training set. If e = 0 or e >= 0.5, stop generating models.
4) For each instance in the dataset that this model classifies correctly, multiply its weight by e / (1 - e).
5) Normalize the weights of all instances so that they sum to 1.
6) Return to step 2.
Here we can see that the weights of correctly classified instances are decreased, while (after normalization) the weights of misclassified instances are increased, so the next model concentrates on the instances the previous model got wrong.

We have assumed that the base learner supports weighted instances. If it does not, we can generate an unweighted set from the weighted one by weighted resampling (the resampleWithWeights() method of Weka's Instances class): each instance is added to a new set with probability proportional to its weight, so high-weight instances are likely to appear several times while low-weight instances may never be selected, and we stop adding when the new set reaches the size of the original. This set becomes the training set of the next learner. One drawback of weighted resampling is that low-weight instances are hardly ever selected, so some information is lost before the learning scheme is even applied. But this can also be an advantage: when an iteration produces a learner with an error rate greater than 0.5, boosting on directly weighted data must stop, whereas with resampling we can discard that resampled dataset and resample again with a different random seed. Experiments show that resampling lets boosting run for more rounds than the direct-weighting approach.

When researchers studied AdaBoost they found an interesting phenomenon: after the error rate of the combined classifier on the training data has dropped to 0, continuing to iterate can still reduce its error rate on new data. This seems to contradict Occam's razor, which says that of two hypotheses performing equally well on the empirical data, the simpler one should be chosen. The contradiction can be explained by the confidence of the classifier's predictions, measured by the margin: the difference between the confidence (weighted vote) assigned to the true class and the largest confidence assigned to any other class. The larger the margin, the more confident the classifier is in predicting the true class.
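As a concrete illustration of the procedure above, here is a minimal sketch that runs Weka's AdaBoostM1 with decision stumps as the weak learner. The dataset path, the number of rounds, and the choice to use resampling are illustrative assumptions, not taken from the original text.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AdaBoostDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder path: any ARFF file with a nominal class attribute will do.
        Instances data = new DataSource("some-dataset.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // assume the class is the last attribute

        AdaBoostM1 booster = new AdaBoostM1();
        booster.setClassifier(new DecisionStump()); // one-level tree as the weak learner
        booster.setNumIterations(50);               // number of boosting rounds
        booster.setUseResampling(true);             // resample by weight instead of passing weights through

        // 10-fold cross-validation of the boosted stumps.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(booster, data, 10, new Random(1));
        System.out.println(eval.toSummaryString("\n=== Boosted decision stumps ===\n", false));
    }
}
```

Swapping the AdaBoostM1 wrapper for a single DecisionStump in the same evaluation is an easy way to see the "weak alone, strong combined" effect the text describes.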
Experiments show that boosting can keep increasing the margin long after the error rate on the training data has dropped to 0. LogitBoost [5] is another boosting algorithm. Where AdaBoost minimizes an exponential loss function [6], LogitBoost minimizes a logistic (log-likelihood) loss [5]. LogitBoost is less sensitive than AdaBoost to instances that the base learners misclassify badly, so it is better suited to training data whose class labels contain a lot of noise.
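To make the comparison concrete, here is a sketch of the margin and of the two loss functions in the additive-model view of boosting [4], writing the class label as y in {-1, +1} and the ensemble score as F(x); this notation is an assumption for illustration, since the text itself gives no formulas.

```latex
% Margin of an instance (x, y): the weighted vote v_y(x) the ensemble gives to
% the true class minus the largest weighted vote given to any other class,
% as described verbally in the text above.
\[
  \operatorname{margin}(x, y) \;=\; v_{y}(x) \;-\; \max_{c \neq y} v_{c}(x)
\]
% Loss functions in the additive-model view of boosting [4],
% with labels y \in \{-1, +1\} and ensemble score F(x):
\[
  L_{\mathrm{AdaBoost}}\bigl(y, F(x)\bigr) \;=\; e^{-y\,F(x)}
  \qquad \text{(exponential loss)}
\]
\[
  L_{\mathrm{LogitBoost}}\bigl(y, F(x)\bigr) \;=\; \log\bigl(1 + e^{-2\,y\,F(x)}\bigr)
  \qquad \text{(logistic log-likelihood loss)}
\]
% Because the exponential loss blows up for badly misclassified instances
% (large negative y F(x)), AdaBoost reacts more strongly to mislabeled or
% noisy instances than LogitBoost, whose loss grows roughly linearly there.
```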
References:
[1] Hansen, L. K.; Salamon, P. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 12(10), October 1990, pp. 993-1001.
[2] Wikipedia: Decision stump, http://en.wikipedia.org/wiki/decision_stump
[3] Freund, Y.; Schapire, R. E. (1999). A Short Introduction to Boosting.
[4] Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2), 2000, pp. 337-407.
[5] Wikipedia: LogitBoost.
[6] Witten, I. H.; Frank, E.; Hall, M. A. Data Mining: Practical Machine Learning Tools and Techniques.
