Ensemble Methods

Discussions of machine learning usually focus on specific algorithms, such as decision trees or KNN. In real-world work and in Kaggle competitions, however, ensemble methods often give the best results, at the cost of longer training time.

So-called ensemble methods combine several machine learning algorithms, or several differently parameterized instances of one algorithm, into a single model. They basically fall into the following two categories:

Averaging methods: train several algorithms (or several parameterizations of one algorithm) on all or part of the training data, and take the average of their outputs as the final prediction. Examples include bagging and forests of randomized trees (random forests).

Averaging is relatively simple. The main design work lies in choosing the training data for each member: whether to sample randomly, whether to sample with replacement, how many subsets to draw, and how large each subset should be. After that, each model is trained independently and the results are averaged. The base learners here are usually strong, complex algorithms; a single strong model easily overfits, and aggregating many of them largely cancels the overfitting out.
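To make this concrete, here is a minimal from-scratch sketch of the bagging pattern in Python (the language, the synthetic dataset, and all names are my choices for illustration; the article itself shows no code):

```python
# Minimal from-scratch sketch of bagging: bootstrap-sample the data,
# train one tree per sample, average the predictions.
# (Dataset and all names here are illustrative assumptions.)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=1)
rng = np.random.default_rng(1)

models = []
for _ in range(25):
    # Random sampling WITH replacement (the bootstrap), same size as X.
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# The averaging step: mean of per-model class probabilities, then argmax.
avg_proba = np.mean([m.predict_proba(X) for m in models], axis=0)
print("ensemble training accuracy:", (avg_proba.argmax(axis=1) == y).mean())
```

scikit-learn packages this same pattern as BaggingClassifier, and RandomForestClassifier adds per-split feature randomization on top of it.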

Boosting methods: start with a basic algorithm, then have each subsequent algorithm use the results of the previous ones, focusing on the data they got wrong, so that the error rate keeps dropping. The motivation is to combine several simple, weak algorithms into one very powerful combined algorithm. "Boosting" means promoting a "weak learning algorithm" into a "strong learning algorithm"; it is a step-by-step improvement process and, in some ways, resembles how a neural network is trained. Classic algorithms include AdaBoost (adaptive boosting) and gradient tree boosting (GBDT).
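For a quick feel of the two algorithms named above, here is a small scikit-learn sketch (the library and the synthetic dataset are my assumptions, not the article's):

```python
# Both boosting algorithms build an additive model out of many shallow
# ("weak") trees; scikit-learn provides ready-made implementations.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (AdaBoostClassifier(n_estimators=100, random_state=0),
              GradientBoostingClassifier(n_estimators=100, random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```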

This method generally chooses a very simple, weak algorithm as the base learner; because the model improves round by round, the final combination can be very strong. Boosting is comparatively complex, and several aspects matter (a from-scratch sketch follows this list):

1) The overall process: weak learners are trained one after another, and each round re-weights the training samples so that the next learner concentrates on the examples the previous ones misclassified.

2) How to start: choose a weak classifier that is just slightly better than random guessing, that is, one with weight a > 0 (equivalently, error rate e < 0.5).

3) How earlier rounds influence later ones, i.e. how the model improves: the sample weights w from one round are modified by some rule to become the weights for the next round, so how w is updated is critical. The choice of loss function (the gap between the algorithm's predicted values and the actual values) and the way it is minimized determine how the weights w are updated and, ultimately, how well the boosting works; the common boosting variants differ mainly in which loss function they pair with this scheme (AdaBoost, for instance, corresponds to the exponential loss).

4) The final model: a weighted combination of the weak learners. For binary classification with weak learners G_m(x) and weights a_m, the combined prediction is

F(x) = sign( a_1*G_1(x) + a_2*G_2(x) + ... + a_M*G_M(x) ),  with  a = (1/2) * ln((1 - e) / e),

where e is the error rate of a weak learner. If e < 0.5, then a > 0; and the smaller e is (the fewer the errors), the larger a becomes (the higher the weight), i.e. the more say that learner has in the final result.
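The steps above map directly onto code. Below is a minimal from-scratch AdaBoost sketch for labels in {-1, +1} (function names and the dataset are illustrative, not from the article); note how a = (1/2) ln((1 - e)/e) and the update of the sample weights w implement points 2) through 4):

```python
# From-scratch AdaBoost sketch for labels in {-1, +1}.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(X)
    w = np.full(n, 1.0 / n)                 # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)   # weak learner: a stump
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        e = w[pred != y].sum()              # weighted error rate
        if e >= 0.5:                        # no better than random: stop
            break
        a = 0.5 * np.log((1 - e) / max(e, 1e-12))     # a > 0 iff e < 0.5
        w *= np.exp(-a * y * pred)          # up-weight misclassified samples
        w /= w.sum()                        # renormalize the weights
        stumps.append(stump)
        alphas.append(a)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Final form: sign of the a-weighted vote of all weak learners.
    return np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))

X, y = make_classification(n_samples=400, random_state=2)
y = 2 * y - 1                               # map {0, 1} -> {-1, +1}
stumps, alphas = adaboost_fit(X, y)
print("training accuracy:", (adaboost_predict(stumps, alphas, X) == y).mean())
```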



"NOTE" In this article, the algorithm itself is not important, it is important, what is ensemble Methods, and how to do the boost process. Once the loss function is determined, we can use the partial derivative + find the gradient direction to handle the boost direction.

Reference articles:
http://blog.csdn.net/dark_scope/article/details/24863289
http://blog.csdn.net/dark_scope/article/details/14103983
The Elements of Statistical Learning
