Easy-to-learn machine learning algorithms: ensemble methods

Last Update:2015-06-16 Source: Internet

Author: User

I. The idea of ensemble learning

This series has introduced a number of algorithms, each with its own scope of application, such as handling linearly separable problems or linearly inseparable problems. In real life, "collective wisdom" often makes a problem easier to solve. This raises a question for machine learning: for a complex task, can many learning algorithms be combined so that the result is better than what any single algorithm achieves? This is the idea behind ensemble learning.

Ensemble learning combines multiple models to achieve better results and to give the combined model stronger generalization ability. There are several ways to combine multiple models:

Select the model that performs best on the validation data set as the final predictive model;
Combine the predictions of multiple models by voting or averaging;
Combine the predictions of multiple models by weighted averaging.
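The three combination strategies listed above can be sketched in a few lines of Python. This is only an illustration: the model names, predictions, and weights are made-up values, and the helper functions are hypothetical, not part of any library.

```python
from collections import Counter

def accuracy(pred, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def select_best(preds_by_model, truth):
    """Strategy 1: name of the model with the highest validation accuracy."""
    return max(preds_by_model, key=lambda name: accuracy(preds_by_model[name], truth))

def majority_vote(preds_by_model):
    """Strategy 2: for each sample, the label predicted by the most models."""
    columns = zip(*preds_by_model.values())
    return [Counter(col).most_common(1)[0][0] for col in columns]

def weighted_average(preds_by_model, weights):
    """Strategy 3: per-sample weighted mean of numeric predictions."""
    total = sum(weights[name] for name in preds_by_model)
    n = len(next(iter(preds_by_model.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in preds_by_model.items()) / total
        for i in range(n)
    ]

# Made-up validation labels and predictions from three hypothetical classifiers.
y_val = [1, 0, 1, 1, 0]
preds = {
    "A": [1, 0, 1, 0, 1],
    "B": [1, 0, 1, 1, 1],
    "C": [0, 0, 1, 1, 1],
}
print(select_best(preds, y_val))   # B
print(majority_vote(preds))        # [1, 0, 1, 1, 1]

# Made-up regression outputs; model B is trusted three times as much as A.
print(weighted_average({"A": [2.0, 4.0], "B": [4.0, 8.0]}, {"A": 1.0, "B": 3.0}))  # [3.5, 7.0]
```

With an odd number of binary voters, as here, the vote can never tie; a real implementation would need a tie-breaking rule.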

These ideas correspond to the main learning frameworks in ensemble learning.

II. The main methods of ensemble learning

1. Strongly learnable and weakly learnable

In ensemble learning, multiple weak models are combined into a strong model by some combination method. The book "Statistical Learning Methods" introduces the concepts of "strongly learnable" and "weakly learnable".

In the probably approximately correct (PAC) learning framework, a concept (a class) is called strongly learnable if there is a polynomial-time learning algorithm that can learn it with high accuracy. A concept is called weakly learnable if there is a polynomial-time learning algorithm that can learn it with accuracy only slightly better than random guessing. Schapire showed that, in the PAC learning framework, a concept is strongly learnable if and only if it is weakly learnable. So for a learning problem, if we can find a "weak learning algorithm", it can be boosted into a "strong learning algorithm".

2. Selecting the best-performing model on the validation set

The idea of this approach is similar to that of a decision tree: under different conditions, the algorithm that satisfies the condition is chosen.

3. Voting or averaging over multiple models

Several models are trained on the data set. For classification problems, voting can be used, and the class with the most votes is chosen as the final class. For regression problems, averaging can be used, and the mean of the predictions is taken as the final result. The best-known method based on this idea is Bagging.

Bagging stands for Bootstrap aggregating. Bootstrap is a sampling method with replacement, and its sampling strategy is simple random sampling.

In the Bagging method, the learning algorithm is trained several times. Each training set consists of samples drawn at random from the initial training set, so an initial training sample may appear multiple times in a given training set or not at all. Training yields a sequence of prediction functions, and the final prediction function combines them differently for classification and regression problems:

Classification problems: voting, where the class with the most votes is the final class
Regression problems: simple averaging
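A minimal sketch of Bagging for classification follows, using only the Python standard library. The base learner is an assumption made for illustration: a one-dimensional threshold classifier (a "decision stump") that is not mentioned in the original text. The data set and the function names are likewise hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Assumed base learner: pick the threshold (taken from the sample's own
    x values) that makes the fewest errors when predicting 1 for x > threshold."""
    best = None
    for threshold, _ in sample:
        errors = sum((x > threshold) != bool(y) for x, y in sample)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def bagging_fit(data, n_models, seed=0):
    """Train one stump per bootstrap sample of the initial training set."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(thresholds, x):
    """Classification: every stump votes, and the most common label wins."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Made-up 1-D training data: (feature, label) pairs.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]

# An odd number of models avoids tied votes for binary labels.
thresholds = bagging_fit(data, n_models=11)
print(bagging_predict(thresholds, 0.2), bagging_predict(thresholds, 5.0))  # 0 1
```

For a regression problem, the same structure applies, with `bagging_predict` returning the mean of the base models' outputs instead of the majority vote.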

(image from reference article 2)

The random forest algorithm is a learning algorithm based on the Bagging idea.

4. Weighted averaging of the predictions of multiple models

The Bagging method described above is characterized by random sampling: new models are trained on repeated samples, and the final result averages over these models. Weighted averaging of the predictions of multiple models instead promotes multiple weak learning models into a strong learning model, which is the core idea of boosting.

In a boosting algorithm, each training sample is given equal weight at initialization. The learning algorithm is then trained on the training set for several rounds. After each round, the training samples that were learned incorrectly are given greater weight, so that in later rounds the learning algorithm concentrates on the samples that are harder to learn. This yields a sequence of prediction functions, each with its own weight, and a prediction function that predicts well receives a larger weight. The final prediction function combines them differently for classification and regression problems:

Classification problems: weighted voting
Regression problems: weighted averaging

(image from reference article 2)

AdaBoost and GBDT (Gradient Boosting Decision Tree) are the two best-known algorithms based on the boosting idea.
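The reweighting scheme can be made concrete with a small sketch of discrete AdaBoost for binary labels in {-1, +1}. The text above describes boosting only in general terms, so the specific update formulas here are those of standard AdaBoost rather than something taken from the article. The threshold base learner, the toy data, and all function names are assumptions chosen for illustration.

```python
import math

def stump_predict(x, threshold, sign):
    """Assumed base learner: predict `sign` if x > threshold, else -sign."""
    return sign if x > threshold else -sign

def best_stump(xs, ys, w):
    """Return (weighted_error, threshold, sign) of the lowest-error stump."""
    best = None
    for threshold in [min(xs) - 1.0] + sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost_fit(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n                  # every sample starts with equal weight
    ensemble = []                      # list of (alpha, threshold, sign)
    for _ in range(rounds):
        err, threshold, sign = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # accurate stumps get a larger vote
        ensemble.append((alpha, threshold, sign))
        # Misclassified samples gain weight, correct ones lose it; then renormalize.
        w = [wi * math.exp(-alpha * y * stump_predict(x, threshold, sign))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote: the sign of the alpha-weighted sum of stump predictions."""
    score = sum(alpha * stump_predict(x, t, s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1

# Ten-point toy data set chosen for illustration; no single stump separates it.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(xs, ys, rounds=3)
print([adaboost_predict(ensemble, x) for x in xs] == ys)  # True
```

On this data the best single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all ten training points correctly, which illustrates how weak learners are promoted to a stronger one.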

Reference articles

1. "Statistical Learning Methods"

2. Statistical learning methods: CART, Bagging, Random Forest, Boosting

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
