Ensemble Learning: Principles in Detail

Ensemble learning is a very popular machine learning approach today. It is not a standalone machine learning algorithm in itself; rather, it completes a learning task by building multiple learners and combining them. This is what we often describe as "drawing on the strengths of many." Ensemble learning can be applied to classification problems, regression problems, feature selection, anomaly detection, and more; it can be seen in virtually every area of machine learning. This article summarizes the principles of ensemble learning.

1. Ensemble Learning Overview

From the figure below we can summarize the idea of ensemble learning: for a training set, we train several individual learners, and by combining them through some combination strategy we finally form a strong learner, thereby drawing on the strengths of many.

In other words, ensemble learning has two main problems to solve: first, how to obtain a number of individual learners; second, how to choose a combination strategy that assembles these individual learners into a strong learner.

2. Individual Learners in Ensemble Learning

In the previous section we mentioned that the first problem of ensemble learning is how to obtain a number of individual learners. Here we have two options.

The first is for all the individual learners to be of the same kind, i.e., homogeneous: for example, all of them are decision tree learners, or all of them are neural network learners. The second is for the individual learners not to be all of the same kind, i.e., heterogeneous: for example, for a classification problem we might train a support vector machine learner, a logistic regression learner, and a naive Bayes learner on the training set, and then determine the final strong classifier through some combination strategy, as in the sketch below.
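As a concrete illustration of such a heterogeneous ensemble, here is a minimal sketch using scikit-learn's VotingClassifier (the library and the toy data are assumptions for illustration; the article does not prescribe them):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    # Toy data standing in for the training set mentioned above.
    X, y = make_classification(n_samples=200, random_state=0)

    # Heterogeneous ensemble: three different kinds of individual learners,
    # combined here by soft voting over predicted class probabilities.
    ensemble = VotingClassifier(
        estimators=[
            ("svm", SVC(probability=True)),  # probability=True enables soft voting
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", GaussianNB()),
        ],
        voting="soft",
    )
    ensemble.fit(X, y)
    print(ensemble.predict(X[:5]))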

At present, homogeneous individual learners have the widest application, and in general the term "ensemble learning" refers to ensembles of homogeneous individual learners. The most commonly used homogeneous individual learners are CART decision trees and neural networks. Depending on whether dependencies exist between the individual learners, homogeneous ensembles can be divided into two categories. In the first, strong dependencies exist between the individual learners, and a series of individual learners must be generated serially; the representative algorithms are the boosting family. In the second, there are no strong dependencies between the individual learners, and a series of individual learners can be generated in parallel; the representative algorithms are bagging and the Random Forest family. The two categories are summarized below.

3. Boosting in Ensemble Learning

The principle of the boosting algorithm can be summarized with the following diagram:

As the diagram shows, the working mechanism of boosting is as follows. A weak learner 1 is first trained from the training set using the initial weights, and the weights of the training samples are then updated according to the learning error rate of weak learner 1, so that the sample points misclassified by weak learner 1 receive higher weights and thus more attention from weak learner 2. Weak learner 2 is then trained on the re-weighted training set. This is repeated until the number of weak learners reaches the pre-specified number T; finally, the T weak learners are combined through a combination strategy to obtain the final strong learner.
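To make this loop concrete, here is a minimal AdaBoost-style sketch of the weight-update mechanism described above, assuming binary labels coded as -1/+1 and scikit-learn decision stumps as the weak learners (both are assumptions for illustration, not prescribed by this article):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def boosting_fit(X, y, T=10):
        # y must be coded as -1/+1 for the weight update below
        n = len(X)
        w = np.full(n, 1.0 / n)              # initial uniform sample weights
        learners, alphas = [], []
        for _ in range(T):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = np.sum(w * (pred != y))    # weighted learning error rate
            if err >= 0.5:                   # no better than chance: stop early
                break
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))  # learner weight
            w *= np.exp(-alpha * y * pred)   # misclassified points get larger weights
            w /= w.sum()                     # renormalize
            learners.append(stump)
            alphas.append(alpha)
        return learners, alphas

    def boosting_predict(X, learners, alphas):
        # final strong learner: sign of the alpha-weighted vote
        score = sum(a * h.predict(X) for a, h in zip(alphas, learners))
        return np.sign(score)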

The most famous algorithms in the boosting family are AdaBoost and the boosting tree family. Among boosting trees, the most widely used is the gradient boosting tree (GBDT). The principles of AdaBoost and boosting trees are described in detail in later articles.

4. Bagging in Ensemble Learning

The algorithmic principle of bagging differs from boosting: its weak learners have no dependencies on one another and can be generated in parallel. It can be summarized with the following diagram:

As the figure above shows, the training set of each individual weak learner in bagging is obtained by random sampling. Through T rounds of random sampling we obtain T sampling sets, from which we can independently train T weak learners, and then combine these T weak learners through a combination strategy to obtain the final strong learner, as in the sketch below.
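Here is a minimal sketch of this procedure, assuming numpy arrays, non-negative integer class labels, and scikit-learn decision trees as the weak learners (all assumptions for illustration):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, T=50, seed=0):
        rng = np.random.default_rng(seed)
        m = len(X)
        learners = []
        for _ in range(T):
            # random sampling with replacement (detailed in the next paragraph)
            idx = rng.choice(m, size=m, replace=True)
            tree = DecisionTreeClassifier()
            tree.fit(X[idx], y[idx])     # each weak learner sees its own sampling set
            learners.append(tree)        # no dependency between learners:
        return learners                  # this loop could run in parallel

    def bagging_predict(X, learners):
        # combination strategy: majority vote over the T weak learners
        votes = np.stack([h.predict(X) for h in learners]).astype(int)
        return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)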

The random sampling here deserves a further introduction. It is generally done with bootstrap sampling, i.e., sampling with replacement: for an original training set of m samples, we randomly draw one sample into the sampling set and then put it back, so that it may be drawn again in the next round. After m such draws we obtain a sampling set of m samples. Because the sampling is random, each sampling set differs from the original training set and from the other sampling sets, which yields a number of different weak learners.
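A quick numerical sketch of a well-known consequence of this scheme (the 63.2% figure is a standard fact about bootstrap sampling, not something stated in this article): each original sample is missed by one sampling set with probability $(1 - 1/m)^m \approx 1/e$, so roughly 63.2% of the distinct original samples appear in any one sampling set.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 10000
    sample = rng.choice(m, size=m, replace=True)  # one bootstrap sampling set

    # fraction of distinct original samples that made it into the sampling set
    print(len(np.unique(sample)) / m)             # ≈ 0.632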

Random Forest is a special, advanced version of bagging. It is "special" because the weak learners of a random forest are decision trees; it is "advanced" because, on top of bagging's random sampling of the training samples, it adds random selection of features, while the basic idea remains within the bagging framework. The principles of bagging and Random Forest are described in detail in a later article.
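The contrast can be stated in two lines of scikit-learn (assumed here only for illustration; note that the constructor argument is named base_estimator rather than estimator in scikit-learn versions before 1.2):

    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Plain bagging over decision trees: random sampling of training samples only.
    bag = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100)

    # Random Forest: the same bootstrap sampling, plus a random subset of
    # features considered at every split of every tree.
    rf = RandomForestClassifier(n_estimators=100, max_features="sqrt")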

5. Combination Strategies in Ensemble Learning

In the sections above we focused mainly on the learners themselves, mentioning combination strategies without going into detail. This section summarizes the combination strategies of ensemble learning. We assume that the T weak learners we obtain are $\{h_1, h_2, \dots, h_T\}$.

5.1 Averaging Method

For regression problems with numerical predictions, the usual combination strategy is averaging: the outputs of the weak learners are averaged to obtain the final predicted output.

The simplest is the arithmetic mean, in which the final prediction is

$$H(x) = \frac{1}{T}\sum_{i=1}^{T} h_i(x)$$

If each individual learner has a weight, the final prediction is

$$H(x) = \sum_{i=1}^{T} w_i\, h_i(x)$$

where $w_i$ is the weight of individual learner $h_i$, usually with

$$w_i \ge 0, \qquad \sum_{i=1}^{T} w_i = 1$$
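Both strategies reduce to a few lines of numpy (a sketch under the assumption that the weak learners' outputs are collected into a T-by-n array; the names are illustrative):

    import numpy as np

    def simple_average(preds):
        # preds: shape (T, n_samples); row i holds the outputs of weak learner h_i
        return preds.mean(axis=0)                 # H(x) = (1/T) * sum_i h_i(x)

    def weighted_average(preds, w):
        # w: one non-negative weight per weak learner, summing to 1
        w = np.asarray(w, dtype=float)
        assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
        return w @ preds                          # H(x) = sum_i w_i * h_i(x)

    preds = np.array([[2.0, 3.0],                 # T = 2 learners, 2 samples
                      [4.0, 5.0]])
    print(simple_average(preds))                  # [3. 4.]
    print(weighted_average(preds, [0.25, 0.75]))  # [3.5 4.5]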
