Machine Learning Campus Recruitment Notes 2: Ensemble Learning


What is ensemble learning? In a word: three cobblers with their wits combined equal one Zhuge Liang, i.e., many weak minds together beat a single mastermind. For classification, a combination of multiple weak classifiers becomes a strong classifier.


Suppose there are some differences among the weak classifiers (for example, different algorithms, or the same algorithm with different parameters). Their classification decision boundaries then differ, which means they make different mistakes when making decisions. By combining them, we can obtain a more reasonable boundary, reduce the overall error, and achieve a better classification result. Ensemble learning can be applied to classification, regression, feature selection, anomaly detection, and more; it appears in virtually every area of machine learning. Ensemble learning has two main problems to solve: first, how to obtain several individual learners, and second, how to choose a combination strategy that assembles these individual learners into a strong learner.

Ensemble Learning: Individual Learners (Question 1)

The first problem in ensemble learning is how to obtain the individual learners. There are two options.

The first option: all individual learners are of the same kind (homogeneous), for example all decision trees or all neural networks. (This is the most widely used approach.)

The most common models for homogeneous individual learners are the CART decision tree and the neural network.

Homogeneous individual learners fall into two kinds according to whether there is a dependency among them. If there is a strong dependency, the individual learners must be generated serially; the representative algorithms are the boosting family. The most famous boosting algorithms include AdaBoost and the boosting tree family, of which the most widely used is the gradient boosting tree (gradient boosting). If there is no strong dependency, the individual learners can be generated in parallel; the representative algorithms are the bagging and random forest family.

The second option: the individual learners are not all of one kind, i.e., heterogeneous. For example, for a classification problem we might train a support vector machine, a logistic regression model, and a naive Bayes learner on the training set, and then determine the final strong classifier through a combination strategy, as in the sketch below.
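
As a concrete illustration, here is a minimal sketch of such a heterogeneous ensemble using scikit-learn's VotingClassifier; the dataset and hyperparameters are arbitrary choices for demonstration, not part of the original text.

```python
# A minimal sketch of a heterogeneous ensemble: SVM, logistic regression,
# and naive Bayes combined through a voting strategy.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),      # support vector machine
        ("lr", LogisticRegression(max_iter=5000)),
        ("nb", GaussianNB()),                # naive Bayes
    ],
    voting="soft",  # average the predicted class probabilities
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```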

Ensemble Learning: Boosting

The boosting algorithm works as follows. First, weak learner 1 is trained from the training set with initial (uniform) sample weights.

The sample weights are then updated according to the error rate of weak learner 1, so that the samples weak learner 1 got wrong receive higher weights and are paid more attention when training weak learner 2.

Weak learner 2 is then trained on the re-weighted training set, and this process is repeated until the number of weak learners reaches a preset number T. The T weak learners are finally combined through a chosen strategy to obtain the final strong learner. Representative algorithms: AdaBoost and the boosting tree family, of which the most widely used is the gradient boosting tree (gradient boosting).
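
The reweighting loop can be sketched from scratch in the AdaBoost style; this is a minimal sketch assuming binary labels in {-1, +1} as numpy arrays, with decision stumps and T chosen purely for illustration.

```python
# Sketch of the boosting reweighting loop (AdaBoost, labels in {-1, +1}).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=10):
    n = len(y)
    w = np.full(n, 1.0 / n)            # initial uniform sample weights
    learners, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])     # weighted error of this weak learner
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)  # raise weights of misclassified points
        w /= w.sum()                    # renormalize
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # Weighted vote of the T weak learners.
    scores = sum(a * h.predict(X) for h, a in zip(learners, alphas))
    return np.sign(scores)
```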

Ensemble Learning: Bagging

Bagging differs from boosting in that its weak learners have no dependency on one another and can be generated in parallel.

The training set of each of bagging's individual weak learners is obtained by random sampling. With T rounds of random sampling we obtain T sampled sets; on these T sets we independently train T weak learners, which are then combined through a chosen strategy into the final strong learner, as sketched below.
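
A minimal sketch of this pipeline with scikit-learn's BaggingClassifier; the base learner, T, and dataset are illustrative choices.

```python
# Sketch of bagging: T weak learners trained on bootstrap samples,
# in parallel, then combined by voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the weak learner
                                         # (base_estimator in older sklearn)
    n_estimators=50,    # T
    bootstrap=True,     # sample with replacement (bootstrap sampling)
    n_jobs=-1,          # learners are independent, so train in parallel
    random_state=0,
)
bag.fit(X, y)
print("training accuracy:", bag.score(X, y))
```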

The random sampling deserves a further introduction: what is generally used here is bootstrap sampling.

From an original training set of m samples, we randomly draw one sample into the sampled set and then put it back, so the same sample may still be drawn the next time. After m draws we obtain a sampled set of m samples. Because the sampling is random, each sampled set differs from the original training set and from the other sampled sets, which yields a number of different weak learners.
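
A minimal numpy sketch of one round of bootstrap sampling; it also shows the well-known consequence that, because of replacement, each sampled set contains only about 63.2% of the distinct original samples.

```python
# One round of bootstrap sampling: draw m samples with replacement.
import numpy as np

rng = np.random.default_rng(0)
m = 1000
X = np.arange(m)                   # stand-in for m training samples

idx = rng.integers(0, m, size=m)   # sample m indices with replacement
sample = X[idx]                    # one bootstrap sample of size m

# Roughly 1 - 1/e ~ 63.2% of the original samples appear in each
# bootstrap sample; the rest can serve as out-of-bag data.
print(len(np.unique(sample)) / m)
```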

Random forest (Random Forest) is a specialized, upgraded version of bagging.

Specialized: the weak learners of a random forest are decision trees.

Upgraded: on top of bagging's random sampling of training examples, random forest adds random selection of features; the basic idea still follows bagging.
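
A minimal sketch showing both sources of randomness in scikit-learn's RandomForestClassifier; the hyperparameters and dataset are illustrative.

```python
# Sketch: random forest = bagging of decision trees + random feature
# selection at each split (controlled here by max_features).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rf = RandomForestClassifier(
    n_estimators=100,     # T trees, each trained on a bootstrap sample
    max_features="sqrt",  # random subset of features tried at each split
    random_state=0,
)
rf.fit(X, y)
print("training accuracy:", rf.score(X, y))
```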

Ensemble Learning: Combination Strategies (Question 2)

Assume the T weak learners we have obtained are {h_1, h_2, ..., h_T}.

Averaging Method

For regression problems with numerical predictions, the usual combination strategy is averaging: the outputs of the weak learners are averaged to obtain the final prediction. The simple average is H(x) = (1/T) * sum_i h_i(x); the weighted average is H(x) = sum_i w_i * h_i(x), where the weights w_i >= 0 sum to 1.
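
A tiny numeric sketch of both variants; the predictions and weights are made-up values for illustration.

```python
# Simple vs. weighted averaging of T weak regressors' outputs for one x.
import numpy as np

preds = np.array([2.9, 3.1, 3.4])   # h_1(x), h_2(x), h_3(x)
print(preds.mean())                 # simple average

w = np.array([0.5, 0.3, 0.2])       # weights w_i >= 0 summing to 1
print(np.dot(w, preds))             # weighted average
```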

Voting Method

For classification problems, we usually use the voting method. Suppose the prediction categories are {c_1, c_2, ..., c_K}. For any sample x to be predicted, the predictions of the T weak learners are (h_1(x), h_2(x), ..., h_T(x)).

The simplest voting method is relative majority (plurality) voting, the familiar "minority obeys the majority": among the T weak learners' predictions for sample x, the category c_i with the most votes is taken as the final category. If more than one category receives the highest number of votes, one of them is chosen at random as the final category.

A slightly more complex method is absolute majority voting, the familiar "more than half of the votes": on the basis of relative majority voting, the winning category must not only receive the most votes but also more than half of all votes; otherwise the prediction is rejected.

More complex still is weighted voting: as with the weighted average, each weak learner's vote is multiplied by a weight, the weighted votes for each category are summed, and the category with the largest total is the final category. The three voting rules are sketched below.
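
A minimal numpy sketch of plurality, absolute-majority, and weighted voting; the votes and weights are made-up values for illustration.

```python
# Plurality, absolute-majority, and weighted voting over T predictions.
import numpy as np

votes = np.array([0, 2, 2, 1, 2])    # h_1(x)..h_T(x), categories 0..K-1
K, T = 3, len(votes)
counts = np.bincount(votes, minlength=K)

plurality = counts.argmax()          # relative majority: most votes wins
print(plurality)

# Absolute majority: also require more than half of all votes.
majority = plurality if counts[plurality] > T / 2 else None  # None = reject
print(majority)

w = np.array([0.1, 0.3, 0.2, 0.15, 0.25])  # one weight per weak learner
weighted = np.bincount(votes, weights=w, minlength=K).argmax()
print(weighted)
```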

Learning Method

The two previous strategies simply average or vote over the weak learners' results. They are straightforward, but the resulting learning error may be relatively large, so the learning method was developed; its representative is stacking. When stacking is used as the combination strategy, we do not apply simple logic to the weak learners' results. Instead we add another layer of learning: the weak learners' outputs on the training set are used as input, the training set's labels are used as output, and a new learner is trained on them to produce the final result.

In this case, the weak learners are called primary learners, and the learner used for combining them is called the secondary learner. For a test set, we first predict with the primary learners to obtain the input samples for the secondary learner, and then predict once more with the secondary learner to obtain the final prediction.
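
A minimal sketch of stacking with scikit-learn's StackingClassifier; the primary learners, secondary learner, and dataset are illustrative choices.

```python
# Sketch of stacking: primary learners' outputs become the input of a
# secondary (meta) learner.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
stack = StackingClassifier(
    estimators=[("svm", SVC()), ("nb", GaussianNB())],  # primary learners
    final_estimator=LogisticRegression(max_iter=5000),  # secondary learner
    cv=5,  # out-of-fold predictions avoid leaking training labels
)
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))
```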

Reference: http://www.cnblogs.com/pinard/p/6131423.html
