Ensemble Methods in Machine Learning

Source: Internet
Author: User

I have recently taken part in a number of Kaggle machine learning competitions, and one lesson stands out: good feature engineering can get you into the top 20, but reaching the top 10 usually requires ensemble methods. So I recently set out to understand the ensemble methods described below in some depth. They have proven themselves in practice: late in a competition, when you feel cornered and out of ideas, an ensemble is well worth trying and can be surprisingly effective.

Ensembling Previous Submissions

This is the simplest ensemble method: combine the answers you have already submitted and submit the combined result to get an improvement. The trick is particularly effective late in a competition when you merge teams with others, because you can ensemble your results directly with theirs. As long as the results are sufficiently diverse, the gain can be noticeable.

Voting Ensembles

A voting ensemble, as the name suggests, lets several results vote in a classification task; the class with the most votes becomes the final answer.

Error-Correcting Codes

Voting is common in communication systems as a way to correct transmission errors. Suppose the following code is sent:
1110110011101111011111011011
But for some reason it arrives with the second bit flipped:
1010110011101111011111011011
A common error-correction technique is to add redundancy. Suppose the same codeword is transmitted 3 times; a majority vote over the copies can then correct an occasional flipped bit:
Original signal:
1110110011

Encoded (10 bits, repeated 3 times; the first copy arrives with one flipped bit):
101011001111101100111110110011

Decoding:
1010110011
1110110011
1110110011

Majority vote:
1110110011
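The decoding step above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original article: the function names `encode_repetition` and `decode_repetition` are hypothetical, and the sketch assumes the whole signal is repeated as a block, as in the example.

```python
from collections import Counter


def encode_repetition(signal: str, repeats: int = 3) -> str:
    """Repeat the whole bit string `repeats` times (block repetition)."""
    return signal * repeats


def decode_repetition(received: str, repeats: int = 3) -> str:
    """Split the received string into copies and majority-vote each bit position."""
    if len(received) % repeats != 0:
        raise ValueError("received length must be a multiple of repeats")
    n = len(received) // repeats
    copies = [received[i * n:(i + 1) * n] for i in range(repeats)]
    # zip(*copies) yields, for each position, the bits seen in every copy
    return "".join(Counter(bits).most_common(1)[0][0] for bits in zip(*copies))


signal = "1110110011"
encoded = encode_repetition(signal)
# Simulate the error from the example: the second bit of the first copy is flipped
corrupted = "1010110011" + encoded[10:]
decoded = decode_repetition(corrupted)
print(decoded)
```

With an odd number of copies and binary symbols there are no ties, so the vote is always well defined.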
A Voting Example

Suppose the correct labels for a test set of 10 samples are all ones:
1111111111
We train 3 classifiers, each with 70% accuracy. Treating their errors as independent, we can compute from the probabilities what accuracy a majority vote achieves:
All three are correct:
  0.7 * 0.7 * 0.7 = 0.343
Two are correct:
  0.7 * 0.7 * 0.3 + 0.7 * 0.3 * 0.7 + 0.3 * 0.7 * 0.7 = 0.441
Two are wrong:
  0.3 * 0.3 * 0.7 + 0.3 * 0.7 * 0.3 + 0.7 * 0.3 * 0.3 = 0.189
All three are wrong:
  0.3 * 0.3 * 0.3 = 0.027

The majority vote is therefore correct about 78% of the time (0.343 + 0.441 = 0.784).
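The same arithmetic can be written as a binomial sum. The sketch below is an illustration under one strong assumption: the classifiers make independent errors, which real models rarely do. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent voters, each correct
    with probability p, gets the right answer. n must be odd to avoid ties."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    k_min = n // 2 + 1  # smallest number of correct voters that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


acc3 = majority_vote_accuracy(0.7, 3)
print(round(acc3, 3))  # 0.784
```

Because `n` is a parameter, the function can also be evaluated for larger odd numbers of voters to see how the vote behaves as more models are added.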
Number of Voters

Let's look at how the number of voters affects the final result. Anyone who has studied communications knows that the lower the signal-to-noise ratio (SNR), the higher the bit error rate, and that redundant coding followed by a final vote can correct some of those errors. Plotting the final error rate for different numbers of repetitions shows that, at low SNR, the error rate falls as the number of repetitions (that is, the number of voters) increases.

Correlation Between Models

When combining submissions, low correlation between the results brings a larger gain. Why? Consider the following three submissions:
1111111100 = 80% accuracy
1111111100 = 80% accuracy
1011111100 = 70% accuracy
Majority vote:
1111111100 = 80% accuracy
Because the submissions are highly correlated, the ensemble brings no improvement.
Now consider three other models with the following results:
1111111100 = 80% accuracy
0111011101 = 70% accuracy
1000101111 = 60% accuracy
Majority vote:
1111111101 = 90% accuracy
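The two majority votes above can be reproduced with a short sketch. The helper names `majority_vote` and `accuracy` are hypothetical, and the ground truth is assumed to be the all-ones string from the earlier voting example.

```python
def majority_vote(predictions):
    """Bitwise majority vote over equal-length strings of '0'/'1'."""
    n = len(predictions)
    return "".join(
        "1" if 2 * sum(int(b) for b in bits) > n else "0"
        for bits in zip(*predictions)
    )


def accuracy(pred, truth):
    """Fraction of positions where the prediction matches the truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


truth = "1111111111"  # assumed ground truth, as in the voting example
correlated = ["1111111100", "1111111100", "1011111100"]
diverse = ["1111111100", "0111011101", "1000101111"]

vote_corr = majority_vote(correlated)
vote_div = majority_vote(diverse)
print(vote_corr, accuracy(vote_corr, truth))
print(vote_div, accuracy(vote_div, truth))
```

The diverse set contains weaker individual models, yet its vote scores higher because the models make their mistakes in different positions.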
So the more diverse the submissions and the lower their correlation, the larger the gain from ensembling.

Weighting

Why introduce weights when ensembling? The point is to give better models a louder voice in the vote. By default every model gets one equal vote; weighting breaks that balance and puts higher weights on the results that perform well. Seen the other way around, the weaker models are there to correct some of the mistakes of the better model.

Averaging

The methods described so far are mainly for classification. For regression problems, averaging the predictions serves the same purpose. Averaging also helps against overfitting to some extent. In the illustration, the green line represents one of our submitted results; averaging several results gives a boundary closer to the black line, which separates the red and blue classes well.
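As a minimal sketch of the weighting and averaging ideas above, the following code implements a weighted bitwise vote and an element-wise mean. The helper names and the weight values are hypothetical, chosen only for illustration; the bit strings are the three diverse results from the correlation example.

```python
def weighted_vote(predictions, weights):
    """Weighted majority vote over equal-length strings of '0'/'1'.
    A position becomes '1' when the models voting '1' hold more than
    half of the total weight."""
    total = sum(weights)
    out = []
    for bits in zip(*predictions):
        score = sum(w for b, w in zip(bits, weights) if b == "1")
        out.append("1" if 2 * score > total else "0")
    return "".join(out)


def average(predictions):
    """Element-wise mean of several lists of regression predictions."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]


models = ["1111111100", "0111011101", "1000101111"]
equal = weighted_vote(models, [1, 1, 1])     # same as a plain majority vote
dominated = weighted_vote(models, [3, 1, 1])  # best model can never be overruled
print(equal, dominated)

blended = average([[1.0, 2.0], [3.0, 4.0]])
print(blended)  # [2.0, 3.0]
```

With weights 3, 1, 1 the first model holds more than half of the total weight, so the ensemble simply copies it. Weights should therefore leave room for the weaker models to overrule the best one when they agree.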
Rank Averaging

In ranking problems, averaging the raw outputs directly often causes trouble. Suppose a classifier used for ranking produces the following output:

id,prediction
1,0.35000056
2,0.35000002
3,0.35000098
4,0.35000111

If we average this with the output of another classifier such as the one below, the ensemble barely changes: the first classifier's scores differ only in the sixth decimal place or beyond, so they contribute almost nothing to the final ordering.

id,prediction
1,0.57
2,0.04
3,0.96
4,0.99

So for ranking problems we can first calibrate the results. Start by turning the initial predictions into ranks:

id,rank,prediction
1,1,0.35000056
2,0,0.35000002
3,2,0.35000098
4,3,0.35000111

We can then normalize the ranks to the range 0 to 1, which makes the result combine much better with other results:

id,prediction
1,0.33
2,0.0
3,0.66
4,1.0
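The rank normalization above can be sketched as follows. This is an illustration only: the function names are hypothetical, and ties are broken by input order, which is an assumption the original does not address.

```python
def normalized_ranks(scores):
    """Replace each score by its ascending rank (0..n-1), scaled to [0, 1].
    Ties are broken by input order (an assumption of this sketch)."""
    n = len(scores)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return [r / (n - 1) for r in ranks]


def rank_average(*models):
    """Average the normalized ranks of several models' predictions."""
    normed = [normalized_ranks(m) for m in models]
    return [sum(vals) / len(vals) for vals in zip(*normed)]


model_a = [0.35000056, 0.35000002, 0.35000098, 0.35000111]
model_b = [0.57, 0.04, 0.96, 0.99]

norm_a = normalized_ranks(model_a)
combined = rank_average(model_a, model_b)
print(norm_a)
print(combined)
```

After normalization, both models contribute on the same 0 to 1 scale, so the tightly clustered scores of the first model carry as much weight in the average as the well-spread scores of the second.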
Stacked Generalization & Blending

To be continued ...

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
