Ensemble Methods in Machine Learning

Last Update:2015-07-26 Source: Internet

Author: User


I have recently taken part in a number of Kaggle machine learning competitions, and one lesson stands out: good feature engineering can get you into the top 20, but breaking into the top 10 usually takes ensemble methods. So I have recently made a thorough study of the ensemble methods described below. They hold up well in practice: late in a competition, when you have run out of other ideas, ensembling is worth a try and can open things up again.

Ensembling previous submissions

This is the simplest ensemble method: combine answers you have already submitted and submit the combined result. It is particularly effective when you team up with other competitors late in a competition, because you can ensemble your results with theirs directly. As long as the results are sufficiently diverse, the gain can be noticeable.

Voting Ensembles

A voting ensemble does what its name says: in a classification task, several results vote on each sample, and the class with the most votes becomes the final answer. Majority voting is also common in error-correcting codes, which communication systems use to correct transmission errors. Suppose the following code is sent:

1110110011101111011111011011

But one bit is flipped in transit, and it arrives as:

1010110011101111011111011011

A common error-correction technique is to add redundancy. Suppose the same code word is transmitted 3 times; a majority vote over the copies can then correct the occasional corrupted bit:

Original signal:
1110110011

Encoded:
10,3 101011001111101100111110110011

Decoding:
1010110011
1110110011
1110110011

Majority vote:
1110110011
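The repetition scheme above can be sketched in a few lines of Python. This is only an illustration: the function names `encode` and `decode` are made up for this example, and the sketch assumes a binary code word sent an odd number of times, so that a per-position vote never ties.

```python
from collections import Counter


def encode(bits, repeats=3):
    """Repeat the whole code word `repeats` times (hypothetical helper)."""
    return bits * repeats


def decode(received, length, repeats=3):
    """Split the stream into `repeats` copies and vote on each bit position."""
    copies = [received[i * length:(i + 1) * length] for i in range(repeats)]
    # zip(*copies) yields one tuple per bit position, holding that bit from every copy.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


signal = "1110110011"
# The received stream from the example: the first copy has one flipped bit.
received = "1010110011" + "1110110011" + "1110110011"
decoded = decode(received, length=len(signal))
print(decoded == signal)
```

A single flipped bit is outvoted by the two intact copies, which is the same mechanism a voting ensemble relies on.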

As a voting ensemble example, suppose the correct labels for 10 samples are:

1111111111

Suppose I train 3 classifiers, each with an accuracy of 70%. Treating their errors as independent, the probabilities below show what accuracy a majority vote over the three can achieve:

All three are correct:
  0.7 * 0.7 * 0.7 = 0.343
Two are correct:
  0.7 * 0.7 * 0.3 + 0.7 * 0.3 * 0.7 + 0.3 * 0.7 * 0.7 = 0.441
Two are wrong:
  0.3 * 0.3 * 0.7 + 0.3 * 0.7 * 0.3 + 0.7 * 0.3 * 0.3 = 0.189
All three are wrong:
  0.3 * 0.3 * 0.3 = 0.027

The vote is correct whenever at least two classifiers are correct, so the voting ensemble reaches an accuracy of about 78% (0.343 + 0.441 = 0.784).
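The same arithmetic extends to any odd number of voters through the binomial distribution. The sketch below assumes, as the calculation above does, that every classifier has the same accuracy and that their errors are independent; the function name `majority_vote_accuracy` is made up for this example.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that more than half of n independent voters are correct.

    p is the accuracy of each voter. For an even n a tie counts as wrong,
    so odd values of n are the natural choice.
    """
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


for n in (1, 3, 5, 7):
    print(n, round(majority_vote_accuracy(0.7, n), 4))
```

For `n = 3` this reproduces the 0.784 computed by hand, and the value keeps rising as more independent voters are added. Real models are rarely independent, so the gain in practice is usually smaller.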
Number of voters

Let's look at how the number of voters affects the final result. Anyone who has studied communications knows that the lower the signal-to-noise ratio (SNR), the higher the bit error rate, and that redundant coding followed by a final vote can correct some of those errors. Plotting the final error rate for different numbers of repetitions shows that, at a low SNR, the error rate falls as the number of repetitions, and therefore the number of votes being combined, increases.

Model correlation

When ensembling submitted results, low correlation between the results brings a larger gain. Why? Suppose the following three results were submitted:

1111111100 = 80% accuracy
1111111100 = 80% accuracy
1011111100 = 70% accuracy

Voting result:

1111111100 = 80% accuracy

Because the three results are highly correlated, the vote brings no improvement.
Now consider three other models with the following results:

1111111100 = 80% accuracy
0111011101 = 70% accuracy
1000101111 = 60% accuracy

Final voting result:

1111111101 = 90% accuracy
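Both votes above can be checked with a short script. This is a minimal sketch that assumes binary labels written as strings of 0s and 1s and a correct answer of 1111111111, as in the example; `majority_vote` and `accuracy` are hypothetical helper names.

```python
def majority_vote(predictions):
    """Per-position majority vote over equal-length strings of 0/1 labels."""
    n = len(predictions)
    return "".join(
        "1" if column.count("1") * 2 > n else "0" for column in zip(*predictions)
    )


def accuracy(prediction, truth):
    """Fraction of positions where the prediction matches the truth."""
    return sum(p == t for p, t in zip(prediction, truth)) / len(truth)


truth = "1111111111"
correlated = ["1111111100", "1111111100", "1011111100"]
diverse = ["1111111100", "0111011101", "1000101111"]

for name, models in (("correlated", correlated), ("diverse", diverse)):
    vote = majority_vote(models)
    print(name, vote, accuracy(vote, truth))
```

The correlated set votes its way back to 1111111100 (80%), while the less correlated set reaches 1111111101 (90%) even though its individual models are weaker.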

So the more diverse the submitted results, and the lower their correlation, the larger the gain from ensembling.

Weighting

Why introduce weights when ensembling? The aim is to give better models a louder voice in the vote. By default every model has one equal vote; weighting breaks this balance by placing higher weights on the results that perform well. Put another way, the weaker models are there only to correct some of the better models' mistakes.

Averaging

The methods described above are mainly used for classification problems. For regression problems, averaging the predictions serves the same purpose. Averaging the results also helps to prevent overfitting to some extent. In a plot of two classes, red and blue, the green line represents one of the results we submitted; when we average multiple results we get something closer to the black line, which separates the red and blue classes well.
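Weighted voting and plain averaging can each be sketched in a few lines. The weights and toy inputs below are invented purely for illustration, and `weighted_vote` and `average` are hypothetical helper names; a tie in the weighted vote is resolved as 0 here, which is an arbitrary choice.

```python
def weighted_vote(predictions, weights):
    """Per-position weighted vote over equal-length strings of 0/1 labels."""
    total = sum(weights)
    result = []
    for column in zip(*predictions):
        # Add up the weights of the models that voted 1 at this position.
        score = sum(w for label, w in zip(column, weights) if label == "1")
        result.append("1" if score * 2 > total else "0")
    return "".join(result)


def average(predictions):
    """Per-row mean of several models' numeric predictions (regression)."""
    return [sum(row) / len(row) for row in zip(*predictions)]


# With weights 3, 1, 1 the first model can never be outvoted.
print(weighted_vote(["1", "0", "0"], [3, 1, 1]))
# With weights 2, 1, 1, 1 the three weaker models win only if they all agree.
print(weighted_vote(["1", "0", "0", "0"], [2, 1, 1, 1]))
print(average([[1.0, 2.0], [3.0, 4.0]]))
```

The choice of weights decides how much agreement the weaker models need before they can overrule the best one, so it is worth tuning on validation data rather than guessing.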
Rank averaging

In ranking problems, averaging the raw outputs directly often causes trouble. Suppose a classifier used for ranking produces the following output:

id,prediction
1,0.35000056
2,0.35000002
3,0.35000098
4,0.35000111

If this output is averaged with the output of another classifier, such as the one below, the ensemble changes nothing, because the first classifier's scores barely differ from one another:

id,prediction
1,0.57
2,0.04
3,0.96
4,0.99

So for ranking problems we can first calibrate the results. Start by turning the predictions into ranks:

id,rank,prediction
1,1,0.35000056
2,0,0.35000002
3,2,0.35000098
4,3,0.35000111

We can then normalize the ranks to the range 0 to 1, which makes the result combine better with other results:

id,prediction
1,0.33
2,0.0
3,0.66
4,1.0
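The rank-then-normalize step can be sketched as follows. This is a minimal illustration that uses the two sets of predictions from the example; `rank_normalize` is a hypothetical helper name, and the sketch does not treat tied scores specially (tied items simply receive consecutive ranks).

```python
def rank_normalize(scores):
    """Replace each score by its rank (0 = lowest), scaled to the range 0 to 1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, index in enumerate(order):
        ranks[index] = rank
    top = max(len(scores) - 1, 1)  # avoid dividing by zero for a single score
    return [rank / top for rank in ranks]


model_a = [0.35000056, 0.35000002, 0.35000098, 0.35000111]
model_b = [0.57, 0.04, 0.96, 0.99]

norm_a = rank_normalize(model_a)
norm_b = rank_normalize(model_b)
# After normalization both models contribute on the same 0-to-1 scale.
blended = [(a + b) / 2 for a, b in zip(norm_a, norm_b)]
print(norm_a)
print(blended)
```

Because only the ordering of each model's scores survives, a poorly calibrated model with tightly bunched outputs gets the same say in the average as a well-spread one.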

Stacked Generalization & Blending

To be continued ...


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
