Paper: Clustering-Based Ensembles as an Alternative to Stacking

Anna Jurek, Yaxin Bi, Shengli Wu, and Chris D. Nugent, Member, IEEE

Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 9, September 2014

This paper addresses an ensemble problem. The clustering techniques themselves are conventional; the innovation, as the paper presents it, is to take the traditional classifier-ensemble framework and replace its second stage with a clustering ensemble. The final framework (a minimal sketch in code follows the list) is:

    1. The first stage obtains class labels from multiple base classifiers.
    2. The second stage appends those class labels to the samples' original attributes as new attributes and runs a clustering ensemble (K-means) over them, with the Davies-Bouldin Index (DBI) as the evaluation metric.
    3. A final sample is classified by assigning it to the nearest of the K cluster centers.
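To make the framework concrete, here is a minimal sketch assuming scikit-learn and synthetic data; the choice of base classifiers, the sizes, and the variable names are mine, not the paper's:

```python
# Stage 1: N base classifiers each contribute one column of predicted labels;
# those columns are then appended to the original attributes for stage 2.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

label_columns = []
for clf in (DecisionTreeClassifier(random_state=0),
            LogisticRegression(max_iter=1000),
            GaussianNB()):
    clf.fit(X, y)
    label_columns.append(clf.predict(X))

# Augmented representation: predicted-label columns first, then F1..Fk.
V1 = np.column_stack(label_columns + [X])
```

The K-means step with DBI-based model selection is sketched after the more detailed list below.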

This simple combination has in fact been used in many published papers; in my view, the main innovation here is the discussion in the second part:

Why can combining classification labels as new attributes improve accuracy?

The paper points out that previous work justified this practice only empirically, whereas here it is proved theoretically.

For supervised learning this is, in effect, the traditional classifier-ensemble framework, as follows (steps 3-5 are sketched in code after the list):

    1. Run N classifiers over the data set to obtain N columns of predicted labels.
    2. Combine the N classifiers' outputs with the data set's original attributes to form the new attribute set.
    3. Score each attribute with an information-gain function.
    4. Select the most representative attributes.
    5. Run K-means on the selected attributes, using the Davies-Bouldin Index to evaluate the clustering.
    6. Training is now finished; a sample's class label is determined by the nearest of the K centers.
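A sketch of steps 3-5 under the same assumptions (scikit-learn, with mutual information standing in for the information-gain function; how many attributes to keep and which K values to try are my own choices):

```python
# Score attributes by information gain (mutual information here), keep the
# top-scoring ones, then pick K for K-means by the Davies-Bouldin Index
# (lower is better).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def fit_cluster_stage(V, y, n_keep=8, k_range=range(2, 12)):
    # Steps 3-4: information gain of each attribute w.r.t. the class label,
    # then keep the n_keep most informative columns.
    gains = mutual_info_classif(V, y, random_state=0)
    keep = np.argsort(gains)[::-1][:n_keep]
    V_sel = V[:, keep]

    # Step 5: try several K and keep the clustering with the lowest DBI.
    best = None
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(V_sel)
        dbi = davies_bouldin_score(V_sel, km.labels_)
        if best is None or dbi < best[0]:
            best = (dbi, km, keep)
    return best  # (dbi, fitted KMeans, indices of kept attributes)
```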

The model is trained as above; afterwards, a newly arriving sample is classified using only its original attributes.
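A minimal sketch of this classification step, reusing the column layout of the first sketch (label columns first, then the original attributes); `centers` and `center_labels` are assumed to come from the trained clustering, with each center's label determined by the majority vote discussed further below:

```python
# Nearest-center classification over the original attributes only: the label
# columns of each center are dropped before measuring distance. Assumes
# n_label_cols predicted-label columns come first, as in the sketch above.
import numpy as np

def classify(x_new, centers, center_labels, n_label_cols):
    feature_part = centers[:, n_label_cols:]   # keep only the F1..Fk columns
    dists = np.linalg.norm(feature_part - x_new, axis=1)
    return center_labels[int(np.argmin(dists))]
```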

What follows is the paper's theoretical discussion.

Given a fixed validation set:

V1 and V2 are built from this initial data set after it passes through the N classifiers: the classification results become new attributes C1 to Cn of each sample, alongside the original attributes F1 to Fk. The lowercase n and k indicate the attributes retained after extraction, and p is the number of samples.

Since the question under discussion is why appending classification labels as new attributes can improve accuracy, the comparison is between a representation that includes the classification results and one that does not: the former is V1, the latter is V2.
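In symbols, the two representations should have roughly this shape (notation adapted from the description above; the paper's exact formulation is not reproduced in this post):

```latex
% Shape of the two representations (notation mine): each of the p validation
% samples contributes one row; C_i(x) is classifier i's label for sample x.
V_1 = \left\{ \big( C_1(x), \ldots, C_n(x),\; f_1(x), \ldots, f_k(x) \big) \right\}_{x \in \mathrm{Val}}
\qquad
V_2 = \left\{ \big( f_1(x), \ldots, f_k(x) \big) \right\}_{x \in \mathrm{Val}}
```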

The question is: given these two sets, if K-means clustering is run on each and samples are then labeled via the K centers, why does V1 achieve a higher accuracy rate than V2?

K-means is run on V1 and V2 separately. In the paper's example (the figure is not reproduced in this post), the left side is V1 and the right side is V2, where y* and z* denote the cluster centers; V1 is clustered into 8 clusters and V2 into 7.

How is the class label of a cluster determined? Because this is supervised learning, the class labels of the samples are known, so each cluster takes the majority class label among its samples as the label of its center. This point is important, because the discussion that follows rests on it.
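A minimal sketch of this majority vote; the output is exactly the kind of `center_labels` array consumed by the classification sketch earlier:

```python
# Majority-vote labeling of cluster centers: each center takes the most
# frequent true class label among the samples in its cluster.
# Assumes every cluster is non-empty.
import numpy as np

def label_centers(cluster_ids, y, n_clusters):
    center_labels = np.empty(n_clusters, dtype=y.dtype)
    for j in range(n_clusters):
        members = y[cluster_ids == j]          # true labels in cluster j
        values, counts = np.unique(members, return_counts=True)
        center_labels[j] = values[np.argmax(counts)]
    return center_labels
```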

Of course, not all samples are counted: points near the dividing boundary between clusters easily cause overlap, so the following constraints are introduced.

The idea is to consider only sample points within a certain distance of the center, where dC and dF denote Euclidean distances between a sample x and a center y* (as I read the notation, dC is measured over the combined attributes and dF over the original attributes only). The two threshold values are taken as follows:

For the first: for each cluster center, take the sample points in its cluster and find the maximum distance dC to the center; each cluster yields one such distance, and the smallest of these is chosen as θ1.

The second is similar, except that dC is replaced by dF, the computation is done for both V1 and V2, and the smaller of the two values is taken as θ2.

The paper illustrates this with a figure, which is not reproduced here.
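From the verbal description above, the two thresholds should look roughly as follows; the notation (S_j for the samples of cluster j, primes for V2's clusters) is mine, and the paper's exact formulas are not reproduced in this post:

```latex
% Hedged reconstruction (notation mine): S_j are the samples of V1's cluster j
% with center y*_j; S'_i are the samples of V2's cluster i with center z*_i.
% theta_1 is taken over V1's clusters using d_C; theta_2 uses d_F and takes
% the smaller of the V1 and V2 values.
\theta_1 = \min_{j}\; \max_{x \in S_j} d_C\!\left(x, y^*_j\right)
\qquad
\theta_2 = \min\!\left(
  \min_{j}\; \max_{x \in S_j} d_F\!\left(x, y^*_j\right),\;
  \min_{i}\; \max_{x \in S'_i} d_F\!\left(x, z^*_i\right)
\right)
```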

Once this constraint is added, the class label of a cluster center may change, because only the sample points satisfying the constraint are counted in the majority vote. After the centers' labels are known, consider the classification stage: for a sample x whose label is unknown, its nearest center is computed from the original attributes F1 to Fk, and that center's class label is taken as its label. Suppose the true class label of x is cr; this is expressed by a formula,

where L1 denotes the number of clusters in V1; in the running example, L1 = 8.
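The formula itself is not reproduced in this post; from the description, it should be, up to notation, the nearest-center decision rule:

```latex
% Nearest-center decision rule (my reconstruction of the omitted formula):
% x receives the majority label L(.) of its nearest V1 center, with the
% distance d_F measured over the original attributes F1..Fk.
\hat{c}(x) = L\!\left(y^*_{j_0}\right),
\qquad
j_0 = \operatorname*{arg\,min}_{1 \le j \le L_1} d_F\!\left(x, y^*_j\right)
```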

The measure of accuracy is whether the label of the center to which the predicted sample x is assigned is the same as the actual class label of x. This probability is expressed by a formula (not reproduced here) in which, for V1, parts I and II impose the distance constraints and part III pins down the center to which x will be assigned.

The claim is that the first probability is larger than the second; that is, with the classification labels added as attributes, the nearest cluster center of x is more likely to carry the label cr.
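Schematically, the claim being proved is the following (my hedged summary of the omitted formulas):

```latex
% Schematic form of the claim (details and constraints omitted): with the
% classifier labels appended, the nearest center is at least as likely to
% carry x's true label c_r.
P\!\left( L\!\left(y^*_{j_0}\right) = c_r \right)
\;\ge\;
P\!\left( L\!\left(z^*_{i_0}\right) = c_r \right)
```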

Through a certain derivation, the following formula is obtained; the derivation process is given in the paper's appendix.

The left side is in fact the V1 probability and the right side the V2 probability; if the middle factor is >= 1, then left >= right follows.

For the above to hold, only the following condition is required; the derivation is likewise in the appendix.

This condition means: the probability that two samples with the same true label are assigned the same classification is greater than the probability that two samples with different true labels are assigned the same classification.
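In symbols, my reconstruction of the omitted condition, with C(.) a base classifier's prediction and L(.) the true label:

```latex
% Sufficient condition (my reconstruction): agreement of predicted labels is
% more likely for a pair with the same true label than for a pair with
% different true labels.
P\big( C(x_1) = C(x_2) \mid L(x_1) = L(x_2) \big)
\;>\;
P\big( C(x_1) = C(x_2) \mid L(x_1) \neq L(x_2) \big)
```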

This is exactly the contribution a classifier makes. The appendix also proves that when there are only two true classes, the condition holds as long as the classifier's accuracy exceeds 0.5.
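A quick numeric check of the binary case, under a simplification of my own (the classifier is correct on each sample independently with probability p):

```python
# Toy check (my own simplification, not the paper's proof): a binary
# classifier correct independently with probability p on each sample.
# Same true label: predictions agree when it is right on both or wrong on
# both, i.e. p^2 + (1-p)^2. Different true labels: predictions agree when
# it is right on exactly one, i.e. 2*p*(1-p).

def p_same_given_same(p: float) -> float:
    return p * p + (1 - p) * (1 - p)

def p_same_given_diff(p: float) -> float:
    return 2 * p * (1 - p)

for p in (0.5, 0.6, 0.7, 0.9):
    gap = p_same_given_same(p) - p_same_given_diff(p)
    # Algebraically the gap equals (2p - 1)^2: positive iff p != 0.5.
    print(f"p={p:.1f}  same={p_same_given_same(p):.2f}  "
          f"diff={p_same_given_diff(p):.2f}  gap={gap:.2f}")
```

The gap equals (2p - 1)^2, strictly positive exactly when p != 0.5, which matches the appendix's claim that accuracy above 0.5 suffices.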

The above establishes that classification labels + sample attributes > sample attributes alone. The paper also proves that classification labels + sample attributes > classification labels alone, provided the following condition is met:

This condition means: two samples with the same true label are more similar to each other than two samples with different true labels.
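One plausible reading of this condition in symbols (an assumption on my part; the paper's exact statement is not reproduced here):

```latex
% One plausible formalization (assumption, not the paper's exact statement):
% in expectation, same-label pairs are closer in the original attributes.
\mathbb{E}\big[ d_F(x_1, x_2) \mid L(x_1) = L(x_2) \big]
\;<\;
\mathbb{E}\big[ d_F(x_1, x_2) \mid L(x_1) \neq L(x_2) \big]
```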
