A clustering algorithm published in Science

Last Update:2014-08-28 Source: Internet

Author: User

The authors (Alex Rodriguez and Alessandro Laio) proposed a simple and elegant clustering algorithm that can recognize clusters of various shapes and whose single hyperparameter is easy to choose.

Algorithm Idea

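This algorithm assumes that cluster centers are surrounded by points with lower local density, and that they lie at a relatively large distance from any point with a higher local density. First, two quantities are defined for each point $i$: the local density $\rho_i$ and the distance $\delta_i$ to points of higher local density:

$$\rho_i = \sum_j \chi(d_{ij} - d_c)$$

where

$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & \text{otherwise} \end{cases}$$

Here $d_{ij}$ is the distance between points $i$ and $j$, and $d_c$ is a cutoff distance and a hyperparameter, so $\rho_i$ equals the number of points whose distance to point $i$ is less than $d_c$. The algorithm is sensitive only to the relative values of $\rho_i$ across points, so the choice of $d_c$ is relatively robust. One recommended approach is to choose $d_c$ so that the average number of neighbors of each point is 1%-2% of all points.

$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}$$

For the point with the highest density, set $\delta_i = \max_j d_{ij}$. Note that $\delta_i$ is much larger than the typical nearest-neighbor distance only for points that are local or global density maxima.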
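The two quantities can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, Euclidean distance is assumed, and taking a quantile of the pairwise distances is just one way to apply the 1%-2% rule of thumb for the cutoff distance.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix d[i, j] between all rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def choose_dc(dist, neighbor_fraction=0.02):
    """Pick d_c so that the average number of neighbors is roughly
    `neighbor_fraction` of all points (assumed reading of the 1%-2% rule)."""
    n = dist.shape[0]
    off_diagonal = dist[~np.eye(n, dtype=bool)]
    return np.quantile(off_diagonal, neighbor_fraction)

def local_density(dist, d_c):
    """rho_i = number of points j != i with d_ij < d_c."""
    within = dist < d_c
    np.fill_diagonal(within, False)  # a point is not its own neighbor
    return within.sum(axis=1)

def distance_to_higher_density(dist, rho):
    """delta_i = min of d_ij over points j with rho_j > rho_i;
    for a point with no denser point, delta_i = max_j d_ij."""
    n = dist.shape[0]
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return delta
```

Because this density is an integer count, several points can tie for the highest value; in this sketch each of them receives the maximum distance as its delta.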

Clustering Process

Points with both a relatively large local density $\rho_i$ and a large $\delta_i$ are considered cluster centers. Points with a small local density $\rho_i$ but a large $\delta_i$ are outliers. After the cluster centers are determined, every other point is assigned to the same cluster as its nearest neighbor of higher density. (Note from I Love Machine Learning: the original paper reads "assigned to the same cluster as its nearest neighbor of higher density", that is, each point takes the cluster of the nearest neighbor whose density is higher than its own. Thanks to Deng Gong @djvu9 for pointing out the error.) The figure is as follows:

The left panel shows the distribution of all points in two-dimensional space. The right panel plots $\rho$ on the abscissa and $\delta$ on the ordinate, and is called the decision graph. The points whose $\rho_i$ and $\delta_i$ are both relatively large serve as cluster centers. Points 26, 27, and 28 also have a relatively large $\delta_i$ but a small $\rho_i$, so they are outliers.
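The selection and assignment steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the thresholds `rho_min` and `delta_min` are hypothetical parameters standing in for the values a user would read off the decision graph, and ties in density are broken by processing order.

```python
import numpy as np

def assign_clusters(dist, rho, delta, rho_min, delta_min):
    """Label each point with a cluster index.

    Centers are the points with rho > rho_min and delta > delta_min
    (hypothetical thresholds chosen from the decision graph). Every other
    point inherits the label of its nearest neighbor of higher density,
    visiting points in order of decreasing density.
    """
    n = len(rho)
    labels = np.full(n, -1)
    centers = np.flatnonzero((rho > rho_min) & (delta > delta_min))
    labels[centers] = np.arange(len(centers))

    order = np.argsort(-rho, kind="stable")  # densest points first
    for rank, i in enumerate(order):
        if labels[i] != -1 or rank == 0:
            continue  # centers keep their label; the densest point has no denser neighbor
        denser = order[:rank]  # points visited earlier (denser, or tied and earlier)
        nearest = denser[np.argmin(dist[i, denser])]
        labels[i] = labels[nearest]
    return labels, centers
```

A single pass in decreasing-density order is enough, because the neighbor a point copies its label from has always been labeled already.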

Cluster Analysis

In cluster analysis, we usually need to determine how reliably each point is assigned to its cluster. In this algorithm, you first define a border region for each cluster, that is, the set of points assigned to that cluster that lie within distance $d_c$ of points belonging to other clusters. Then, for each cluster, find the point with the highest local density within its border region and denote that density $\rho_b$. All points of the cluster with a local density higher than $\rho_b$ are considered part of the cluster core (that is, their assignment to the cluster is highly reliable). The remaining points are regarded as the cluster halo, that is, noise. The figure is as follows:

Figure A shows the probability distribution from which the data are generated, and Figures B and C show 4000 and 1000 points drawn from that distribution. D and E are the decision graphs of data sets B and C respectively. We can see that in both data sets only five points have a relatively large $\rho_i$ and a very large $\delta_i$; these points serve as cluster centers. After the cluster centers are determined, each point is assigned either to a cluster (colored points) or to a cluster halo (black points). Figure F shows that the clustering error rate gradually decreases as the number of sampled points increases, indicating that the algorithm is robust.
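The core and halo split described above can be sketched like this. It is a minimal illustration that follows the textual description; the function name is hypothetical, and a cluster that has no border region is assumed to consist entirely of core points.

```python
import numpy as np

def halo_mask(dist, rho, labels, d_c):
    """Return a boolean array that is True for halo (noise) points.

    For each cluster, the border region holds its points that lie within
    d_c of a point from another cluster, and rho_b is the highest density
    in that region. Points with rho > rho_b form the core; the rest of the
    cluster forms the halo.
    """
    halo = np.zeros(len(rho), dtype=bool)
    for c in np.unique(labels):
        inside = labels == c
        # points of cluster c that have a point of another cluster within d_c
        near_other = (dist[:, ~inside] < d_c).any(axis=1) & inside
        if near_other.any():
            rho_b = rho[near_other].max()
            halo |= inside & (rho <= rho_b)
    return halo
```

Note that no global noise threshold is needed here: each cluster gets its own density cutoff from its border region.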

Finally, the algorithm clusters very well on a variety of data distributions.

References:

[1]. Clustering by fast search and find of density peaks. Alex Rodriguez, Alessandro Laio. Science, 2014.

Original article: http://www.52ml.net/16296.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
