A clustering algorithm published in Science

The authors (Alex Rodriguez and Alessandro Laio) propose a simple and elegant clustering algorithm that can recognize clusters of various shapes and whose hyperparameter is easy to determine.

Algorithm Idea

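The algorithm assumes that cluster centers are surrounded by neighbors with lower local density and lie at a relatively large distance from any point with a higher local density. Two quantities are first defined for each point $i$: its local density $\rho_i$ and its distance $\delta_i$ from points of higher density. The local density is

$$\rho_i = \sum_j \chi(d_{ij} - d_c)$$

where

$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \ge 0 \end{cases}$$

Here $d_{ij}$ is the distance between points $i$ and $j$, and $d_c$ is a cutoff distance, the algorithm's hyperparameter. $\rho_i$ is therefore the number of points whose distance to point $i$ is less than $d_c$. The algorithm is sensitive only to the relative magnitude of $\rho_i$ across points, so the choice of $d_c$ is relatively robust. One recommended approach is to choose $d_c$ so that the average number of neighbors per point is 1%-2% of the total number of points.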
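A minimal Python sketch of the density computation, using only the standard library. The function names (`pairwise_distances`, `choose_dc`, `local_density`) and the `fraction` parameter are illustrative and do not come from the paper. `choose_dc` is one possible way to apply the 1%-2% rule, by taking the value at that position in the sorted list of all pairwise distances. The point itself is left out of the count, which shifts every density by the same constant and so does not change their relative order.

```python
import math

def pairwise_distances(points):
    """Return the full symmetric matrix of Euclidean distances d_ij."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def choose_dc(dist, fraction=0.02):
    """Heuristic (an assumption of this sketch): take the distance found at
    position `fraction` in the sorted list of all pairwise distances, so that
    the average number of neighbors is roughly `fraction` of all points."""
    n = len(dist)
    pair_d = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    k = min(len(pair_d) - 1, int(fraction * len(pair_d)))
    return pair_d[k]

def local_density(dist, dc):
    """rho_i = number of other points j with d_ij < d_c (cutoff kernel)."""
    n = len(dist)
    return [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]

# Four corners of a unit square, its midpoint, and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
dist = pairwise_distances(points)
rho = local_density(dist, dc=1.1)   # [3, 3, 3, 3, 4, 0]
```

The midpoint has the highest density and the isolated point has density zero, which is the behavior the definition of $\rho_i$ describes.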

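The distance $\delta_i$ is the minimum distance from point $i$ to any point of higher density:

$$\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$$

For the point with the highest density, set $\delta_i = \max_j d_{ij}$. Note that $\delta_i$ is much larger than the typical nearest-neighbor distance only for points that are local or global density maxima.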
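The definition of $\delta_i$ can be sketched in Python as follows. The function name `delta_and_nearest_higher` is illustrative, and breaking density ties by index order is an assumption of this sketch (the definition only covers strictly higher densities). The function also records which neighbor achieved the minimum, since that index is what the assignment step needs.

```python
def delta_and_nearest_higher(dist, rho):
    """For each point i return delta_i and the index of its nearest
    higher-density neighbor (-1 for the global density peak).

    Ties in density are broken by index order (an assumption of this sketch),
    so exactly one point has no higher-density neighbor."""
    n = len(rho)
    order = sorted(range(n), key=lambda i: (-rho[i], i))  # decreasing density
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Highest-density point: delta_i = max_j d_ij
            delta[i] = max(dist[i])
            continue
        # Closest point among those that come earlier in the density order
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j
    return delta, nearest_higher

# Five points on a line, with densities supplied by hand for illustration.
xs = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in xs] for a in xs]
rho = [1, 3, 2, 2, 1]
delta, nearest_higher = delta_and_nearest_higher(dist, rho)
# delta          -> [1, 10, 1, 8, 1]
# nearest_higher -> [1, -1, 1, 2, 3]
```

Point 3 (at position 10) has a moderate density but a large $\delta$, because the nearest point of higher density is far away; this is what makes it stand out as a second peak.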

Clustering Process

Points with both a relatively large local density $\rho_i$ and a large $\delta_i$ are taken to be cluster centers. Points with a small local density but a large $\delta_i$ are outliers. Once the cluster centers have been determined, each remaining point is assigned to the same cluster as its nearest neighbor of higher density. (Note from I Love Machine Learning: the original paper says "assigned to the same cluster as its nearest neighbor of higher density", that is, the cluster of the nearest point whose density is higher than the point's own, not the cluster of the nearest cluster center. Thanks to Deng Gong @djvu9 for pointing out the error.) The figure illustrates this:

The left panel shows the distribution of all points in a two-dimensional space. The right panel plots $\delta$ (vertical axis) against $\rho$ (horizontal axis) and is called the decision graph. The points whose $\rho_i$ and $\delta_i$ are both relatively large are taken as the cluster centers. Points 26, 27, and 28 also have a relatively large $\delta_i$, but their $\rho_i$ is small, so they are outliers.
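The selection and assignment steps can be sketched as follows. In the article, centers are read off the decision graph by inspection; the thresholds `rho_min` and `delta_min` used here are hypothetical stand-ins for that judgment, and the function names are illustrative. `nearest_higher` is assumed to hold, for each point, the index of its nearest neighbor of higher density, with -1 for the global density peak.

```python
def pick_centers_and_outliers(rho, delta, rho_min, delta_min):
    """Split the points with large delta into centers (large rho) and
    outliers (small rho). Both thresholds are hypothetical parameters."""
    idx = range(len(rho))
    centers = [i for i in idx if delta[i] >= delta_min and rho[i] >= rho_min]
    outliers = [i for i in idx if delta[i] >= delta_min and rho[i] < rho_min]
    return centers, outliers

def assign_clusters(nearest_higher, centers):
    """Give every point the label of its nearest higher-density neighbor,
    following the chain of neighbors until a labeled point is reached."""
    labels = [-1] * len(nearest_higher)
    for cluster_id, i in enumerate(centers):
        labels[i] = cluster_id
    for start in range(len(labels)):
        path = []
        i = start
        while labels[i] == -1:
            path.append(i)
            i = nearest_higher[i]
            if i == -1:
                raise ValueError("the highest-density point must be a cluster center")
        for p in path:
            labels[p] = labels[i]
    return labels

# Values for five points on a line at positions 0, 1, 2, 10, 11.
rho = [1, 3, 2, 2, 1]
delta = [1, 10, 1, 8, 1]
nearest_higher = [1, -1, 1, 2, 3]
centers, outliers = pick_centers_and_outliers(rho, delta, rho_min=2, delta_min=5)
labels = assign_clusters(nearest_higher, centers)
# centers -> [1, 3], outliers -> [], labels -> [0, 0, 0, 1, 1]
```

Because each point inherits the label of a denser neighbor rather than of the closest center, the assignment needs only one pass over the chains.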

Cluster Analysis

In cluster analysis, we usually need to quantify how reliably each point is assigned to its cluster. In this algorithm, a border region is first defined for each cluster: the set of points that are assigned to the cluster but lie within distance $d_c$ of points belonging to other clusters. The point of highest local density within the border region is then found, and its density is denoted $\rho_b$. All points of the cluster whose local density is higher than $\rho_b$ are considered part of the cluster core (that is, their assignment to the cluster is highly reliable). The remaining points are regarded as the cluster halo, that is, noise. The figure illustrates this:

Figure A shows the probability distribution from which the data are drawn. Figures B and C show samples of 4000 and 1000 points drawn from that distribution, respectively. D and E are the decision graphs for the data sets in B and C. In both data sets, only five points have a relatively large $\rho_i$ and a very large $\delta_i$; these are taken as the cluster centers. Once the centers are determined, each point is assigned either to a cluster (colored points) or to a cluster halo (black points). Figure F shows that the clustering error rate decreases as the number of sampled points increases, indicating that the algorithm is robust.
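The core and halo split described above can be sketched as follows. This follows the wording of the text (the highest density found in a cluster's border region defines $\rho_b$); the function name `halo_mask` is illustrative, and the sketch assumes that every point already carries a cluster label in the range 0 to k-1.

```python
def halo_mask(dist, rho, labels, dc):
    """Return a list of booleans, True where a point belongs to the halo.

    A point is in the border region of its cluster if some point of another
    cluster lies within distance dc. rho_b is the highest density found in
    that border region; points with rho <= rho_b form the halo."""
    n = len(labels)
    n_clusters = max(labels) + 1
    # -inf means "no border region", so every point of that cluster is core.
    rho_b = [float("-inf")] * n_clusters
    for i in range(n):
        for j in range(n):
            if labels[i] != labels[j] and dist[i][j] < dc:
                rho_b[labels[i]] = max(rho_b[labels[i]], rho[i])
    return [rho[i] <= rho_b[labels[i]] for i in range(n)]

# Two clusters on a line that touch near positions 3.5 and 5.
xs = [0, 1, 2, 3.5, 5, 6, 7]
dist = [[abs(a - b) for b in xs] for a in xs]
labels = [0, 0, 0, 0, 1, 1, 1]
rho = [5, 6, 4, 2, 1, 4, 5]          # densities supplied by hand for illustration
halo = halo_mask(dist, rho, labels, dc=2)
# halo -> [False, False, False, True, True, False, False]
```

Only the two points facing each other across the cluster boundary end up in the halo; a cluster with no neighbors from other clusters within $d_c$ keeps all of its points in the core.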

Finally, the algorithm gives good clustering results on a variety of data distributions.

References:

[1] Alex Rodriguez, Alessandro Laio. Clustering by fast search and find of density peaks. Science.

Original article: http://www.52ml.net/16296.html
