AP Clustering Algorithm (RPM)

Last Update:2015-07-27 Source: Internet

Author: User


Affinity propagation (AP) is a clustering algorithm published in the journal Science in 2007. It clusters N data points according to the similarities between them. The similarities may be symmetric, meaning two points are equally similar to each other (for example, the negative Euclidean distance, so that closer points score higher), or asymmetric, meaning the similarity of point i to point k differs from that of point k to point i. Together these similarities form the N x N similarity matrix S, where N is the number of data points.
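As a rough illustration of how such a matrix can be built, the sketch below uses the negative squared Euclidean distance as the similarity. This measure is a common choice but an assumption here, and the function name similarity_matrix and the sample points are hypothetical.

```python
def similarity_matrix(points):
    """Return the N x N matrix S with S[i][k] = -(squared distance from i to k)."""
    n = len(points)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            # Closer points get a larger (less negative) similarity.
            S[i][k] = -sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
    return S


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
S = similarity_matrix(points)
for row in S:
    print(row)
```

The diagonal entries produced here are placeholders; in AP they are overwritten with the preference values.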

The AP algorithm does not need the number of clusters to be specified in advance. Instead, it treats every data point as a potential cluster center, called an exemplar. The diagonal value S(k,k) of the similarity matrix serves as the criterion for whether point k can become a cluster center: the larger the value, the more likely the point is to be chosen. This value is called the preference, p. The number of clusters is affected by p. If every data point is considered equally likely to be a cluster center, all points should share the same value of p. Taking the median of the input similarities as p yields a moderate number of clusters; taking the minimum yields a smaller number of clusters.
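A minimal sketch of this step, assuming the similarity matrix is stored as a list of lists: the shared preference is computed from the off-diagonal similarities and written onto the diagonal. The helper name with_preference and the sample matrix are hypothetical.

```python
from statistics import median


def with_preference(S, mode="median"):
    """Return a copy of S whose diagonal holds a shared preference p, and p itself."""
    n = len(S)
    off_diagonal = [S[i][k] for i in range(n) for k in range(n) if i != k]
    # "median" gives a moderate number of clusters, "min" gives fewer.
    p = median(off_diagonal) if mode == "median" else min(off_diagonal)
    result = [row[:] for row in S]  # leave the input matrix untouched
    for k in range(n):
        result[k][k] = p
    return result, p


S = [[0.0, -1.0, -50.0],
     [-1.0, 0.0, -41.0],
     [-50.0, -41.0, 0.0]]
S_median, p_median = with_preference(S, "median")
S_min, p_min = with_preference(S, "min")
print(p_median, p_min)
```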

The AP algorithm passes two kinds of messages between points: responsibility and availability. The responsibility r(i,k) is a numeric message sent from point i to candidate cluster center k, reflecting how well suited point k is to serve as the cluster center for point i. The availability a(i,k) is a numeric message sent from candidate cluster center k to point i, reflecting how appropriate it would be for point i to choose point k as its cluster center. The larger r(i,k) and a(i,k) are, the more likely point k is to be a cluster center, and the more likely point i is to belong to the cluster centered on point k. The AP algorithm iteratively updates the responsibility and availability of every point until m high-quality exemplars emerge, and the remaining data points are assigned to the corresponding clusters.
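For reference, the two messages are usually written with the update rules below. They come from the original 2007 paper rather than from this article, and s(i,k) denotes the entry S(i,k) of the similarity matrix.

```latex
\begin{align*}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \bigl[ a(i,k') + s(i,k') \bigr] \\
a(i,k) &\leftarrow \min\Bigl( 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\bigl(0,\, r(i',k)\bigr) \Bigr), \qquad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\bigl(0,\, r(i',k)\bigr)
\end{align*}
```

After the messages settle, point i is assigned to the candidate k that maximizes a(i,k) + r(i,k).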

Here are the common terms that appear in this article:

Exemplar: a cluster center.

Similarity: the similarity between data point i and point j is written S(i,j). It indicates how well suited point j is to be the cluster center of point i.

Preference: the preference of data point i is written p(i) or S(i,i). It indicates how suitable point i is as a cluster center. It is generally set to the median of the similarity values in S.

Responsibility: r(i,k) describes how well suited point k is to serve as the cluster center for data point i.

Availability: a(i,k) describes how appropriate it is for point i to choose point k as its cluster center.

Damping factor: a damping coefficient applied to the message updates, used mainly to help the iteration converge.
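The terms above can be tied together in a small pure-Python sketch. It follows the update rules of the original 2007 paper as far as I understand them, mixes each new message with its previous value through the damping factor, and runs a fixed number of iterations instead of testing for convergence. The function name, the default values, and the toy data with a preference of -10 are all assumptions made for illustration.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point i, the index of its exemplar.

    S is an N x N list of lists; S[k][k] holds the preference of point k.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i,k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i,k)
    for _ in range(iterations):
        # Responsibility: s(i,k) minus the best competing a(i,k') + s(i,k').
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(s for k, s in enumerate(scores) if k != best)
            for k in range(n):
                rival = runner_up if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: support that candidate k collects from the other points.
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            support[k] = R[k][k]  # the self-responsibility is not clipped at 0
            total = sum(support)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate with the largest a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data: two well-separated groups on a line (illustrative values only).
xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
S = [[-(a - b) ** 2 for b in xs] for a in xs]
for k in range(len(xs)):
    S[k][k] = -10.0  # shared preference, chosen by hand for this example
labels = affinity_propagation(S)
print(labels)
```

On this toy data the middle point of each group ends up as the exemplar. The nested loops keep the sketch readable but make it slow for anything beyond small inputs.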

The following is a brief description of its advantages and disadvantages, without the theoretical background:

[1] Unlike many clustering algorithms, AP clustering does not require K (as in classic K-means) or any other parameter describing the number of clusters (such as the network structure and size in SOM) to be specified.

[2] The most representative point of a cluster is called the exemplar in the AP algorithm. Unlike the cluster centers of other algorithms, an exemplar is an actual data point from the original data, not a center obtained by averaging several data points (as in K-means).

[3] Running the AP clustering algorithm several times on the same data gives exactly the same result, because there is no step that randomly selects initial values.

[4] The algorithm is computationally expensive: O(N*N*logN), whereas K-means is only O(N*K). Each round of message passing over a dense similarity matrix touches all N*N entries, so when N is large (N > 3000) AP clustering often takes a long time to compute.

[5] Measured by the sum of squared errors, AP clustering achieves a lower error than other methods. (No matter how many times K-centers clustering is repeated, it does not reach a sum of squared errors as low as that of AP.)

[6] AP takes a similarity matrix as its input, so the data need not be Euclidean, and unconventional point-to-point measures are allowed.
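To illustrate the last point, the sketch below builds a similarity matrix for strings, which have no coordinates at all. Using the negative edit distance as the similarity is an assumption chosen for this example, and the word list is made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


words = ["kitten", "sitting", "mitten", "flower"]
# Fewer edits means a larger (less negative) similarity.
S = [[-float(edit_distance(a, b)) for b in words] for a in words]
for word, row in zip(words, S):
    print(word, row)
```

A matrix like this, with preferences placed on its diagonal, can serve as the input to AP in the same way as one derived from Euclidean distances.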


