AP Clustering Algorithm (RPM)

Source: Internet
Author: User

Affinity propagation (AP) is a clustering algorithm published in the journal Science in 2007. It clusters N data points according to the similarities between them. These similarities can be symmetric, that is, the two points of a pair have the same similarity to each other (for example, the negative Euclidean distance), or asymmetric, that is, the similarity of point i to point j differs from that of point j to point i. Together the similarities form an N×N similarity matrix S, where N is the number of data points.
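As a minimal sketch of how such a matrix might be built, the snippet below uses the negative squared Euclidean distance as the similarity. The function name `similarity_matrix` and the sample points are illustrative assumptions, not part of the original article.

```python
def similarity_matrix(points):
    """Return the N x N matrix S with S[i][j] = -(squared Euclidean distance)."""
    return [
        [-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
        for p in points
    ]


# Three hypothetical 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
S = similarity_matrix(points)

# This particular similarity is symmetric: S[i][j] == S[j][i].
is_symmetric = all(
    S[i][j] == S[j][i] for i in range(len(S)) for j in range(len(S))
)
```

Closer points get a similarity nearer to zero, and distant points get a strongly negative one. The diagonal is still zero here; it is overwritten later by the preference.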

The AP algorithm does not need the number of clusters to be specified in advance. Instead, it treats every data point as a potential cluster center, called an exemplar. The value S(k,k) on the diagonal of S serves as the criterion for whether point k can become a cluster center: the larger the value, the more likely the point is to become one. This value is called the preference p. The number of clusters is influenced by p. If every data point is considered equally likely to be a cluster center, p should take the same value for all points. Taking the median of the input similarities as p gives a moderate number of clusters; taking their minimum gives a smaller number of clusters.
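The two choices of preference described above can be sketched as follows. The helper name `with_preference` and the small matrix are assumptions made for illustration; the median and minimum are taken over the off-diagonal similarities only.

```python
import statistics


def with_preference(S, how="median"):
    """Return a copy of S whose diagonal holds a shared preference p."""
    n = len(S)
    off_diag = [S[i][j] for i in range(n) for j in range(n) if i != j]
    p = statistics.median(off_diag) if how == "median" else min(off_diag)
    out = [row[:] for row in S]  # leave the caller's matrix untouched
    for k in range(n):
        out[k][k] = p
    return out, p


# Hypothetical similarities (negative squared distances) for three points.
S = [
    [0.0, -1.0, -25.0],
    [-1.0, 0.0, -18.0],
    [-25.0, -18.0, 0.0],
]
S_median, p_median = with_preference(S, "median")  # moderate number of clusters
S_min, p_min = with_preference(S, "min")           # fewer clusters
```

A lower (more negative) preference makes every point less willing to be an exemplar, which is why the minimum tends to yield fewer clusters than the median.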

The AP algorithm passes two kinds of messages between points: responsibility and availability. r(i,k) is a numerical message sent from point i to candidate cluster center k, reflecting how well suited point k is to serve as the cluster center for point i. a(i,k) is a numerical message sent from candidate cluster center k to point i, reflecting how appropriate it is for point i to choose k as its cluster center. The larger r(i,k) and a(i,k) are, the more likely point k is to be a cluster center, and the more likely point i is to belong to the cluster centered on k. Through an iterative process, AP keeps updating the responsibility and availability of every point until m high-quality exemplars emerge, and the remaining data points are assigned to the corresponding clusters.
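The article does not spell out the update rules. For reference, the rules as given in the 2007 Science paper can be written as below, where s(i,k) is the entry S(i,k) of the similarity matrix and all availabilities start at zero:

```latex
\begin{aligned}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \bigl\{ a(i,k') + s(i,k') \bigr\} \\
a(i,k) &\leftarrow \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\, r(i',k)\} \Bigr\}, \qquad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\{0,\, r(i',k)\}
\end{aligned}
```

After the messages settle, point k is taken as an exemplar when a(k,k) + r(k,k) > 0, and every point i is assigned to the k that maximizes a(i,k) + r(i,k).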

Here are some terms that appear in this article:

Exemplar: a cluster center.

Similarity: the similarity of data point i to point j is written S(i,j). It measures how suitable point j is as the cluster center of point i.

Preference: the preference of data point i is written p(i) or S(i,i). It measures how suitable point i is as a cluster center. It is generally set to the median of the similarity values in S.

Responsibility: r(i,k) describes how well suited point k is to serve as the cluster center for data point i.

Availability: a(i,k) describes how appropriate it is for point i to choose point k as its cluster center.

Damping factor: a damping coefficient, used mainly to help the iteration converge.
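To show how responsibility, availability and damping fit together, here is a minimal pure-Python sketch of the message-passing loop. It is an illustration under stated assumptions, not a reference implementation: the function name `affinity_propagation`, the default damping of 0.5, the fixed number of iterations (no convergence check) and the toy data are all choices made for this example. Damping is applied as a weighted average of the old and new message values.

```python
import statistics


def affinity_propagation(S, damping=0.5, max_iter=200):
    """Cluster from an n x n similarity matrix S (preferences on the diagonal)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i,k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i,k)
    for _ in range(max_iter):
        # Responsibility: how much better k is for i than i's best alternative.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                new = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # Availability: support that k collects from the other points.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            others = sum(pos) - pos[k]  # positive support from points i' != k
            for i in range(n):
                if i == k:
                    new = others
                else:
                    new = min(0.0, R[k][k] + others - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # A point is an exemplar when its self-availability plus
    # self-responsibility is positive.
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], [-1] * n
    labels = [
        i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
        for i in range(n)
    ]
    return exemplars, labels


# Toy data: two well-separated groups on a line (hypothetical values).
xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
n = len(xs)
S = [[-(a - b) ** 2 for b in xs] for a in xs]
p = statistics.median(S[i][j] for i in range(n) for j in range(n) if i != j)
for k in range(n):
    S[k][k] = p  # shared preference on the diagonal

exemplars, labels = affinity_propagation(S)
```

Each label is the index of the exemplar that the point is assigned to. A practical implementation would normally stop once the set of exemplars stays unchanged for a number of iterations, instead of always running `max_iter` rounds.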

The following briefly lists its advantages and disadvantages, without theoretical explanation:

[1] Unlike many clustering algorithms, AP does not require the number of clusters K to be specified (as in classic K-means), nor other parameters that determine the number of clusters (such as the network structure and size in a SOM).

[2] The most representative point of a cluster is called an exemplar in AP. Unlike the cluster centers of other algorithms, an exemplar is an actual data point from the original data, not a center obtained by averaging several data points (as in K-means).

[3] Running the AP algorithm several times on the same input gives exactly the same result, because there is no step that randomly chooses initial values.

[4] The algorithm has higher complexity, O(N*N*logN), whereas K-means is only O(N*K). Therefore, when N is large (N > 3000), AP clustering often takes a long time to compute.

[5] If the sum of squared errors is used to judge the quality of a clustering, AP achieves a lower sum of squared errors than other methods. (No matter how many times K-centers clustering is repeated, it does not reach a sum of squared errors as low as that of AP.)

[6] AP takes a similarity matrix as its input, so the data may be non-Euclidean, and unconventional point-to-point measures are allowed.
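As a small illustration of point [6], the sketch below builds a similarity matrix for data that are not vectors at all. It uses the ratio from Python's standard `difflib.SequenceMatcher` as the similarity between words; the word list and this choice of measure are assumptions made for the example, and any other pairwise score could take its place.

```python
from difflib import SequenceMatcher

# Hypothetical non-vector data: plain strings.
words = ["kitten", "sitting", "banana"]

# S[i][j] is a string-similarity score in [0, 1]; no coordinates or
# Euclidean distances are involved.
S = [
    [SequenceMatcher(None, a, b).ratio() for b in words]
    for a in words
]
```

Before clustering, the diagonal of such a matrix would still be replaced by the chosen preference.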
