Principles and implementation of the K-medoid (PAM) clustering algorithm for data mining

Source: Internet
Author: User

In the previous blog post, we introduced the k-means clustering algorithm.

It is beyond dispute that k-means, thanks to its simple procedure and fast convergence, is highly efficient and has been widely used in clustering applications.




However, k-means is not perfect: it is sensitive to noise and outliers, and the clustering error these isolated points introduce is a real headache.




Therefore, k-medoid, an improved algorithm based on k-means, came into being.

The core ideas of k-medoid and k-means are similar, but they differ most in how the cluster center is updated: when correcting a center, k-medoid computes, for each point in the cluster, the sum of its distances to all the other points in that cluster, and takes the point with the smallest sum as the new cluster center.

Because the center is always an actual data point, this difference lets k-medoid make up for the shortcoming of k-means: k-medoid is not sensitive to noise and outliers.
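To make this difference concrete, here is a minimal Python sketch (illustrative only, not from the original post; the data and function names are made up) contrasting the two center-update rules on a one-dimensional cluster containing an outlier:

```python
def kmeans_update(cluster):
    """k-means rule: the new center is the mean, which need not be a data point."""
    return sum(cluster) / len(cluster)

def kmedoid_update(cluster):
    """k-medoid rule: the new center is the member point whose summed
    distance to all other members is smallest."""
    return min(cluster, key=lambda p: sum(abs(p - q) for q in cluster))

cluster = [1, 2, 3, 100]          # 100 is an outlier
print(kmeans_update(cluster))     # 26.5 -- the mean is dragged toward the outlier
print(kmedoid_update(cluster))    # 2    -- the medoid stays with the bulk of the data
```

Since the medoid must be one of the cluster's own points, a single extreme value cannot pull the center far away, which is exactly why k-medoid tolerates noise better.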

However, things have two sides: the improvement in clustering accuracy is achieved by sacrificing clustering time. It is not hard to see that k-medoid must repeatedly compute, for every point, the sum of its distances to all other points in its cluster in order to update the center, which greatly increases the time to converge. So k-medoid seems powerless for large-scale data clustering and is only suitable for small-scale data sets.




Next, I will describe the k-medoid algorithm step by step:

1. Let the sample set be X = {x(1), x(2), …}.


2. First, randomly select K cluster centers from the sample.


3. Compute the distance from each remaining sample point to every cluster center, and assign each point to the cluster of its nearest center. This produces the initial clustering.


4. Within each cluster, compute for every point (other than the current center) the sum of its distances to all other points in the cluster; the point with the minimum sum becomes the new cluster center. This optimizes the clustering.


5. Repeat step 4 until the cluster centers do not change between two consecutive iterations, which completes the final clustering.

Note: Step 4 is the core difference between k-means and k-medoid.
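The five steps above can be sketched in Python as follows. This is a minimal one-dimensional sketch; the data values, seed, and function name are illustrative assumptions, not part of the original description:

```python
import random

def k_medoid(points, k, seed=0):
    """Minimal 1-D k-medoid clustering following the five steps above.
    `points` is a list of numbers; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # step 2: random initial centers
    while True:
        # step 3: assign each point to the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # step 4: in each cluster, the point minimizing the sum of
        # distances to all other points becomes the new center
        new_centers = [
            min(cl, key=lambda p: sum(abs(p - q) for q in cl))
            for cl in clusters
        ]
        # step 5: stop when the centers no longer change
        if new_centers == centers:
            return centers, clusters
        centers = new_centers

centers, clusters = k_medoid([1, 2, 3, 30, 31, 32], k=2)
print(sorted(centers))   # [2, 31]
```

On this toy data the two tight groups {1, 2, 3} and {30, 31, 32} are recovered with their middle points as medoids, whatever the random initialization.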






The MATLAB implementation code of k-medoid is as follows:


clc; clear;

clomstatic = [];                      % sample data vector (values omitted in the source)
Len = length(clomstatic);             % number of sample points
K = 3;                                % number of clusters

% Randomly choose K distinct sample points as the initial cluster centers
P = randperm(Len);
Temp = P(1:K);
Center = zeros(1, K);
for i = 1:K
    Center(i) = clomstatic(Temp(i));
end

TempDistance = zeros(Len, K);
while 1
    % Clustering step: assign each point to the group of its nearest center
    p1 = 1; p2 = 1; p3 = 1;
    Group1 = []; Group2 = []; Group3 = [];
    for i = 1:Len
        for j = 1:K
            TempDistance(i, j) = abs(clomstatic(i) - Center(j));
        end
        [~, RowIndex] = min(TempDistance(i, :));
        if RowIndex == 1
            Group1(p1) = clomstatic(i); p1 = p1 + 1;
        elseif RowIndex == 2
            Group2(p2) = clomstatic(i); p2 = p2 + 1;
        else
            Group3(p3) = clomstatic(i); p3 = p3 + 1;
        end
    end

    % Update step: in each group, the point whose summed distance to all
    % other points of the group is smallest becomes the new cluster center
    Groups = {Group1, Group2, Group3};
    NewCenter = zeros(1, K);
    for g = 1:K
        GroupG = Groups{g};
        LenG = length(GroupG);
        E = zeros(1, LenG);           % E(j): total distance from point j to the rest
        for j = 1:LenG
            for i = 1:LenG
                E(j) = E(j) + abs(GroupG(j) - GroupG(i));
            end
        end
        [~, MinIndex] = min(E);
        NewCenter(g) = GroupG(MinIndex);
    end

    % Termination check: if the new and old cluster centers are the same,
    % the clustering has converged; otherwise continue iterating
    if isequal(NewCenter, Center)
        break;
    end
    Center = NewCenter;
end


The result is as follows:







Reprinted by Liu

