The previous two articles introduced the partition-based K-means and K-medoids clustering algorithms. This article continues with a second partition-based algorithm built on K-medoids: the CLARA algorithm.
CLARA can be regarded as an improvement on K-medoids, just as K-medoids is an improvement on K-means. CLARA (Clustering LARge Applications) applies clustering to large-scale data. Its core routine is still the K-medoids algorithm; by running it on samples rather than on the whole data set, CLARA makes up for the drawback that K-medoids is practical only on small data sets.
The core idea of CLARA is to draw multiple samples from the large data set, run K-medoids on each sample, and then compare the resulting groups of medoids to select the best set of cluster centers. Of course, CLARA also has drawbacks: its quality depends on the number of samples drawn, on whether each sample is evenly distributed over the data, and on the sample size (for reference, the original CLARA proposal by Kaufman and Rousseeuw draws five samples of size 40 + 2k). Even so, CLARA provides a practical method for clustering large-scale data.
A detailed description of the CLARA algorithm is as follows:
1. Draw several samples from the large-scale data set.
2. Run K-medoids on each sample, obtaining one group of cluster centers (medoids) per sample.
3. For each group, compute the distances from its cluster centers to all other points.
4. Find the group whose distance sum is smallest; that group is the best set of cluster centers (a cost formulation is given after this list).
5. Finally, cluster the large-scale data set by distance to this optimal group of cluster centers.
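For reference, the selection rule in steps 3 and 4 can be written as a cost minimization. In the standard CLARA formulation (notation introduced here for clarity; the code in this article uses the simpler total-distance variant described above), the best group of medoids among the candidate groups M_1, ..., M_time is the one with the smallest assignment cost over the full data set X:

\[ M^{*} = \operatorname*{arg\,min}_{M_t} \sum_{x \in X} \min_{m \in M_t} \lvert x - m \rvert \]

where |x - m| is the one-dimensional absolute distance used throughout this article.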
The MATLAB simulation code is as follows:
clc; clear;
load data3.mat;                % provides the 1-D data vector Clomstatic
k = 3;                         % given number of clusters
time = 5;                      % number of sampling rounds
number = 30;                   % sample size per round
centerSum5 = zeros(time, k);   % medoids found in each sampling round

for t = 1:time
    % draw one sample and cluster it with K-medoids
    clomstaticSample = randsample(Clomstatic, number);

    % pick k distinct random sample points as the initial medoids
    p = randperm(number);
    center = zeros(1, k);
    for j = 1:k
        center(j) = clomstaticSample(p(j));
    end
    clomstaticSample = sort(clomstaticSample);
    tempDistance = zeros(number, k);

    circulm = 1;               % iteration counter
    while 1
        p1 = 1; p2 = 1; p3 = 1;
        if circulm ~= 1
            clear group1 group2 group3;
        end

        % assign every sample point to its nearest medoid
        for i = 1:number
            for j = 1:k
                tempDistance(i, j) = abs(clomstaticSample(i) - center(j));
            end
            [~, rowIndex] = min(tempDistance(i, :));
            if rowIndex(1) == 1
                group1(p1) = clomstaticSample(i); p1 = p1 + 1;
            elseif rowIndex(1) == 2
                group2(p2) = clomstaticSample(i); p2 = p2 + 1;
            elseif rowIndex(1) == 3
                group3(p3) = clomstaticSample(i); p3 = p3 + 1;
            end
        end

        % recompute each medoid: the point of the group with the smallest
        % total distance to the other points of the same group
        newCenter = zeros(1, k);
        E = zeros(1, length(group1));
        for j = 1:length(group1)
            E(j) = sum(abs(group1(j) - group1));
        end
        [~, q] = min(E);
        newCenter(1) = group1(q);

        E = zeros(1, length(group2));
        for j = 1:length(group2)
            E(j) = sum(abs(group2(j) - group2));
        end
        [~, q] = min(E);
        newCenter(2) = group2(q);

        E = zeros(1, length(group3));
        for j = 1:length(group3)
            E(j) = sum(abs(group3(j) - group3));
        end
        [~, q] = min(E);
        newCenter(3) = group3(q);

        % if the medoids did not change, this sample has converged;
        % otherwise adopt the new medoids and cluster again
        if sum(newCenter == center) == k
            break;
        end
        center = newCenter;
        circulm = circulm + 1;
    end

    % save the medoids of this sampling round
    centerSum5(t, :) = center;
end

% score each group of medoids by the sum of distances from its centers
% to all points of the full data set; the group with the smallest sum
% is taken as the optimal set of cluster centers
cost = zeros(1, time);
for i = 1:time
    for j = 1:k
        cost(i) = cost(i) + sum(abs(centerSum5(i, j) - Clomstatic));
    end
end
[~, centerEnd] = sort(cost);
bestCenter = centerSum5(centerEnd(1), :);   % the optimal cluster centers

% finally cluster the full data set around the chosen medoids
q1 = 1; q2 = 1; q3 = 1;
endTempDistance = zeros(length(Clomstatic), k);
for i = 1:length(Clomstatic)
    for j = 1:k
        endTempDistance(i, j) = abs(Clomstatic(i) - bestCenter(j));
    end
    [~, rowIndex] = min(endTempDistance(i, :));
    if rowIndex(1) == 1
        endGroup1(q1) = Clomstatic(i); q1 = q1 + 1;
    elseif rowIndex(1) == 2
        endGroup2(q2) = Clomstatic(i); q2 = q2 + 1;
    elseif rowIndex(1) == 3
        endGroup3(q3) = Clomstatic(i); q3 = q3 + 1;
    end
end
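For comparison, newer MATLAB releases (R2014b and later, with the Statistics and Machine Learning Toolbox) ship a built-in kmedoids function. A minimal CLARA-style sketch built on it might look like the following; the variable names and sampling parameters here are illustrative, and the medoids are scored with the standard assignment cost rather than the total-distance criterion used above:

data     = Clomstatic(:);   % full 1-D data set as a column vector
k        = 3;               % number of clusters
nRounds  = 5;               % sampling rounds
sampSize = 30;              % points per sample

bestCost    = inf;
bestMedoids = [];
for t = 1:nRounds
    samp = datasample(data, sampSize, 'Replace', false);  % draw one sample
    [~, med] = kmedoids(samp, k);       % medoids of this sample (k-by-1)
    d    = abs(data - med');            % n-by-k point-to-medoid distances
    cost = sum(min(d, [], 2));          % assignment cost on the FULL data
    if cost < bestCost
        bestCost    = cost;
        bestMedoids = med;
    end
end
[~, idx] = min(abs(data - bestMedoids'), [], 2);  % final cluster labels

Note that abs(data - med') relies on implicit expansion (R2016b and later); on older releases, replace it with bsxfun(@minus, data, med').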
When reprinting, please credit the original article, "Principles and Examples of the CLARA Algorithm for Data Mining (bugs in the code)", by Xiao Liu.