Principles and examples of the CLARA algorithm for data mining (bugs in the code)

Source: Internet
Author: User

The previous two articles introduced the partition-based K-means and K-medoids clustering algorithms.

This article continues with another partition-based clustering method built on K-medoids: the CLARA algorithm.





The CLARA algorithm can be seen as an improvement on the K-medoids algorithm, just as K-medoids is itself an improvement on K-means. CLARA (Clustering LARge Applications) applies clustering to large-scale data. Its core routine is still the K-medoids algorithm; what CLARA adds is a way around K-medoids' limitation of being practical only on small data sets.






The core of CLARA is to draw multiple samples from the large data set, run K-medoids clustering on each sample, and then compare the cluster centers obtained from the different samples to select the best set of centers. Of course, CLARA also has drawbacks: its quality depends on the number of samples drawn, on whether each sample is evenly distributed over the data, and on the sample size. Even so, CLARA gives us a workable method for clustering large-scale data.
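Since CLARA's core routine is K-medoids, a minimal sketch of that sub-step may help. The following is a sketch in Python for one-dimensional data with distinct values; the function and variable names are my own, not from the article's MATLAB code:

```python
import random

def k_medoids_1d(points, k, seed=0):
    """Cluster distinct 1-D values around k medoids; returns the sorted medoid values."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)        # random initial centers from the data
    while True:
        # Assign each point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for x in points:
            nearest = min(medoids, key=lambda m: abs(x - m))
            clusters[nearest].append(x)
        # The new medoid of a cluster is the member that minimizes the
        # summed distance to all other members.
        new_medoids = [min(c, key=lambda m: sum(abs(m - x) for x in c))
                       for c in clusters.values() if c]
        if sorted(new_medoids) == sorted(medoids):
            return sorted(new_medoids)     # centers stopped changing: converged
        medoids = new_medoids
```

On two well-separated groups such as `[1, 2, 3, 10, 11, 12]` with `k=2`, the algorithm converges to the central members `[2, 11]` from any starting pair.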






The detailed description of the CLARA algorithm is as follows:

1. Draw multiple samples from the large-scale data set.


2. Run K-medoids clustering on each sample, obtaining one group of cluster centers per sample.


3. For each group, compute the distance from its cluster centers to all other points.


4. Find the group whose total distance is smallest; that group is the best set of cluster centers.


5. Finally, cluster the full large-scale data set by distance to this optimal group of cluster centers.
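The five steps above can be sketched end to end in Python for one-dimensional data. The names and parameters here are illustrative, and the inner K-medoids step is a simplified version that assumes distinct data values:

```python
import random

def clara_1d(data, k=3, rounds=5, sample_size=30, seed=0):
    rng = random.Random(seed)

    def k_medoids(points):
        # CLARA's core step: K-medoids on one sample.
        medoids = rng.sample(points, k)
        while True:
            clusters = [[] for _ in range(k)]
            for x in points:
                clusters[min(range(k), key=lambda j: abs(x - medoids[j]))].append(x)
            new = [min(c, key=lambda m: sum(abs(m - y) for y in c)) if c else medoids[j]
                   for j, c in enumerate(clusters)]
            if sorted(new) == sorted(medoids):
                return sorted(new)
            medoids = new

    def cost(medoids):
        # Step 3: summed distance of every data point to its nearest center.
        return sum(min(abs(x - m) for m in medoids) for x in data)

    # Steps 1 and 2: sample repeatedly and cluster each sample.
    candidates = [k_medoids(rng.sample(data, min(sample_size, len(data))))
                  for _ in range(rounds)]
    # Step 4: the candidate group with the smallest total distance wins.
    best = min(candidates, key=cost)
    # Step 5: assign every point of the full data set to its nearest center.
    groups = [[] for _ in range(k)]
    for x in data:
        groups[min(range(k), key=lambda j: abs(x - best[j]))].append(x)
    return best, groups
```

Each round pays the K-medoids cost only on a sample of `sample_size` points, which is what makes the method affordable on large data; only the final assignment in step 5 touches every point.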



The MATLAB simulation code is as follows:

clc; clear;
load data3.mat;                 % provides the one-dimensional data vector ClomStatic
k = 3;                          % number of clusters
Time = 5;                       % number of sampling rounds
Number = 30;                    % size of each sample

CenterSum5 = zeros(Time, k);    % cluster centers found in each sampling round

for t = 1:Time
    % Step 1: draw one sample, then cluster it with K-medoids
    ClomStaticSample = sort(randsample(ClomStatic, Number));

    % k random initial centers taken from the sample
    p = randperm(Number);
    Center = ClomStaticSample(p(1:k))';

    TempDistance = zeros(Number, k);
    while 1
        % Assign every sample point to its nearest center
        Group = cell(1, k);
        for i = 1:Number
            for j = 1:k
                TempDistance(i, j) = abs(ClomStaticSample(i) - Center(j));
            end
            [~, RowIndex] = min(TempDistance(i, :));
            Group{RowIndex} = [Group{RowIndex}, ClomStaticSample(i)];
        end

        % New center of each class: the member whose summed distance to
        % all other members of the class is smallest (the medoid)
        NewCenter = zeros(1, k);
        for j = 1:k
            members = Group{j};
            if isempty(members)
                NewCenter(j) = Center(j);
                continue
            end
            E = zeros(1, length(members));
            for m = 1:length(members)
                E(m) = sum(abs(members(m) - members));
            end
            [~, best] = min(E);
            NewCenter(j) = members(best);
        end

        % If the new centers equal the old ones the clustering has
        % converged; otherwise update the centers and iterate again
        if isequal(sort(NewCenter), sort(Center))
            break
        end
        Center = NewCenter;
    end

    CenterSum5(t, :) = sort(Center);   % save this round's K-medoids centers
end

% Step 4: choose the round whose centers have the smallest summed
% distance to all points as the optimal cluster centers
Sum = zeros(1, Time);
for i = 1:Time
    for j = 1:k
        Sum(i) = Sum(i) + sum(abs(CenterSum5(i, j) - ClomStatic));
    end
end
[~, SumIndex] = min(Sum);
BestCenter = CenterSum5(SumIndex, :);  % the optimal cluster centers

% Step 5: cluster the full data set against the optimal centers
EndGroup = cell(1, k);
EndTempDistance = zeros(length(ClomStatic), k);
for i = 1:length(ClomStatic)
    for j = 1:k
        EndTempDistance(i, j) = abs(ClomStatic(i) - BestCenter(j));
    end
    [~, RowIndex] = min(EndTempDistance(i, :));
    EndGroup{RowIndex} = [EndGroup{RowIndex}, ClomStatic(i)];
end




When reprinting, please credit the author, Xiao Liu.
