The previous two articles introduced the partition-based K-means and K-medoids clustering algorithms. This article continues with a second partition-based algorithm built on K-medoids: the CLARA algorithm.
CLARA can be regarded as an improvement on K-medoids, just as K-medoids is an improvement on K-means. CLARA (Clustering LARge Applications) applies clustering to large-scale data. Its core routine is still the K-medoids algorithm; by running it on samples rather than on the whole data set, CLARA makes up for the drawback that K-medoids is practical only on small data sets.
The core idea of CLARA is to draw multiple samples from the large data set, run K-medoids on each sample, and then compare the resulting groups of medoids to select the best set of cluster centers. Of course, CLARA also has drawbacks: its quality depends on the number of samples drawn, on whether each sample is evenly distributed over the data, and on the sample size (for reference, the original CLARA proposal by Kaufman and Rousseeuw draws five samples of size 40 + 2k). Even so, CLARA provides a practical method for clustering large-scale data.
A detailed description of the CLARA algorithm is as follows:
1. Draw several samples from the large-scale data set.
2. Run K-medoids on each sample, obtaining one group of cluster centers (medoids) per sample.
3. For each group, compute the distances from its cluster centers to all other points.
4. Find the group whose distance sum is smallest; that group is the best set of cluster centers (a cost formulation is given after this list).
5. Finally, cluster the large-scale data set by distance to this optimal group of cluster centers.
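For reference, the selection rule in steps 3 and 4 can be written as a cost minimization. In the standard CLARA formulation (notation introduced here for clarity; the code in this article uses the simpler total-distance variant described above), the best group of medoids among the candidate groups M_1, ..., M_time is the one with the smallest assignment cost over the full data set X:

\[ M^{*} = \operatorname*{arg\,min}_{M_t} \sum_{x \in X} \min_{m \in M_t} \lvert x - m \rvert \]

where |x - m| is the one-dimensional absolute distance used throughout this article.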
The MATLAB simulation code is as follows:
clc; clear;
load data3.mat;                % provides the 1-D data vector Clomstatic
k = 3;                         % given number of clusters
time = 5;                      % number of sampling rounds
number = 30;                   % sample size per round
centerSum5 = zeros(time, k);   % medoids found in each sampling round

for t = 1:time
    % draw one sample and cluster it with K-medoids
    clomstaticSample = randsample(Clomstatic, number);

    % pick k distinct random sample points as the initial medoids
    p = randperm(number);
    center = zeros(1, k);
    for j = 1:k
        center(j) = clomstaticSample(p(j));
    end
    clomstaticSample = sort(clomstaticSample);
    tempDistance = zeros(number, k);

    circulm = 1;               % iteration counter
    while 1
        p1 = 1; p2 = 1; p3 = 1;
        if circulm ~= 1
            clear group1 group2 group3;
        end

        % assign every sample point to its nearest medoid
        for i = 1:number
            for j = 1:k
                tempDistance(i, j) = abs(clomstaticSample(i) - center(j));
            end
            [~, rowIndex] = min(tempDistance(i, :));
            if rowIndex(1) == 1
                group1(p1) = clomstaticSample(i); p1 = p1 + 1;
            elseif rowIndex(1) == 2
                group2(p2) = clomstaticSample(i); p2 = p2 + 1;
            elseif rowIndex(1) == 3
                group3(p3) = clomstaticSample(i); p3 = p3 + 1;
            end
        end

        % recompute each medoid: the point of the group with the smallest
        % total distance to the other points of the same group
        newCenter = zeros(1, k);
        E = zeros(1, length(group1));
        for j = 1:length(group1)
            E(j) = sum(abs(group1(j) - group1));
        end
        [~, q] = min(E);
        newCenter(1) = group1(q);

        E = zeros(1, length(group2));
        for j = 1:length(group2)
            E(j) = sum(abs(group2(j) - group2));
        end
        [~, q] = min(E);
        newCenter(2) = group2(q);

        E = zeros(1, length(group3));
        for j = 1:length(group3)
            E(j) = sum(abs(group3(j) - group3));
        end
        [~, q] = min(E);
        newCenter(3) = group3(q);

        % if the medoids did not change, this sample has converged;
        % otherwise adopt the new medoids and cluster again
        if sum(newCenter == center) == k
            break;
        end
        center = newCenter;
        circulm = circulm + 1;
    end

    % save the medoids of this sampling round
    centerSum5(t, :) = center;
end

% score each group of medoids by the sum of distances from its centers
% to all points of the full data set; the group with the smallest sum
% is taken as the optimal set of cluster centers
cost = zeros(1, time);
for i = 1:time
    for j = 1:k
        cost(i) = cost(i) + sum(abs(centerSum5(i, j) - Clomstatic));
    end
end
[~, centerEnd] = sort(cost);
bestCenter = centerSum5(centerEnd(1), :);   % the optimal cluster centers

% finally cluster the full data set around the chosen medoids
q1 = 1; q2 = 1; q3 = 1;
endTempDistance = zeros(length(Clomstatic), k);
for i = 1:length(Clomstatic)
    for j = 1:k
        endTempDistance(i, j) = abs(Clomstatic(i) - bestCenter(j));
    end
    [~, rowIndex] = min(endTempDistance(i, :));
    if rowIndex(1) == 1
        endGroup1(q1) = Clomstatic(i); q1 = q1 + 1;
    elseif rowIndex(1) == 2
        endGroup2(q2) = Clomstatic(i); q2 = q2 + 1;
    elseif rowIndex(1) == 3
        endGroup3(q3) = Clomstatic(i); q3 = q3 + 1;
    end
end
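For comparison, newer MATLAB releases (R2014b and later, with the Statistics and Machine Learning Toolbox) ship a built-in kmedoids function. A minimal CLARA-style sketch built on it might look like the following; the variable names and sampling parameters here are illustrative, and the medoids are scored with the standard assignment cost rather than the total-distance criterion used above:

data     = Clomstatic(:);   % full 1-D data set as a column vector
k        = 3;               % number of clusters
nRounds  = 5;               % sampling rounds
sampSize = 30;              % points per sample

bestCost    = inf;
bestMedoids = [];
for t = 1:nRounds
    samp = datasample(data, sampSize, 'Replace', false);  % draw one sample
    [~, med] = kmedoids(samp, k);       % medoids of this sample (k-by-1)
    d    = abs(data - med');            % n-by-k point-to-medoid distances
    cost = sum(min(d, [], 2));          % assignment cost on the FULL data
    if cost < bestCost
        bestCost    = cost;
        bestMedoids = med;
    end
end
[~, idx] = min(abs(data - bestMedoids'), [], 2);  % final cluster labels

Note that abs(data - med') relies on implicit expansion (R2016b and later); on older releases, replace it with bsxfun(@minus, data, med').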
When reprinting, please credit the original article, "Principles and Examples of the CLARA Algorithm for Data Mining (bugs in the code)", by Xiao Liu.