MATLAB implementation of K-means Clustering algorithm


Clustering and classification are widely used techniques in data mining.



Clustering is unsupervised learning.

Classification is supervised learning.

Put simply: clustering starts from samples whose classes are unknown and groups them into clusters according to the similarity of the samples themselves.

Classification, by contrast, starts from known classes: each sample's features are matched against the features of the classes, and the sample is then assigned to the appropriate given class.
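To make the distinction concrete, here is a minimal MATLAB sketch; it assumes the Statistics and Machine Learning Toolbox, which provides kmeans and fitcknn (neither function is part of the original post):

    % Unsupervised: cluster 1-D samples into 2 groups without any labels
    X = [1; 2; 3; 40; 41; 42];
    idx = kmeans(X, 2);          % kmeans only sees the samples themselves

    % Supervised: train a classifier from labeled samples, then label a new one
    labels = [1; 1; 1; 2; 2; 2]; % known class of each training sample
    model = fitcknn(X, labels);  % nearest-neighbour classifier
    predict(model, 41.5)         % assigns the new sample to one of the given classes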



Because this article implements the K-means algorithm, the main families of clustering algorithms are briefly introduced next.



Clustering algorithms come in many types, which can be grouped as follows:

1. Partitioning methods: clustering algorithms based on this idea include K-means, PAM, CLARA, CLARANS, and STIRR.

2. Hierarchical methods: clustering algorithms based on this idea include BIRCH, CURE, ROCK, and Chameleon.

3. Density-based methods: clustering algorithms based on this idea include DBSCAN, OPTICS, DENCLUE, FDBSCAN, and incremental DBSCAN.

4. Grid-based methods: clustering algorithms based on this idea include STING, WaveCluster, and OptiGrid.

5. Model-based methods: clustering algorithms based on this idea include AutoClass, COBWEB, and CLASSIT.

6. Neural network methods: there are two kinds of clustering algorithms based on this idea: self-organizing feature maps (SOM) and competitive learning.
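For reference, several of these families have ready-made MATLAB implementations; a rough sketch of the corresponding calls, assuming the Statistics and Machine Learning Toolbox is installed (dbscan additionally requires R2019a or later):

    X = [1 2 3 25 26 27 53 54 55]';      % samples as a column vector

    idx_km = kmeans(X, 3);               % partitioning method (K-means)

    Z = linkage(X, 'average');           % hierarchical method
    idx_hc = cluster(Z, 'maxclust', 3);

    idx_db = dbscan(X, 5, 2);            % density-based method (epsilon = 5, minpts = 2)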





K-means is based on the partitioning idea, so here is the idea behind partition-based clustering:

1. For a set of sample data, first choose K cluster centers at random.

2. The cluster centers are then adjusted through repeated iterations so that the partition keeps improving. "Improving" here means that samples in the same cluster get closer and closer to their cluster center while samples in different clusters get farther and farther apart, and the cluster centers eventually converge to positions that no longer move.
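The quantity being driven down by this optimization is the total distance from each sample to its own cluster center. A minimal sketch of how that objective would be evaluated for a 1-D sample vector (the variable names are illustrative, not from the original post):

    % Within-cluster cost: sum over samples of the distance to their assigned center
    x       = [1 2 3 25 26 27 53 54 55];    % samples
    centers = [2 26 54];                    % one candidate set of cluster centers
    assign  = [1 1 1 2 2 2 3 3 3];          % cluster index of each sample

    cost = sum(abs(x - centers(assign)));   % K-means tries to make this as small as possible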





Since K-means is built on this partitioning idea, the essence of the K-means algorithm is, of course, consistent with it.

The K-means algorithm is as follows:

1. Let the sample set be X = {x(1), x(2), ...}.

2. Randomly select K of the samples as the initial cluster centers.

3. For each sample point that is not a cluster center, compute its distance to every cluster center and assign the sample to the nearest center. This gives the initial clustering.

4. Recompute the cluster center of each class, then recompute the distance from every sample point to the K cluster centers and reassign each sample to its nearest center. This is the first optimization of the clustering.

5. Repeat step 4 until the cluster centers no longer change between two consecutive iterations; this completes the final clustering.
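Before the full script, here is a compact, generic sketch of these five steps on a 1-D vector, using the textbook update in which each new center is the mean of its class (the listing below instead snaps each center to the data point nearest that mean). The variable names are illustrative, and implicit expansion requires MATLAB R2016b or later:

    x = [1 2 3 25 26 27 53 54 55];        % step 1: the sample set
    K = 3;
    c = x(randperm(numel(x), K));         % step 2: K random samples as initial centers
    newc = zeros(1, K);

    while true
        % steps 3/4: assign every sample to its nearest center
        [~, assign] = min(abs(x' - c), [], 2);
        % step 4: recompute each center as the mean of its class
        for j = 1:K
            if any(assign == j)
                newc(j) = mean(x(assign == j));
            else
                newc(j) = c(j);           % keep the old center if its class is empty
            end
        end
        % step 5: stop once the centers no longer change
        if isequal(newc, c)
            break;
        end
        c = newc;
    end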



The MATLAB implementation of K-means is as follows (k = 3):

clc
clear

% One-dimensional sample data and the given number of classes
CLOMSTATIC = [1,2,3,25,26,27,53,54,55];
len = length(CLOMSTATIC);          % length of the vector CLOMSTATIC
k = 3;

% Pick k distinct random sample points as the initial cluster centers
p = randperm(len);
Temp = p(1:k);
Center = zeros(1,k);
for i = 1:k
    Center(i) = CLOMSTATIC(Temp(i));
end

% Distance from every sample to every cluster center
TempDistance = zeros(len,k);

circulm = 1;
while 1
    p1 = 1;
    p2 = 1;
    p3 = 1;

    % From the second pass on, clear the old groups before re-clustering
    if (circulm ~= 1)
        clear Group1 Group2 Group3;
    end

    % Assign every sample to its nearest cluster center
    for i = 1:len
        for j = 1:k
            TempDistance(i,j) = abs(CLOMSTATIC(i) - Center(j));
        end
        [RowMin, RowIndex] = min(TempDistance(i,:));
        if (RowIndex == 1)
            Group1(p1) = CLOMSTATIC(i);
            p1 = p1 + 1;
        elseif (RowIndex == 2)
            Group2(p2) = CLOMSTATIC(i);
            p2 = p2 + 1;
        elseif (RowIndex == 3)
            Group3(p3) = CLOMSTATIC(i);
            p3 = p3 + 1;
        end
    end
    len1 = length(Group1);
    len2 = length(Group2);
    len3 = length(Group3);

    % Mean value of Group1, Group2 and Group3
    MeanGroup1 = mean(Group1);
    MeanGroup2 = mean(Group2);
    MeanGroup3 = mean(Group3);

    % The point closest to the mean of each class becomes the new cluster center
    AbsGroup1 = zeros(1,len1);
    for t = 1:len1
        AbsGroup1(t) = floor(abs(Group1(t) - MeanGroup1));
    end
    [MaxAbsGroup1, MaxAbsGroup1Index] = min(AbsGroup1);
    NewCenter(1) = Group1(MaxAbsGroup1Index);
    clear AbsGroup1;

    AbsGroup2 = zeros(1,len2);
    for t = 1:len2
        AbsGroup2(t) = floor(abs(Group2(t) - MeanGroup2));
    end
    [MaxAbsGroup2, MaxAbsGroup2Index] = min(AbsGroup2);
    NewCenter(2) = Group2(MaxAbsGroup2Index);
    clear AbsGroup2;

    AbsGroup3 = zeros(1,len3);
    for t = 1:len3
        AbsGroup3(t) = floor(abs(Group3(t) - MeanGroup3));
    end
    [MaxAbsGroup3, MaxAbsGroup3Index] = min(AbsGroup3);
    NewCenter(3) = Group3(MaxAbsGroup3Index);   % index into Group3, not Group2
    clear AbsGroup3;

    % If the new cluster centers equal the old ones, clustering has converged;
    % otherwise adopt the new centers and cluster again
    JudgeEqual = (NewCenter == Center);
    if (sum(JudgeEqual) == k)
        break;
    end
    Center = NewCenter;
    circulm = circulm + 1;
end

% Display the final groups
Group1
Group2
Group3



The clustering results are printed as the three groups Group1, Group2, and Group3 once the loop finishes.

Note: the code as originally posted never converged and had to be interrupted with Ctrl+C before the groups could be inspected. The cause was twofold: the cluster centers were never updated between iterations (the assignment Center = NewCenter was missing), and NewCenter(3) was indexed with MaxAbsGroup2Index instead of MaxAbsGroup3Index. With both corrections, as in the listing above, the loop exits as soon as the cluster centers stop changing. If anyone spots a further problem, corrections are very welcome.
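As a sanity check, the same vector can be clustered with the built-in kmeans function; this assumes the Statistics and Machine Learning Toolbox is installed and is not part of the original script:

    CLOMSTATIC = [1,2,3,25,26,27,53,54,55];
    [idx, C] = kmeans(CLOMSTATIC', 3);   % idx: cluster index of each sample, C: cluster centroids
    disp([CLOMSTATIC' idx])              % compare with Group1/Group2/Group3 above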



Please credit the author when reprinting: Xiao Liu
