MATLAB implementation of the K-means clustering algorithm

Source: Internet
Author: User
Tags: rand

1. An overview of the K-means clustering method

I had used this clustering method during mathematical modeling, but I was never quite clear how MATLAB's toolbox was being called to do the clustering. Recently, while studying pattern recognition, I came across the algorithm again, so I studied its principle carefully and then implemented it myself in MATLAB. The final result is good.

K-means clustering: given a set of samples (x1, x2, ..., xn), where each xi (i = 1, 2, ..., n) is a vector, suppose we want to group them into m (< n) classes. This can be done as follows:

Step 1: Select m vectors (y1, y2, ..., ym) as the initial cluster centers; they may be chosen randomly from (x1, x2, ..., xn), or specified arbitrarily rather than taken from the n samples.
Step 2: Compute the distance (strictly, the 2-norm) from each of (x1, x2, ..., xn) to the m cluster centers.
Step 3: For each xi (i = 1, ..., n), compare its distances to (y1, y2, ..., ym) and find the minimum; if the distance to yj is smallest, assign xi to class j.
Step 4: After the m classes have been formed, compute the mean vector of each class and take it as the new cluster center.
Step 5: Compare the new cluster centers with the old ones; if the change exceeds a set threshold, go back to Step 2; otherwise output the classification result and the cluster centers, and the algorithm ends.

Enough talk; here is the MATLAB code.
% Using the K-means principle, classify a set of two-dimensional points.
n = 40;                            % number of data points
X = 10*rand(1,n);                  % random abscissas in (0,10)
Y = 10*rand(1,n);                  % random ordinates in (0,10)
plot(X, Y, 'r*');                  % plot the original data points
xlabel('X'); ylabel('Y');
title('Pre-cluster data points');

m = 1;                             % iteration counter
eps = 1e-7;                        % iteration stop threshold
u1 = [X(1); Y(1)];                 % initial first cluster center
u2 = [X(2); Y(2)];                 % initial second cluster center
U1 = zeros(2,100);                 % U1, U2 store the coordinates of the
U2 = zeros(2,100);                 % two cluster centers at every iteration
U1(:,2) = u1;
U2(:,2) = u2;
D = zeros(2,n);                    % distances from points to the centers

while (abs(U1(1,m)-U1(1,m+1)) > eps || abs(U1(2,m)-U1(2,m+1)) > eps || ...
       abs(U2(1,m)-U2(1,m+1)) > eps || abs(U2(2,m)-U2(2,m+1)) > eps)
    m = m + 1;
    % distance of every point to the two cluster centers
    for i = 1:n
        D(1,i) = sqrt((X(i)-U1(1,m))^2 + (Y(i)-U1(2,m))^2);
        D(2,i) = sqrt((X(i)-U2(1,m))^2 + (Y(i)-U2(2,m))^2);
    end
    A = zeros(2,n);                % A stores the points of the first class
    B = zeros(2,n);                % B stores the points of the second class
    for k = 1:n
        [~, index] = min(D(:,k));
        if index == 1              % point belongs to the first center
            A(1,k) = X(k);
            A(2,k) = Y(k);
        else                       % point belongs to the second center
            B(1,k) = X(k);
            B(2,k) = Y(k);
        end
    end
    indexA = find(A(1,:) ~= 0);    % points in the first class
    indexB = find(B(1,:) ~= 0);    % points in the second class
    U1(1,m+1) = mean(A(1,indexA));
    U1(2,m+1) = mean(A(2,indexA));
    U2(1,m+1) = mean(B(1,indexB));
    U2(2,m+1) = mean(B(2,indexB)); % update the two cluster centers
end

figure;
plot(A(1,indexA), A(2,indexA), '*b');  % plot the first class of points
hold on;
plot(B(1,indexB), B(2,indexB), 'oy');  % plot the second class of points
centerX = [U1(1,m) U2(1,m)];
centerY = [U1(2,m) U2(2,m)];
plot(centerX, centerY, '+g');          % plot the two cluster centers
xlabel('X'); ylabel('Y');
title('Data points after clustering');
disp(['Number of iterations: ', num2str(m)]);

The resulting classification is as follows:

The 40 randomly generated points are divided into two classes in only 4 iterations, and judging from the plot the classification is good. However, the result may differ from run to run, because the points are randomly generated and there is no intrinsic classification criterion.
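For comparison with the five steps described above, here is a minimal NumPy sketch of the same two-class K-means loop. This is an illustration only: the function name, the fixed random seed, and the choice of the first two points as initial centers are my own assumptions, not part of the original MATLAB script (though the MATLAB code above seeds its centers the same way).

```python
import numpy as np

def kmeans_2class(points, eps=1e-7):
    """Two-cluster K-means on 2-D points, mirroring the five steps above.
    `points` is an (n, 2) array. Assumes neither class becomes empty."""
    # Step 1: take the first two points as the initial cluster centers
    centers = points[:2].copy()
    iterations = 0
    while True:
        iterations += 1
        # Step 2: Euclidean distance from every point to each center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Step 3: assign each point to its nearest center
        labels = d.argmin(axis=1)
        # Step 4: recompute each center as the mean of its class
        new_centers = np.array([points[labels == j].mean(axis=0)
                                for j in range(2)])
        # Step 5: stop once the centers move less than the threshold
        if np.abs(new_centers - centers).max() <= eps:
            return labels, new_centers, iterations
        centers = new_centers

# Usage: 40 random points in (0, 10) x (0, 10), as in the MATLAB script
rng = np.random.default_rng(0)
pts = 10 * rng.random((40, 2))
labels, centers, m = kmeans_2class(pts)
```

As with the MATLAB version, the clustering found depends on the randomly generated data and the initial centers, so different seeds can give different partitions.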

