MATLAB implementation of the K-means clustering algorithm

Source: Internet
Author: User
Tags: rand

1. An overview of the K-means clustering method

I had used this clustering method during mathematical modeling, but I was never quite clear how MATLAB's toolbox was being called to do the clustering. Recently, while studying pattern recognition, I came across the algorithm again, so I studied its principle carefully and then implemented it myself in MATLAB. The final result is good.

K-means clustering: given a set of samples (x1, x2, ..., xn), where each xi (i = 1, 2, ..., n) is a vector, suppose we want to group them into m (< n) classes. This can be done as follows:

Step 1: Select m vectors (y1, y2, ..., ym) as the initial cluster centers; they may be chosen randomly from (x1, x2, ..., xn), or specified arbitrarily rather than taken from the n samples.
Step 2: Compute the distance (strictly, the 2-norm) from each of (x1, x2, ..., xn) to the m cluster centers.
Step 3: For each xi (i = 1, ..., n), compare its distances to (y1, y2, ..., ym) and find the minimum; if the distance to yj is smallest, assign xi to class j.
Step 4: After the m classes have been formed, compute the mean vector of each class and take it as the new cluster center.
Step 5: Compare the new cluster centers with the old ones; if the change exceeds a set threshold, go back to Step 2; otherwise output the classification result and the cluster centers, and the algorithm ends.

Enough talk; here is the MATLAB code.
% Using the K-means principle, classify a set of two-dimensional points.
n = 40;                            % number of data points
X = 10*rand(1,n);                  % random abscissas in (0,10)
Y = 10*rand(1,n);                  % random ordinates in (0,10)
plot(X, Y, 'r*');                  % plot the original data points
xlabel('X'); ylabel('Y');
title('Pre-cluster data points');

m = 1;                             % iteration counter
eps = 1e-7;                        % iteration stop threshold
u1 = [X(1); Y(1)];                 % initial first cluster center
u2 = [X(2); Y(2)];                 % initial second cluster center
U1 = zeros(2,100);                 % U1, U2 store the coordinates of the
U2 = zeros(2,100);                 % two cluster centers at every iteration
U1(:,2) = u1;
U2(:,2) = u2;
D = zeros(2,n);                    % distances from points to the centers

while (abs(U1(1,m)-U1(1,m+1)) > eps || abs(U1(2,m)-U1(2,m+1)) > eps || ...
       abs(U2(1,m)-U2(1,m+1)) > eps || abs(U2(2,m)-U2(2,m+1)) > eps)
    m = m + 1;
    % distance of every point to the two cluster centers
    for i = 1:n
        D(1,i) = sqrt((X(i)-U1(1,m))^2 + (Y(i)-U1(2,m))^2);
        D(2,i) = sqrt((X(i)-U2(1,m))^2 + (Y(i)-U2(2,m))^2);
    end
    A = zeros(2,n);                % A stores the points of the first class
    B = zeros(2,n);                % B stores the points of the second class
    for k = 1:n
        [~, index] = min(D(:,k));
        if index == 1              % point belongs to the first center
            A(1,k) = X(k);
            A(2,k) = Y(k);
        else                       % point belongs to the second center
            B(1,k) = X(k);
            B(2,k) = Y(k);
        end
    end
    indexA = find(A(1,:) ~= 0);    % points in the first class
    indexB = find(B(1,:) ~= 0);    % points in the second class
    U1(1,m+1) = mean(A(1,indexA));
    U1(2,m+1) = mean(A(2,indexA));
    U2(1,m+1) = mean(B(1,indexB));
    U2(2,m+1) = mean(B(2,indexB)); % update the two cluster centers
end

figure;
plot(A(1,indexA), A(2,indexA), '*b');  % plot the first class of points
hold on;
plot(B(1,indexB), B(2,indexB), 'oy');  % plot the second class of points
centerX = [U1(1,m) U2(1,m)];
centerY = [U1(2,m) U2(2,m)];
plot(centerX, centerY, '+g');          % plot the two cluster centers
xlabel('X'); ylabel('Y');
title('Data points after clustering');
disp(['Number of iterations: ', num2str(m)]);

The resulting classification is as follows:

The 40 randomly generated points are divided into two classes in only 4 iterations, and judging from the plot the classification is good. However, the result may differ from run to run, because the points are randomly generated and there is no intrinsic classification criterion.
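For comparison with the five steps described above, here is a minimal NumPy sketch of the same two-class K-means loop. This is an illustration only: the function name, the fixed random seed, and the choice of the first two points as initial centers are my own assumptions, not part of the original MATLAB script (though the MATLAB code above seeds its centers the same way).

```python
import numpy as np

def kmeans_2class(points, eps=1e-7):
    """Two-cluster K-means on 2-D points, mirroring the five steps above.
    `points` is an (n, 2) array. Assumes neither class becomes empty."""
    # Step 1: take the first two points as the initial cluster centers
    centers = points[:2].copy()
    iterations = 0
    while True:
        iterations += 1
        # Step 2: Euclidean distance from every point to each center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Step 3: assign each point to its nearest center
        labels = d.argmin(axis=1)
        # Step 4: recompute each center as the mean of its class
        new_centers = np.array([points[labels == j].mean(axis=0)
                                for j in range(2)])
        # Step 5: stop once the centers move less than the threshold
        if np.abs(new_centers - centers).max() <= eps:
            return labels, new_centers, iterations
        centers = new_centers

# Usage: 40 random points in (0, 10) x (0, 10), as in the MATLAB script
rng = np.random.default_rng(0)
pts = 10 * rng.random((40, 2))
labels, centers, m = kmeans_2class(pts)
```

As with the MATLAB version, the clustering found depends on the randomly generated data and the initial centers, so different seeds can give different partitions.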

