The concepts of clustering and classification are widely used in data mining.
Clustering is unsupervised learning: the samples carry no labels, and the algorithm groups them into classes based purely on their mutual similarity.
Classification is supervised learning: the classes are known in advance, and each sample is assigned to one of the given classes by matching its features against the known class features.
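The difference can be made concrete with a toy sketch in Python (the data, class names, and helper functions here are invented purely for illustration): a classifier receives labeled samples and assigns a new sample to the closest known class, while a clusterer receives only unlabeled samples and produces anonymous group ids.

```python
# Toy contrast between classification (labels given) and clustering (no labels).
# All data and names below are made up for demonstration.

def nearest_mean_classify(labeled, x):
    """Supervised: labeled is {class_name: [samples]}; return the name of
    the class whose mean is closest to x."""
    return min(labeled, key=lambda c: abs(sum(labeled[c]) / len(labeled[c]) - x))

def two_means_cluster(samples):
    """Unsupervised: split unlabeled 1-D samples into two groups around the
    two extreme points (one crude assignment step, no labels involved)."""
    lo, hi = min(samples), max(samples)
    return [0 if abs(s - lo) <= abs(s - hi) else 1 for s in samples]

labeled = {"small": [1.0, 2.0, 1.5], "large": [8.0, 9.0, 8.5]}
print(nearest_mean_classify(labeled, 1.8))      # -> "small" (a known class name)
print(two_means_cluster([1.0, 1.5, 8.0, 9.0]))  # -> [0, 0, 1, 1] (anonymous ids)
```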
This article describes the K-means algorithm, one of the clustering algorithms.
Clustering algorithms come in several families, which can be grouped as follows:
1. Partitioning methods: clustering algorithms based on this idea include K-means, PAM, CLARA, CLARANS, and STIRR.
2. Hierarchical methods: clustering algorithms based on this idea include BIRCH, CURE, ROCK, and Chameleon.
3. Density-based methods: clustering algorithms based on this idea include DBSCAN, OPTICS, DENCLUE, FDBSCAN, and incremental DBSCAN.
4. Grid-based methods: clustering algorithms based on this idea include STING, WaveCluster, and OptiGrid.
5. Model-based methods: clustering algorithms based on this idea include AutoClass, COBWEB, and CLASSIT.
6. Neural-network methods: two clustering approaches are based on this idea: self-organizing feature maps (SOM) and competitive learning.
K-means is based on the partitioning (division) idea, so we first introduce partition-based clustering:
1. For a set of sample data, K cluster centers are first chosen at random.
2. The cluster centers are then revised through repeated iterations, continuously improving the clustering. "Continuous improvement" means that samples in the same class get closer and closer to their cluster center, while samples in different classes get farther and farther apart.
3. The iteration stops once the cluster centers converge to positions that no longer move.
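This "continuous improvement" can be quantified: each assign-then-update step does not increase the within-cluster sum of squared distances (SSE). A minimal Python sketch (with a made-up 1-D dataset and deliberately poor initial centers) shows one such step lowering the SSE:

```python
# One partition-style refinement step on made-up 1-D data: assign each
# sample to its nearest center, then move each center to the mean of its
# cluster. The within-cluster SSE never increases.

def assign(samples, centers):
    """Return the index of the nearest center for every sample."""
    return [min(range(len(centers)), key=lambda j: (s - centers[j]) ** 2)
            for s in samples]

def sse(samples, centers, labels):
    """Within-cluster sum of squared distances."""
    return sum((s - centers[l]) ** 2 for s, l in zip(samples, labels))

samples = [1.0, 2.0, 9.0, 10.0]
centers = [1.0, 5.0]                      # deliberately poor initial centers
labels = assign(samples, centers)
before = sse(samples, centers, labels)

# Update step: each center moves to the mean of its assigned samples
centers = [sum(s for s, l in zip(samples, labels) if l == j) /
           max(1, sum(1 for l in labels if l == j)) for j in range(len(centers))]
labels = assign(samples, centers)
after = sse(samples, centers, labels)

print(before, after)   # -> 42.0 1.0
```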
Since K-means is built directly on this partitioning idea, the essence of the K-means algorithm is the same as that of partition-based clustering.
The K-means algorithm is as follows:
1. Let the sample set be X = {x(1), x(2), ……}.
2. Randomly select K samples as the initial cluster centers.
3. For every remaining sample, compute its distance to each of the K cluster centers and assign it to the cluster whose center is nearest. This yields the initial clustering.
4. Recompute the center of each cluster, then recompute the distance from every non-center sample to each of the K new centers and reassign each sample to its nearest center. This is the first refinement of the clustering.
5. Repeat step 4 until the cluster centers no longer change between two consecutive iterations; the final clustering is then complete.
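The five steps above can be sketched as a short, runnable Python version for 1-D data (the data, K, and fixed seed are invented for illustration; a production implementation would also guard against empty clusters, as the comment notes):

```python
import random

def kmeans_1d(samples, k, seed=0):
    """Steps 1-5 above for 1-D data: random initial centers, then repeated
    assign / recompute until the centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)          # step 2: random initial centers
    while True:
        # step 3: put each sample in the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for s in samples:
            j = min(range(k), key=lambda j: abs(s - centers[j]))
            clusters[j].append(s)
        # step 4: recompute each center as its cluster's mean
        # (an empty cluster simply keeps its old center)
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        # step 5: stop once no center moved
        if new_centers == centers:
            return centers, clusters
        centers = new_centers

samples = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
centers, clusters = kmeans_1d(samples, k=3)
print(sorted(centers))
```

Note that, like any K-means run, the result depends on the random initial centers; different seeds can converge to different local optima.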
K-means MATLAB implementation (k = 3):
clc; clear;

% The sample data vector was garbled in the original post;
% substitute your own 1-D data here.
clomstatic = [];

Len = length(clomstatic);   % number of samples
k = 3;                      % number of clusters

% Pick k distinct random samples as the initial cluster centers
p = randperm(Len);
temp = p(1:k);
Center = zeros(1, k);
for i = 1:k
    Center(i) = clomstatic(temp(i));
end

% Iterate: assign every sample to its nearest center, then recompute
% the centers, until the centers stop moving.
TempDistance = zeros(Len, k);
circulm = 1;
while 1
    p1 = 1; p2 = 1; p3 = 1;
    if circulm ~= 1
        clear Group1 Group2 Group3;
    end

    % Assignment step: each sample joins the cluster of its nearest center
    for i = 1:Len
        for j = 1:k
            TempDistance(i, j) = abs(clomstatic(i) - Center(j));
        end
        [~, RowIndex] = min(TempDistance(i, :));
        if RowIndex == 1
            Group1(p1) = clomstatic(i);  p1 = p1 + 1;
        elseif RowIndex == 2
            Group2(p2) = clomstatic(i);  p2 = p2 + 1;
        elseif RowIndex == 3
            Group3(p3) = clomstatic(i);  p3 = p3 + 1;
        end
    end

    Len1 = length(Group1);  Len2 = length(Group2);  Len3 = length(Group3);

    % Mean of each group
    MeanGroup1 = mean(Group1);
    MeanGroup2 = mean(Group2);
    MeanGroup3 = mean(Group3);

    % Update step: the point closest to each group's mean becomes the new center
    AbsGroup1 = zeros(1, Len1);
    for t = 1:Len1
        AbsGroup1(t) = floor(abs(Group1(t) - MeanGroup1));
    end
    [~, MinIndex1] = min(AbsGroup1);
    NewCenter(1) = Group1(MinIndex1);
    clear AbsGroup1;

    AbsGroup2 = zeros(1, Len2);
    for t = 1:Len2
        AbsGroup2(t) = floor(abs(Group2(t) - MeanGroup2));
    end
    [~, MinIndex2] = min(AbsGroup2);
    NewCenter(2) = Group2(MinIndex2);
    clear AbsGroup2;

    AbsGroup3 = zeros(1, Len3);
    for t = 1:Len3
        AbsGroup3(t) = floor(abs(Group3(t) - MeanGroup3));
    end
    [~, MinIndex3] = min(AbsGroup3);
    % Bug fix: the original indexed Group3 with the Group2 index
    NewCenter(3) = Group3(MinIndex3);
    clear AbsGroup3;

    % Stop when the new centers equal the old ones; otherwise keep iterating
    JudgeEqual = (NewCenter == Center);
    if sum(JudgeEqual) == k
        break;
    end
    % Bug fix: the original never copied the new centers back into Center,
    % so the loop could never terminate
    Center = NewCenter;
    circulm = circulm + 1;
end
The cluster result is as follows:
Note: as originally posted, the code never terminated and had to be interrupted with Ctrl+C before a clustering result appeared. This appears to have been a problem with the code rather than with the algorithm's convergence: the new cluster centers were never copied back into Center at the end of each iteration, and NewCenter(3) was indexed with the Group2 index instead of the Group3 index.
For more information, see Xiao Liu.