The K-means algorithm is the most classic partition-based clustering method, and it is one of the top ten classic data mining algorithms.
The basic idea of the K-means algorithm is to cluster around K center points in space, assigning each object to the center nearest to it. The cluster centers are then updated iteratively until the clustering result stops improving. MATLAB provides a built-in K-means function that can be called directly, for example [Idx,C,sumd,D]=kmeans(X,k) (see the MATLAB help for kmeans). The following link is a K-means clustering demo written in Java.
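The weaknesses of this algorithm are that the number of clusters K must be fixed in advance and that the result depends on the choice of initial centers. To address them, you can use ISODATA (Iterative Self-Organizing Data Analysis algorithm) to determine the number of clusters K, and the k-means++ algorithm or a genetic algorithm (GA) to select the initial centers.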
K-means Algorithm Flow:
1. Initialize: use an appropriate method to choose the K initial codewords (cluster centers) z_i, 1 <= i <= K.
2. Nearest-neighbor classification: assign each training vector x_t to the nearest codeword z_i according to the nearest-neighbor principle. The distance can be the Euclidean distance, the Mahalanobis distance, and so on.
3. Codebook update: once all training vectors have been assigned to their nearest codewords, compute the centroid of each cluster. These centroids form the new codebook.
4. Repeat steps 2 and 3 until the change in error between adjacent iterations satisfies the threshold requirement.
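The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name kmeans, the parameters tol, max_iter and seed, the random choice of initial centers, and the handling of empty clusters are all assumptions made for the example. It uses the Euclidean distance and the sum of squared distances as the error.

```python
import math
import random


def kmeans(points, k, tol=1e-6, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length tuples (illustrative helper)."""
    rng = random.Random(seed)
    # Step 1: initialize the codebook with k training vectors chosen at random.
    centers = rng.sample(points, k)
    clusters = []
    prev_error = math.inf
    for _ in range(max_iter):
        # Step 2: assign every vector to its nearest center (Euclidean distance).
        clusters = [[] for _ in range(k)]
        error = 0.0
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            j = dists.index(min(dists))
            clusters[j].append(p)
            error += dists[j] ** 2
        # Step 3: replace each center by the centroid of its cluster.
        # Assumption: an empty cluster simply keeps its old center.
        centers = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
        # Step 4: stop once the error changes by less than the threshold.
        if abs(prev_error - error) < tol:
            break
        prev_error = error
    return centers, clusters
```

Calling kmeans(points, 2) on two well-separated groups of 2-D points returns one centroid per group together with the points assigned to each.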
ISODATA algorithm:
ISODATA (Iterative Self-Organizing Data Analysis) starts from a set of initial parameters and adjusts the number of clusters through a mechanism of merging and splitting. When the distance between the centers of two clusters is less than a certain threshold, they are merged into one cluster. When the standard deviation of a cluster is greater than a certain threshold, or its number of samples exceeds a threshold, it is split into two clusters. When the number of samples in a cluster is less than a certain threshold, that cluster is canceled.
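As a rough sketch of these three rules, the Python function below applies one cancel, split and merge pass to an existing set of clusters. It is a simplification, not the full ISODATA procedure: the name isodata_adjust, the thresholds min_size, max_size, max_std and min_dist, the choice to split at the centroid along the dimension with the largest standard deviation, and running all three rules in a single pass are assumptions made for illustration.

```python
import math


def centroid(members):
    """Mean of a non-empty list of equal-length tuples."""
    return tuple(sum(xs) / len(members) for xs in zip(*members))


def isodata_adjust(clusters, min_size, max_size, max_std, min_dist):
    """One cancel/split/merge pass over a list of clusters (illustrative helper)."""
    # Cancel: clusters with too few samples are dropped. Their samples are
    # returned separately so a later assignment step can redistribute them.
    orphans = [p for c in clusters if len(c) < min_size for p in c]
    kept = [c for c in clusters if len(c) >= min_size]

    # Split: a cluster that is too spread out or too large is cut in two
    # at its centroid, along the dimension with the largest standard deviation.
    after_split = []
    for c in kept:
        mu = centroid(c)
        stds = [
            math.sqrt(sum((p[d] - mu[d]) ** 2 for p in c) / len(c))
            for d in range(len(mu))
        ]
        d = stds.index(max(stds))
        low = [p for p in c if p[d] <= mu[d]]
        high = [p for p in c if p[d] > mu[d]]
        if (stds[d] > max_std or len(c) > max_size) and low and high:
            after_split += [low, high]
        else:
            after_split.append(c)

    # Merge: a cluster whose center lies closer than min_dist to an
    # already accepted cluster is absorbed into that cluster.
    merged = []
    for c in after_split:
        for m in merged:
            if math.dist(centroid(c), centroid(m)) < min_dist:
                m.extend(c)
                break
        else:
            merged.append(list(c))
    return merged, orphans
```

In practice a pass like this would alternate with the nearest-neighbor assignment and centroid update of K-means, so that K changes between iterations instead of being fixed up front.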
k-means++ Algorithm Flow:
1. Let D be the given set of points
2. Randomly select a point from the point set D as the first center point
3. For each point i, calculate the distance S_i to its nearest already-selected center point
4. Sum all S_i to get Sum
5. Take a random value random (0 < random < Sum)
6. Loop over the point set D, performing random -= S_i until random < 0; point i is then the next center point
7. Repeat steps 3-6 (recomputing S_i each time) until all K center points have been selected
8. Perform the K-means algorithm with these K points as the initial centers
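The seeding steps above can be sketched as follows. This is a minimal illustration that follows the steps as listed, so each point is weighted by its distance S_i; the k-means++ algorithm as originally published weights by the squared distance, which would correspond to replacing each S_i with S_i ** 2. The function name kmeanspp_init, the seed parameter, and the fallback for the rounding edge case are assumptions made for the example.

```python
import math
import random


def kmeanspp_init(points, k, seed=0):
    """Choose k initial centers from a list of tuples (illustrative helper)."""
    rng = random.Random(seed)
    # Step 2: pick the first center uniformly at random from D.
    centers = [rng.choice(points)]
    # Step 7: repeat steps 3-6 until k centers have been chosen.
    while len(centers) < k:
        # Step 3: S_i is the distance from point i to its nearest chosen center.
        s = [min(math.dist(p, c) for c in centers) for p in points]
        # Step 4: sum of all S_i.
        total = sum(s)
        # Step 5: draw a random value between 0 and the sum.
        r = rng.uniform(0, total)
        # Step 6: subtract S_i point by point; the point that drives
        # the value below zero becomes the next center.
        for p, s_i in zip(points, s):
            r -= s_i
            if r < 0:
                centers.append(p)
                break
        else:
            # Assumption: if rounding keeps r from going negative,
            # fall back to the point farthest from the chosen centers.
            centers.append(points[s.index(max(s))])
    return centers
```

Points far from every existing center have a large S_i and are therefore more likely to be picked, while a point that is already a center has S_i = 0 and cannot be picked again. The returned list would then replace the random initialization in step 1 of K-means.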