A clustering algorithm only needs to know how to compute the similarity between samples in order to work.
K-Means clustering algorithm: the algorithm finds k distinct clusters, and the center of each cluster is computed as the mean of the points assigned to that cluster.
Hierarchical Clustering algorithm
①BIRCH algorithm: combines hierarchical clustering with iterative relocation. It first runs a bottom-up hierarchical pass, then uses iterative relocation to refine the clusters.
②DBSCAN algorithm: a density-based clustering method that tolerates noise (Density-Based Spatial Clustering of Applications with Noise).
③CURE algorithm: a middle ground between centroid-based methods and representative-object-based methods. Instead of representing a cluster with a single centroid or object, it selects a fixed number of representative points in the data space; having more than one representative point per cluster lets CURE adapt to non-spherical shapes. Shrinking the representative points toward the cluster center helps control the effect of outliers. As a result, CURE is better at isolating outliers and can identify clusters that are non-spherical or vary widely in size.
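The density-based idea behind ② can be sketched in a few lines of plain Python. This is a minimal illustration, not the library implementation; the function name `dbscan`, the 2-D points, and the `eps`/`min_pts` values are all chosen here for the example:

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN on 2-D points: a point with at least min_pts
    neighbours within radius eps is a core point; clusters grow by
    expanding core points, and unreachable points are noise (-1)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # indices of all points within eps of points[i] (including itself)
        return [j for j, q in enumerate(points)
                if (points[i][0] - q[0]) ** 2 + (points[i][1] - q[1]) ** 2 <= eps ** 2]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster            # i is a core point: start a cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: claimed, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:     # j is also a core point: keep growing
                queue.extend(nb)
        cluster += 1
    return labels

# two dense blobs plus one isolated outlier
pts = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5),
       (5, 5), (5, 5.5), (5.5, 5), (5.5, 5.5),
       (10, 0)]
labels = dbscan(pts, eps=1.0, min_pts=3)
```

With these toy parameters the two blobs become clusters 0 and 1 and the isolated point is labelled -1 (noise), which is exactly the behavior the description above refers to: density reachability defines the clusters, so no cluster count is specified in advance.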
The disadvantages of the K-means algorithm are that the result may be only a local optimum rather than the global optimum, and that convergence is slow on large-scale data.
The workflow of the K-means algorithm: given a dataset, select k initial points as centroids; for each point in the dataset, find its nearest centroid and assign the point to that centroid's cluster; finally, update each cluster's centroid to the mean of all points in the cluster. (The process iterates until the assignments stop changing.)
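The workflow above can be sketched directly in plain Python. This is a minimal illustration under assumed conditions (2-D points, squared Euclidean distance, a fixed random seed for reproducibility); the function name `kmeans` and the sample data are not from the original text:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal K-means on 2-D points: pick k initial centroids,
    assign each point to its nearest centroid, update each centroid
    to the mean of its cluster, and repeat until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: (p[0] - centroids[i][0]) ** 2
                                + (p[1] - centroids[i][1]) ** 2)
            clusters[i].append(p)
        # update step: each centroid becomes the mean of its cluster
        new = []
        for i, c in enumerate(clusters):
            if c:
                new.append((sum(p[0] for p in c) / len(c),
                            sum(p[1] for p in c) / len(c)))
            else:
                new.append(centroids[i])   # keep centroid of an empty cluster
        if new == centroids:               # assignments stopped changing
            break
        centroids = new
    return centroids, clusters

# two well-separated blobs of three points each
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, clus = kmeans(pts, 2)
```

On this toy data the two centroids converge to the means of the two blobs regardless of which sample points are drawn first, because after one assignment-and-update round each centroid is pulled toward its own blob.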
This article is from the "Shangwei Super" blog; reprinting is declined.
--------An intensive reading of the K-means clustering algorithm from Machine Learning in Action