1. Classification:
Clustering belongs to unsupervised learning: the data has no class labels.
2. For example:
3. K-means algorithm:
3.1 K-means is a classic clustering algorithm and one of the ten classic algorithms in data mining.
3.2 The algorithm accepts a parameter k and partitions the n data objects into k clusters such that objects in the same cluster are highly similar, while objects in different clusters have low similarity.
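The text does not say how similarity is measured. A common choice, assumed here, is squared Euclidean distance to the cluster mean, so a lower total means objects sit closer to their own cluster. The function name `within_cluster_ss` and the sample points below are illustrative only.

```python
def within_cluster_ss(clusters):
    """Total squared Euclidean distance from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        # Mean of the cluster, computed coordinate by coordinate.
        mean = [sum(col) / len(points) for col in zip(*points)]
        for p in points:
            total += sum((x - m) ** 2 for x, m in zip(p, mean))
    return total


# Two ways of splitting the same four points into k = 2 clusters.
good = [[(1.0, 1.0), (1.0, 2.0)], [(8.0, 8.0), (9.0, 8.0)]]
bad = [[(1.0, 1.0), (8.0, 8.0)], [(1.0, 2.0), (9.0, 8.0)]]

print(within_cluster_ss(good), within_cluster_ss(bad))  # prints: 1.0 99.0
```

The split that groups nearby points scores far lower, which matches the requirement that similarity within a cluster be high.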
3.3 Algorithm Idea:
Clustering is built around k center points in space: each object is assigned to the center closest to it. The cluster centers are then updated iteratively, one round after another, until the clustering stops changing.
3.4 Algorithm Description:
(1) Choose suitable initial centers for the c classes;
(2) In each iteration, for every sample, compute its distance to each of the c centers and assign the sample to the class whose center is nearest;
(3) Update the center of each class to the mean of the samples assigned to it;
(4) If none of the c cluster centers changes after applying steps (2) and (3), the iteration ends; otherwise the iteration continues.
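Steps (1) to (4) can be sketched for points of any dimension as follows. This is a minimal illustration, not a reference implementation: the initial centers are simply passed in by the caller, squared Euclidean distance is assumed as the distance, and an empty class keeps its old center. The names `kmeans` and `squared_distance` and the `max_iter` guard are assumptions added for the example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(samples, centers, max_iter=100):
    """samples: list of tuples; centers: initial centers chosen by the caller (step 1)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Step (2): assign each sample to the class with the nearest center.
        labels = [
            min(range(len(centers)), key=lambda i: squared_distance(s, centers[i]))
            for s in samples
        ]
        # Step (3): move each center to the mean of its assigned samples.
        new_centers = []
        for i, old in enumerate(centers):
            members = [s for s, lab in zip(samples, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # assumption: an empty class keeps its center
        # Step (4): stop once no center has changed.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


samples = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centers, labels = kmeans(samples, [samples[0], samples[1]])
print(centers)  # [(1.0, 1.5), (8.5, 8.0)]
print(labels)   # [0, 0, 1, 1]
```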
3.5 Algorithm Flow:
Input: k, data[n];
(1) Select k initial center points, for example c[0]=data[0], ..., c[k-1]=data[k-1];
(2) For each of data[0], ..., data[n-1], compare it with c[0], ..., c[k-1]; if its difference from c[i] is the smallest, label it i;
(3) For all points labeled i, recompute c[i] = {sum of all data[j] labeled i} / {number of points labeled i};
(4) Repeat (2) and (3) until every c[i] changes by less than the given threshold.
4. For example:
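As a small sketch of the flow in 3.5, the version below works on scalar data and follows the array notation used there. The default `threshold`, the `max_iter` guard, and the function name `kmeans_1d` are assumptions made for the example; the absolute difference is used as the "difference" between a value and a center.

```python
def kmeans_1d(data, k, threshold=1e-6, max_iter=100):
    # (1) Take the first k values as the initial centers.
    c = [float(x) for x in data[:k]]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # (2) Label each value with the index of the closest center.
        labels = [min(range(k), key=lambda i: abs(x - c[i])) for x in data]
        # (3) Recompute each center as sum / count over the values with that label.
        new_c = []
        for i in range(k):
            members = [x for x, lab in zip(data, labels) if lab == i]
            new_c.append(sum(members) / len(members) if members else c[i])
        # (4) Stop when every center moved by less than the threshold.
        shift = max(abs(a - b) for a, b in zip(new_c, c))
        c = new_c
        if shift < threshold:
            break
    return c, labels


c, labels = kmeans_1d([1.0, 2.0, 10.0, 11.0, 12.0], k=2)
print(c)       # [1.5, 11.0]
print(labels)  # [0, 0, 1, 1, 1]
```

Starting from c[0]=1.0 and c[1]=2.0, the centers move to 1.0 and 8.75 after the first pass, then to 1.5 and 11.0, where they stay.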
Advantages: fast and simple.
Disadvantages: the final result depends on the choice of initial points, the algorithm easily gets stuck in a local optimum, and the value of k must be known in advance.
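The dependence on the initial points can be seen on a tiny, hand-picked data set: the four corners of a 4-by-1 rectangle with k = 2. The sketch below runs the same procedure from two different pairs of initial centers. The helper names `sq_dist` and `run_kmeans` are illustrative, and squared Euclidean distance is assumed.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def run_kmeans(points, centers, max_iter=100):
    """Plain k-means from the given initial centers; returns (centers, total squared error)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (empty groups keep the old center).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, error


points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
_, err_a = run_kmeans(points, [points[0], points[1]])  # both starts on the left edge
_, err_b = run_kmeans(points, [points[0], points[2]])  # one start on each side
print(err_a, err_b)  # prints: 16.0 1.0
```

Both runs converge, but the first settles on a top/bottom split with a much larger error than the left/right split found by the second, which is the local-optimum behavior described above.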