1. k-means: clustering algorithm, unsupervised
Input: K, data[n]
(1) Select K initial center points, e.g. c[0]=data[0], ..., c[K-1]=data[K-1].
(2) Compare each of data[0], ..., data[n-1] with c[0], ..., c[K-1]; if the distance to c[i] is the smallest, label that point i.
(3) For each label i, recalculate c[i] = (sum of all data[j] labeled i) / (number of points labeled i).
(4) Repeat (2) and (3) until the change in every c[i] is less than a given threshold.
Advantages: simple, fast.
Disadvantages: the clustering result depends on the choice of initial centers, and the number of clusters (the K value) must be supplied. The general practice is to run the clustering several times with different K values and keep the best result.
2. KNN: classification algorithm, supervised
The calculation steps are as follows:
1) Given the test object, calculate its distance from each object in the training set
2) Find the K nearest training objects and take them as the nearest neighbors of the test object
3) Classify the test object according to the majority category among its K nearest neighbors
Advantages: simple, no parameter estimation, no training
Disadvantages: heavy computation, large memory overhead
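The two procedures in these notes can be sketched in a few lines of plain Python. This is only an illustration under some assumptions that the notes do not state: points are tuples of numbers, "distance" means Euclidean distance, and the names distance, kmeans, knn_classify, the default threshold, and the max_iter safety cap are all made up for this sketch.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length points (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(data, k, threshold=1e-6, max_iter=100):
    """k-means following steps (1)-(4); returns (centers, labels)."""
    centers = [tuple(p) for p in data[:k]]  # step (1): first k points as centers
    labels = [0] * len(data)
    for _ in range(max_iter):  # max_iter is an added safety cap, not in the notes
        # step (2): label each point with the index of its nearest center
        labels = [min(range(k), key=lambda i: distance(p, centers[i])) for p in data]
        # step (3): each center becomes the mean of the points labeled with it
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(data, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])  # assumption: keep an empty cluster's center
        # step (4): stop once every center has moved less than the threshold
        shift = max(distance(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < threshold:
            break
    return centers, labels


def knn_classify(train_points, train_labels, query, k):
    """KNN following steps 1)-3): majority vote among the k nearest training points."""
    # steps 1) and 2): sort training objects by distance to the query, keep the k nearest
    ranked = sorted(zip(train_points, train_labels), key=lambda pl: distance(pl[0], query))
    # step 3): the most common category among those neighbors wins
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2)]
    centers, labels = kmeans(points, k=2)
    print(labels)  # cluster index for each point
    print(knn_classify(points, ["a", "b", "a", "b"], (2.0, 2.0), k=3))
```

Note how the sketch reflects the trade-offs listed above: kmeans needs k up front and starts from whatever points happen to come first, while knn_classify has no training step but scans and sorts the whole training set for every query.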
K-means, KNN Study notes