Kmeans is one of the simplest and most widely used clustering algorithms. Its basic idea is to group samples into clusters according to distance: the closer two points are, the more similar they are considered to be, and the goal is to obtain clusters that are compact and well separated. This article follows the PRML book and explains in detail the principle of Kmeans clustering and its application to image segmentation.

1. Basic Principles
Suppose we are given a data set $\left\{x_1, \ldots, x_N\right\}$ in D-dimensional Euclidean space. Our task is to partition this data into $K$ clusters. (Clustering differs from classification in that classification is supervised, while clustering is unsupervised.) The clustering criterion depends on the task; here we assume that the number of clusters $K$ is known. Setting aside the background of the problem and looking only at the Euclidean space, points in the same cluster should be close to each other, while points in different clusters should be far apart. The Kmeans method finds $K$ cluster centers $\mu_k$ $\left(k=1,\ldots,K\right)$ and assigns every data point to its nearest center, so that the sum of squared distances from each point to its assigned cluster center is minimized.
We introduce binary variables $r_{nk}\in\left\{0,1\right\}$ to indicate whether data point $x_n$ is assigned to cluster $k$ (where $n=1,\ldots,N$ and $k=1,\ldots,K$): if $x_n$ belongs to cluster $k$, then $r_{nk}=1$; otherwise $r_{nk}=0$. With this notation, we can define the following loss function:
$$J=\sum\limits_{n=1}^{N}\sum\limits_{k=1}^{K} r_{nk}\left\|x_n-\mu_k\right\|^2 \qquad (1)$$
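To make the loss function (1) concrete, here is a minimal NumPy sketch that builds the indicator matrix $r_{nk}$ by nearest-center assignment and then evaluates $J$. The function names `assign_nearest` and `loss`, and the toy data, are illustrative assumptions and do not come from the original article.

```python
import numpy as np

def assign_nearest(X, centers):
    """Return r, an (N, K) one-hot matrix with r[n, k] = 1 if center k is nearest to X[n]."""
    # Squared Euclidean distance from every point to every center, shape (N, K).
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = sq_dist.argmin(axis=1)
    r = np.zeros_like(sq_dist, dtype=int)
    r[np.arange(X.shape[0]), nearest] = 1
    return r

def loss(X, centers, r):
    """Evaluate J = sum_n sum_k r[n, k] * ||X[n] - centers[k]||^2, as in equation (1)."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((r * sq_dist).sum())

# Toy data (made up for illustration): four 2-D points and K = 2 centers.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]])
centers = np.array([[0.0, 0.5], [4.0, 4.5]])

r = assign_nearest(X, centers)
J = loss(X, centers, r)
print(r)
print(J)
```

In this toy case every point lies at distance 0.5 from its assigned center, so each of the four terms contributes 0.25 to the sum.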
The goal of this problem is to find the loss function