1. kmeans++
K-means is sensitive to the initialization of the cluster centers: different initial values give different clustering results, because K-means only finds an approximate optimum of the objective function and cannot guarantee the global optimum.
In standard K-means, the cluster centers are initialized at random, and this raises a problem: if the data are denser in some region, the randomly chosen centers are more likely to fall near that dense region.
For example, suppose the input data is:
[0.8, 0.85, 0.9, 0.95, 1, 1.05, 1.1, 1.2, 3.0, 3.1, 3.2]
As shown in the following illustration:
If two cluster centers are initialized at random, both are very likely to land near the data around 1.0; that is, both randomly initialized centers will probably fall inside the cluster near 1.0, which gives a poor clustering result.
To solve this problem, David Arthur proposed the k-means++ algorithm, which effectively produces good initial cluster centers. The procedure is as follows:
1. Randomly select a point from the set of input data points as the first cluster center.
2. For each point x in the dataset, compute D(x), its distance to the nearest cluster center among those already chosen.
3. Select a new data point as the next cluster center, where a point with larger D(x) has a higher probability of being chosen.
4. Repeat steps 2 and 3 until K cluster centers have been chosen.
5. Run the standard K-means algorithm with these K initial cluster centers.
Step 3 is implemented as follows: first draw a random value that falls in (0, sum(D(x))), then repeatedly subtract each D(x) from it (random -= D(x)) until the value becomes <= 0; the point at which this happens is the next "seed point".
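As a sketch, the whole initialization can be written in a few lines of Python (the function name kmeanspp_init is illustrative; this version assumes 1-D data like the example above):

```python
import numpy as np

def kmeanspp_init(data, k, rng=None):
    """Choose k initial centers from 1-D data with the k-means++ scheme."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    centers = [rng.choice(data)]                 # step 1: first center uniformly at random
    while len(centers) < k:
        # step 2: distance of each point to its nearest chosen center
        d = np.min(np.abs(data[:, None] - np.array(centers)[None, :]), axis=1)
        # step 3: draw r in (0, sum(d)) and walk the prefix sum until it reaches r
        r = rng.random() * d.sum()
        idx = np.searchsorted(np.cumsum(d), r)
        centers.append(data[idx])
    return np.array(centers)

print(kmeanspp_init([0.8, 0.85, 0.9, 0.95, 1, 1.05, 1.1, 1.2, 3.0, 3.1, 3.2], k=2))
```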
Assuming the first cluster center is 1, the distances D(x) from all input data points to this center are:
D(x) = [0.2, 0.15, 0.1, 0.05, 0, 0.05, 0.1, 0.2, 2.0, 2.1, 2.2]
Taking the prefix sum of D(x) gives D1:
D1 = [0.2, 0.35, 0.45, 0.5, 0.5, 0.55, 0.65, 0.85, 2.85, 4.95, 7.15]
As shown in the following illustration:
Now draw a random number random ∈ (0, 1) and multiply it by D1(end) = 7.15; the product is greater than or equal to 0.85 with high probability. Suppose the product is 4: the 10th entry of D1, 4.95, is the first value greater than 4, so the 10th input data point, 3.1, becomes the new cluster center.
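This selection can be reproduced numerically (a small sketch of the example above; numpy's searchsorted plays the role of the prefix-sum walk):

```python
import numpy as np

x = np.array([0.8, 0.85, 0.9, 0.95, 1, 1.05, 1.1, 1.2, 3.0, 3.1, 3.2])
d = np.abs(x - 1.0)            # distances to the first center, 1.0
d1 = np.cumsum(d)              # prefix sums: [0.2, 0.35, ..., 7.15]
r = 4.0                        # the random draw assumed in the text
idx = np.searchsorted(d1, r)   # first index whose prefix sum exceeds r -> 9
print(d1[idx], x[idx])         # 4.95, 3.1 -> the new cluster center
```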
The point of this process is that new cluster centers tend, with high probability, to be far away from the existing cluster centers, which prevents the centers from all being placed in the dense part of the data.

2. Dictionary Learning
Refer to "Learning Feature Representations with K-means".

2.1 First step: image chunking into samples
Extract image blocks (patches) from the input images; the blocks may overlap. The block size can be 8x8, 16x16, etc., and it determines the dimension of each sample; to learn a good dictionary, the larger the block, the more training samples are needed. For 16x16 blocks, 100,000 samples are sufficient. Then flatten each block into a one-dimensional vector and combine all the samples into a matrix:
$X = [x_1, x_2, \dots, x_i, \dots, x_n]$
where $x_i = [x_{i1}, x_{i2}, \dots, x_{im}]^T$, m is the dimension of a sample (the number of pixels in an image block), and n is the number of samples.
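A minimal sketch of this chunking step, assuming a grayscale image stored as a 2-D numpy array (the function name extract_patches and its parameters are illustrative):

```python
import numpy as np

def extract_patches(image, patch_size=16, n_patches=1000, rng=None):
    """Sample possibly overlapping patch_size x patch_size blocks and flatten each into a column of X."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    cols = []
    for _ in range(n_patches):
        r = rng.integers(0, h - patch_size + 1)   # top-left corner of the block
        c = rng.integers(0, w - patch_size + 1)
        block = image[r:r + patch_size, c:c + patch_size]
        cols.append(block.reshape(-1))            # pull the block into one dimension
    return np.stack(cols, axis=1)                 # X has shape (m, n) = (patch_size**2, n_patches)

image = np.random.rand(128, 128)                  # stand-in for a real grayscale image
X = extract_patches(image)
print(X.shape)                                    # (256, 1000)
```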
2.2 Normalization of inputs
The data are generally normalized; the normalization is given by the following formula:
$$x^{(i)} = \frac{\tilde{x}^{(i)} - \mathrm{mean}(\tilde{x}^{(i)})}{\mathrm{var}(\tilde{x}^{(i)})}$$
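A minimal sketch of this per-sample normalization, assuming samples are the columns of X (the function name normalize_samples and the eps guard are illustrative, not from the text):

```python
import numpy as np

def normalize_samples(X, eps=1e-8):
    """Normalize each sample (column of X): subtract its mean, divide by its variance."""
    mean = X.mean(axis=0, keepdims=True)
    var = X.var(axis=0, keepdims=True)
    return (X - mean) / (var + eps)   # eps guards against zero-variance blocks (assumption)
```

Note that the referenced paper normalizes image patches by dividing by the square root of the variance plus a small constant (i.e., a contrast normalization by the standard deviation), so the denominator used in practice may differ from the raw variance shown above.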