Spectral clustering (SC) is a clustering method based on graph theory: it partitions a weighted undirected graph into two or more optimal subgraphs so that each subgraph is internally as similar as possible while the subgraphs are as far apart from each other as possible, thereby achieving the goal of clustering. What counts as "optimal" depends on the objective function: it can be the partition with the smallest total weight of cut edges, as in the smallest cut of Figure 1 (the min cut below), or the partition that balances the subgraph sizes while also minimizing the cut edges, as in the best cut of Figure 1 (the normalized cut below).
Fig. 1 Spectral clustering: partitioning an undirected graph -- smallest cut and best cut
In this way, spectral clustering can recognize clusters of arbitrary shape in the sample space and converge to a globally optimal solution. The basic idea is to use the similarity matrix (Laplacian matrix) of the sample data to obtain eigenvectors and then cluster them.
1 Theoretical basis
Consider the following item-user matrix, where each item is a vector in feature space:
If the items are to be clustered, K-means is the method that first comes to mind. Its complexity is O(TKNM), where T is the number of iterations, K is the number of classes, N is the number of items, and M is the number of features of the space vector. Several issues arise:
1. M may be very large;
2. K has to be chosen;
3. The classes are assumed to be convex (spherical);
4. The items may be entities of different kinds;
5. K-means inevitably converges to a local optimum.
......
All of these make the ordinary clustering problem quite complicated.
1.1 Representation of graphs
If we compute the similarity between items, we obtain a similarity matrix that involves items only. Going further, we can regard each item as a vertex (V) in a graph (G) and the similarity between items as an edge (E) in G, which gives us the familiar notion of a graph.
Two representations of a graph (Figure 2) are commonly used:
Adjacency matrix E: e_ij is the weight of the edge between v_i and v_j; E is a symmetric matrix and its diagonal elements are 0, as shown in Figure 2-2.
Laplacian matrix: L = D - E, where D is the diagonal degree matrix whose entry d_i is the sum of row (or column) i of E, as shown in Figure 2-3.
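As a small illustration (the similarity values below are made up for the example), the two representations can be built with NumPy as follows:

```python
import numpy as np

# Hypothetical item-item similarity matrix E (symmetric, zero diagonal);
# in practice e_ij could be, e.g., the cosine similarity between item i and item j.
E = np.array([[0.0, 0.8, 0.6, 0.1],
              [0.8, 0.0, 0.7, 0.2],
              [0.6, 0.7, 0.0, 0.1],
              [0.1, 0.2, 0.1, 0.0]])

# Degree matrix D: d_i is the sum of row (or column) i of E.
D = np.diag(E.sum(axis=1))

# Laplacian matrix L = D - E.
L = D - E
```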
Figure 2 Representations of a graph
1.2 Eigenvalues and eigenvectors of the L matrix
First, consider a criterion for an optimal graph partition. In the two-way case, cutting the graph into two parts S and T corresponds to the loss function cut(S, T) shown in Equation 1, i.e. the weighted sum of the cut edges, which we want to minimize.
Suppose the graph is divided into two classes S and T. Let q (Equation 2) denote the class assignment; q satisfies the relation in Equation 3 so that it identifies the class of each vertex.
So:
where D is the diagonal degree matrix (its entries are the row or column sums of E) and L is the Laplacian matrix.
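In standard notation (a sketch of the usual derivation, which may differ slightly from the original Equations 1-3), the above reads:

```latex
% Equation 1: the cut is the total weight of the edges removed when V is split into S and T
\[ \mathrm{cut}(S,T) = \sum_{i \in S,\; j \in T} e_{ij} \]

% Equations 2-3 (standard indicator form, assumed here): q marks each vertex's class
\[ q_i = \begin{cases} +1, & v_i \in S \\ -1, & v_i \in T \end{cases}
   \qquad q^\top q = n \]

% the cut then becomes a quadratic form in the Laplacian L = D - E
\[ q^\top L\, q = \tfrac{1}{2}\sum_{i,j} e_{ij}\,(q_i - q_j)^2 = 4\,\mathrm{cut}(S,T)
   \quad\Longrightarrow\quad
   \mathrm{cut}(S,T) = \tfrac{1}{4}\, q^\top (D - E)\, q \]
```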
From the definition of L, we have:
1. L is a symmetric positive semi-definite matrix, so all of its eigenvalues are greater than or equal to 0;
2. L has a 0 eigenvalue (unique when the graph is connected), and the corresponding eigenvector is the all-ones vector 1.
Solving for the discrete q is very hard. If the problem is relaxed so that q may take continuous real values, then by the properties of the Rayleigh quotient the extrema of the expression above (the minimum, the second smallest, ..., the maximum) correspond to the smallest eigenvalue, the second smallest eigenvalue, ..., the largest eigenvalue of L, and each extremum is attained when q is the corresponding eigenvector; see Rayleigh quotient.
Writing this, we have to pay tribute to the mathematicians: cut(S, T) is cleverly transformed into an eigenvalue (eigenvector) problem of the Laplacian matrix, and the discrete clustering problem is relaxed into continuous eigenvectors, with the smallest few eigenvectors corresponding to the best few ways to partition the graph. What remains is to turn the relaxed solution back into a discrete one: partitioning the eigenvector again yields the corresponding classes. For example, splitting the smallest informative eigenvector in Figure 3 by the sign of its entries gives the classes {A, B, C} and {D, E, F, G}. For a K-way classification, the first K eigenvectors are usually fed to K-means.
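A minimal sketch of this sign-based split, continuing from the L computed earlier (the actual grouping depends on the made-up similarity values):

```python
# The eigenvector of the smallest eigenvalue of L is the constant vector 1,
# so the first informative eigenvector is the second-smallest one
# (the Fiedler vector). np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]

# Split the vertices into two classes by the sign of the Fiedler vector,
# analogous to the {A, B, C} / {D, E, F, G} split of Figure 3.
cluster_0 = np.where(fiedler >= 0)[0]
cluster_1 = np.where(fiedler < 0)[0]
print(cluster_0, cluster_1)
```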
Ps:
1. Although K-means appears again here, its role is quite different from the K-means discussed when the concept was introduced; here it is more related to ensemble learning, which we do not expand on;
2. K is not required to equal the number of clusters; this can be understood from the physical meaning discussed in Section 4;
3. In the first K eigenvectors, the values of the first column are exactly the same (with an iterative eigensolver they are merely very close); this column can be dropped before K-means. It also makes it easy to check whether the eigenvalue (eigenvector) computation is correct -- a common cause of problems is an asymmetric adjacency matrix.
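A quick sanity check along the lines of PS 3, again reusing E and L from the sketch above:

```python
# The adjacency matrix must be symmetric; otherwise the eigenvector
# computation is not trustworthy.
assert np.allclose(E, E.T), "adjacency matrix is not symmetric"

# The eigenvector of the (near-)zero eigenvalue should be numerically constant.
eigvals, eigvecs = np.linalg.eigh(L)
print(np.isclose(eigvals[0], 0.0), np.allclose(eigvecs[:, 0], eigvecs[0, 0]))
```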
Fig. 3 The eigenvalues and eigenvectors of the L matrix
2 Optimization methods
In other clustering methods such as K-means it is difficult to control the relative sizes of the classes, and convergence to a local optimum is an unavoidable drawback. This, of course, does not prevent the widespread use of K-means, whose principle is simple.
2.1 Min cut method
As in the derivation of Section 1.2, the optimization objective of the min cut method is the graph cut cut(S, T) itself:
This can be solved directly by computing the smallest eigenvalues (eigenvectors) of L.
2.2 Normalized cut method
The goal of the normalized cut is to minimize the cut edges while also balancing the partition, so as to avoid cutting off an isolated vertex such as H in Figure 1. The size of a subgraph is measured by the sum of the degrees of its vertices.
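In the usual notation, with assoc(S, V) the sum of the degrees of the vertices in S, the normalized cut objective is written as:

```latex
\[ \mathrm{Ncut}(S,T) = \frac{\mathrm{cut}(S,T)}{\mathrm{assoc}(S,V)}
                      + \frac{\mathrm{cut}(S,T)}{\mathrm{assoc}(T,V)},
   \qquad \mathrm{assoc}(S,V) = \sum_{v_i \in S} d_i \]
```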
2.3 Ratio Cut Method
The goal of the ratio cut is likewise to minimize the cut edges while balancing the partition, so as to avoid cutting off an isolated vertex such as H in Figure 1; the difference is that the size of a subgraph is measured by its number of vertices.
The optimal objective function is:
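In the usual notation, with |S| and |T| the numbers of vertices in the two subgraphs, this objective is:

```latex
\[ \mathrm{RatioCut}(S,T) = \frac{\mathrm{cut}(S,T)}{|S|} + \frac{\mathrm{cut}(S,T)}{|T|} \]
```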
2.4 Normalized similarity transformation
The normalized L matrix is L' = D^(-1/2) L D^(-1/2) = I - D^(-1/2) E D^(-1/2).
Thus the smallest eigenvalues of L' correspond to the largest eigenvalues of D^(-1/2) E D^(-1/2).
Computing with L' has a slight advantage over computing with L, so in practice L' is often used instead of L; this substitution works for the normalized cut, but not for min cut or ratio cut.
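A small sketch of the transformation, reusing E, D and L from above (and assuming no vertex has degree zero):

```python
# Normalized Laplacian L' = D^(-1/2) L D^(-1/2) = I - D^(-1/2) E D^(-1/2).
d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = d_inv_sqrt @ L @ d_inv_sqrt

# Smallest eigenvalues of L' <-> largest eigenvalues of D^(-1/2) E D^(-1/2):
# the two lines below print the same values.
print(np.linalg.eigvalsh(L_norm))
print(1.0 - np.linalg.eigvalsh(d_inv_sqrt @ E @ d_inv_sqrt)[::-1])
```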
PS: This is why different people's blogs describe it differently -- some say spectral clustering takes the largest K eigenvalues (eigenvectors), others say it takes the smallest K eigenvalues (eigenvectors).
3 Spectral clustering steps
The first step: prepare the data and generate the adjacency matrix of the graph;
The second step: normalize the matrix;
The third step: compute the smallest K eigenvalues and the corresponding eigenvectors;
The fourth step: run K-means on the eigenvectors (only a small number of eigenvectors are needed); see the sketch below.
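Putting the four steps together, here is a minimal end-to-end sketch with NumPy and scikit-learn's KMeans (the similarity matrix and k below are assumptions made for the example; scikit-learn also ships a ready-made sklearn.cluster.SpectralClustering):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(E, k):
    """Sketch of the four steps: E is a symmetric similarity (adjacency)
    matrix with zero diagonal, k is the desired number of clusters."""
    # Step 2: normalize -> L' = I - D^(-1/2) E D^(-1/2)
    # (assumes every vertex has positive degree)
    d = E.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(len(E)) - d_inv_sqrt @ E @ d_inv_sqrt

    # Step 3: the k smallest eigenvalues and their eigenvectors
    eigvals, eigvecs = np.linalg.eigh(L_norm)
    U = eigvecs[:, :k]                     # n x k spectral embedding

    # Step 4: K-means on the rows of the (small) eigenvector matrix
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

# Step 1: a made-up adjacency matrix with two obvious groups of items
E = np.array([[0.0, 0.9, 0.8, 0.1, 0.0, 0.1],
              [0.9, 0.0, 0.9, 0.0, 0.1, 0.0],
              [0.8, 0.9, 0.0, 0.1, 0.0, 0.1],
              [0.1, 0.0, 0.1, 0.0, 0.9, 0.8],
              [0.0, 0.1, 0.0, 0.9, 0.0, 0.9],
              [0.1, 0.0, 0.1, 0.8, 0.9, 0.0]])
print(spectral_clustering(E, k=2))         # e.g. [0 0 0 1 1 1]
```

Using the K smallest eigenvectors of the normalized Laplacian here corresponds to the normalized cut objective of Section 2.2.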
4 The physical meaning of spectral clustering
The matrices used in spectral clustering are:
It is clear that both L and L' are closely tied to E. If E is viewed as a high-dimensional vector space, it too reflects the relationships between items to some extent, and running K-means directly on E can also reveal the clustering structure of V. Introducing L and L' gives spectral clustering the physical meaning of partitioning the graph G.
Moreover, if the number of items in E (that is, N) is large, running K-means on it directly becomes difficult; we can use PCA to reduce the dimensionality (which again amounts to taking the top eigenvalues and eigenvectors).
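A tiny sketch of that shortcut, reusing the adjacency matrix E from the sketch above (the number of components and clusters are arbitrary choices for the example):

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Treat each row of E as an item's feature vector, keep only the top
# principal components, then run K-means on the reduced vectors.
X = PCA(n_components=2).fit_transform(E)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)
```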
Treating E as a vector-space matrix, as above, agrees with our intuition but lacks a theoretical basis. Introducing L (L', etc.), as described in Section 2, gives the computation a theoretical basis, and taking the first K eigenvectors is also equivalent to a dimensionality reduction of L (L', etc.).
So spectral clustering finds a theoretical basis for partitioning the graph and at the same time achieves dimensionality reduction.
Many of the figures here come from Mining of Massive Datasets; thanks to the authors for their teaching.
Recommended reading: Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, Edward Y. Chang. Parallel Spectral Clustering in Distributed Systems.
Recommended source code: https://code.google.com/p/pspectralclustering/ (really great)
For more extensions, see the following blog post: Spectral clustering algorithm (spectral clustering) optimization and expansion.
------ Only by treating hard work as a necessity of life can one keep cultivating calmly, even without hope of a harvest.