Spectral Clustering (SC)

Spectral clustering (SC) is a graph-based clustering method. It partitions an undirected graph into two or more optimal subgraphs so that vertices within a subgraph are as similar as possible while different subgraphs are as dissimilar (far apart) as possible, thereby achieving the goal of clustering. "Optimal" refers to the choice of objective function: it can be the minimum cut (the smallest total weight of cut edges, e.g. the min cut discussed below), or the best cut (e.g. the normalized cut below, which seeks subgraphs of roughly equal size together with a small cut).

Figure 1: Undirected graph partitioning in spectral clustering (smallest cut vs. best cut)

In this way, spectral clustering can handle clusters of arbitrary shape in the sample space and converges to the global optimum of the relaxed problem. Its basic idea is to perform an eigendecomposition of the Laplacian matrix built from the similarity matrix of the sample data, and to cluster using the resulting eigenvectors.

1. Theoretical Basis

Consider the following item-user matrix, where each item is a vector in feature space:

If we want to cluster the items, the first method that comes to mind is k-means. Its complexity is O(t·k·n·m), where t is the number of iterations, k is the number of clusters, n is the number of items, and m is the number of features. Several questions arise:

1. What if m is very large?

2. How should k be chosen?

3. What if the clusters are not convex and spherical, as k-means assumes?

4. What if the items are entities of different types?

5. What about k-means' unavoidable convergence to local optima?

......

These issues make even this common clustering problem quite involved (a minimal k-means baseline is sketched below).
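As a point of reference, a minimal k-means baseline might look like the following sketch; the item-feature matrix, the value of k, and all parameter values are illustrative assumptions, not data from the text:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical item-feature matrix: n = 100 items, m = 50 features.
rng = np.random.default_rng(0)
X = rng.random((100, 50))

k = 3  # number of clusters, chosen by hand
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])  # cluster id of the first 10 items

# Each of the t iterations touches all k centers, n items and m features,
# which is where the O(t*k*n*m) cost quoted above comes from.
```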

1.1 Graph representation

If we compute the similarity between items, we obtain a similarity matrix over the items alone. Going further, we can treat each item as a vertex (V) of a graph (G) and the similarity between two items as the weight of the edge (E) connecting them, which gives us an ordinary graph. (Supplement on similar matrices: let A and B be n-order matrices. If there exists an invertible n-order matrix P such that P^(-1)·A·P = B, then A is said to be similar to B, written A ~ B. Note that this linear-algebra notion of matrix similarity is different from the similarity matrix used here.)
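For illustration, one common way to build such a similarity matrix is a Gaussian (RBF) kernel over the item vectors; the kernel choice, the gamma value, and the toy data below are assumptions for this sketch, not something prescribed by the text:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Toy item vectors (7 items, 5 features); any item representation would do.
X = np.random.default_rng(0).random((7, 5))

E = rbf_kernel(X, gamma=1.0)   # e_ij = exp(-gamma * ||x_i - x_j||^2)
np.fill_diagonal(E, 0.0)       # no self-loops, matching the convention below
# E is symmetric: items are the vertices, e_ij is the edge weight between them.
```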

Common representations of the graph (Figure 2) are:

Adjacency matrix: E, where e_ij is the weight of the edge between v_i and v_j. E is a symmetric matrix, and its diagonal elements are 0 (Figure 2-2).

(Supplement on the adjacency matrix: a matrix that represents the adjacency relationship between vertices. Let G = (V, E) be a graph with V = {v1, v2, ..., vn}. The adjacency matrix of G is an n-order matrix with the following properties:

① For an undirected graph the adjacency matrix is necessarily symmetric with a zero diagonal (only simple undirected graphs are considered here); for a directed graph this is not necessarily the case. ② In an undirected graph, the degree of vertex i is the sum of the elements in row i (equivalently column i). In a directed graph, the out-degree of vertex i is the sum of the elements in row i, and the in-degree is the sum of the elements in column i. ③ Representing a graph with an adjacency matrix takes n^2 entries. Since the adjacency matrix of an undirected graph is symmetric with a zero diagonal, it suffices to store only the upper or lower triangle, i.e. n(n-1)/2 entries.)

Laplacian matrix: L = D - E, where d_i is the sum of the elements in row i (or column i) of E (Figure 2-3). (Supplement on the Laplacian matrix: L = D - A, where D is the degree matrix of the graph and A is its adjacency matrix.)
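A short sketch of these two representations, using a small hand-made adjacency matrix (the graph itself is an illustrative assumption):

```python
import numpy as np

# Toy symmetric adjacency matrix E (zero diagonal), following the convention above.
E = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

d = E.sum(axis=1)      # d_i: sum of row i (= column i, since E is symmetric)
D = np.diag(d)         # degree matrix
L = D - E              # unnormalized graph Laplacian

assert np.allclose(L, L.T)               # L is symmetric
assert np.allclose(L @ np.ones(4), 0.)   # L·1 = 0, so 1 is an eigenvector for eigenvalue 0
```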

Figure 2 Representation

1.2 Eigenvalues and the L matrix

First, consider the optimal partitioning of the graph. Taking the two-way case as an example, we cut the graph into two parts S and T, which corresponds to minimizing the loss function cut(S, T) shown in Formula 1, i.e. the weighted sum of the cut edges.

Suppose the vertices are divided into the two classes S and T. A vector q (Formula 2) is used to indicate the class membership, and q satisfies the constraint in Formula 3.

So:

where D is the diagonal matrix whose entries are the row (or column) sums of E, and L is the Laplacian matrix.
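A sketch of this identity in one common formulation, assuming the indicator convention q_i ∈ {+1, -1} (the constants may differ from the formulas shown in the original figures):

```latex
\[
  \operatorname{cut}(S,T) \;=\; \sum_{i \in S,\; j \in T} e_{ij},
  \qquad
  q_i \;=\;
  \begin{cases}
    +1, & i \in S,\\
    -1, & i \in T,
  \end{cases}
\]
\[
  q^{\mathsf{T}} L\, q
  \;=\; \tfrac{1}{2} \sum_{i,j} e_{ij}\,(q_i - q_j)^2
  \;=\; 4\,\operatorname{cut}(S,T),
\]
```

so minimizing the weighted cut is equivalent to minimizing q^T L q over the admissible indicator vectors q.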

From the above:

we have:

1. L is a symmetric positive semi-definite matrix, which guarantees that all eigenvalues are greater than or equal to 0;

2. L has a 0 eigenvalue (unique when the graph is connected), and its corresponding eigenvector is the all-ones vector 1.

Solving this problem exactly in the discrete setting is very hard. If we relax it and allow q to take continuous real values, then by the properties of the Rayleigh quotient the minimum, second-smallest, ..., maximum values of the objective correspond to the smallest, second-smallest, ..., largest eigenvalues of the matrix L, and the vectors q attaining these extrema are the corresponding eigenvectors (see the Rayleigh quotient).

Writing this, I have to pay tribute to the mathematicians who cleverly converted cut(S, T) into the eigenvalue (eigenvector) problem of the Laplacian matrix, relaxing the discrete clustering problem into a continuous one over eigenvectors. The eigenvectors with the smallest eigenvalues correspond to the best series of graph partitions, and all that remains is to discretize the relaxed solution: splitting an eigenvector yields the classes. For example, splitting the eigenvector in Figure 3 by the sign of its entries gives the classes {a, b, c} and {d, e, f, g}. For k classes, k-means is run on the first k eigenvectors.
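A sketch of this relaxed procedure on a toy graph with two natural groups {a, b, c} and {d, e, f, g} joined by a single edge (the graph is an illustrative assumption, not the exact graph of Figure 3); the second-smallest eigenvector is split by sign:

```python
import numpy as np

# Two tightly connected groups, {a,b,c} and {d,e,f,g}, joined by the edge c-d.
E = np.array([
    [0, 1, 1, 0, 0, 0, 0],   # a
    [1, 0, 1, 0, 0, 0, 0],   # b
    [1, 1, 0, 1, 0, 0, 0],   # c
    [0, 0, 1, 0, 1, 1, 1],   # d
    [0, 0, 0, 1, 0, 1, 1],   # e
    [0, 0, 0, 1, 1, 0, 1],   # f
    [0, 0, 0, 1, 1, 1, 0],   # g
], dtype=float)

D = np.diag(E.sum(axis=1))
L = D - E

# eigh returns eigenvalues in ascending order for symmetric matrices.
vals, vecs = np.linalg.eigh(L)
print(np.round(vals, 3))      # first eigenvalue ~ 0, its eigenvector ~ constant
fiedler = vecs[:, 1]          # second-smallest eigenvector
labels = (fiedler > 0).astype(int)
print(labels)                 # separates {a, b, c} from {d, e, f, g}
```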

PS:

1. Although k-means appears again here, its role is quite different from the k-means discussed when the problem was introduced; here it acts more like a post-processing step in the spirit of ensemble learning, and we will not go into it further;

2. k is not required to equal the number of clusters; this can be understood from the physical meaning discussed in Section 4;

3. Among the first k eigenvectors, the values in the first column are identical (when the eigenvectors are computed by an iterative algorithm they are only nearly identical), so this column can be dropped before running k-means. It also provides an easy check that the eigenvalues (eigenvectors) were computed correctly: if this column is not constant, the adjacency matrix is probably not symmetric.

Figure 3: Eigenvalues and eigenvectors of the L matrix

2. Optimization Method

In other clustering methods such as k-means, it is difficult to control the relative sizes of the clusters, and local optima are unavoidable. Of course, this is in tension with the reason k-means is so widely used: its principle is simple.

2.1 Min cut method

Following the derivation in Section 1.2, the optimal objective function of this graph-cut method is:

This objective can be solved directly by computing the smallest eigenvalue (eigenvector) of L.

2.2 Normalized cut method

The goal of normalized cut is to minimize the cut edges while keeping the partition balanced, so as to avoid cutting off a single vertex such as h in the smallest cut of Figure 1. The size of a subgraph is measured by the sum of the degrees of its vertices.
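In a common formulation (the constants may differ from the original formula figure), the two-way normalized cut objective is:

```latex
\[
  \operatorname{Ncut}(S,T)
  \;=\; \frac{\operatorname{cut}(S,T)}{\operatorname{vol}(S)}
  \;+\; \frac{\operatorname{cut}(S,T)}{\operatorname{vol}(T)},
  \qquad
  \operatorname{vol}(S) \;=\; \sum_{i \in S} d_i .
\]
```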

2.3 Ratio cut method

The goal of ratio cut is likewise to minimize the cut edges while keeping the partition balanced, avoiding a cut that isolates a single vertex such as h in Figure 1; here the size of a subgraph is measured by its number of vertices.

The optimal objective function is:
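In the standard formulation (which may differ slightly from the original figure):

```latex
\[
  \operatorname{RatioCut}(S,T)
  \;=\; \frac{\operatorname{cut}(S,T)}{|S|}
  \;+\; \frac{\operatorname{cut}(S,T)}{|T|},
\]
```

where |S| and |T| denote the number of vertices in each subgraph.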

2.4 Normalized similarity transformation

The normalized Laplacian matrix L' satisfies:
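In the usual definition (the original figure may write this slightly differently):

```latex
\[
  L' \;=\; D^{-1/2} L\, D^{-1/2}
      \;=\; I \;-\; D^{-1/2} E\, D^{-1/2},
\]
```

so if v is an eigenvector of D^(-1/2) E D^(-1/2) with eigenvalue λ, then L'v = (1 - λ)v.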

Therefore, the smallest eigenvalues of L' correspond to the largest eigenvalues of D^(-1/2) E D^(-1/2).

Computing with L' has a slight numerical advantage over computing with L, and in practice L' is often used in place of L; note, however, that this corresponds to normalized cut and is not valid for min cut or ratio cut.

PS: this is also why, in different people's blog posts, one author says spectral clustering looks for the largest k eigenvalues (eigenvectors) while another says it looks for the smallest k eigenvalues (eigenvectors).

3. Spectral clustering steps

Step 1: prepare the data and build the graph's adjacency matrix;

Step 2: normalize it to obtain the Laplacian matrix;

Step 3: compute the smallest k eigenvalues and the corresponding eigenvectors;

Step 4: run k-means on the rows of the eigenvector matrix (a small number of eigenvectors).
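An end-to-end sketch of these four steps, using the normalized Laplacian and scikit-learn building blocks; the RBF similarity, gamma, k, and the toy data are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def spectral_clustering(X, k, gamma=1.0):
    # Step 1: adjacency matrix from pairwise similarities.
    E = rbf_kernel(X, gamma=gamma)
    np.fill_diagonal(E, 0.0)

    # Step 2: normalized Laplacian L' = I - D^(-1/2) E D^(-1/2).
    d = E.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(len(X)) - D_inv_sqrt @ E @ D_inv_sqrt

    # Step 3: eigenvectors of the k smallest eigenvalues.
    vals, vecs = np.linalg.eigh(L_norm)
    U = vecs[:, :k]

    # Step 4: k-means on the rows of the eigenvector matrix.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

# Usage on toy data with two well-separated groups:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
print(spectral_clustering(X, k=2))
```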

4. Physical meaning of spectral clustering

The matrices involved in spectral clustering are:

It can be seen that both L and L' are closely related to E. If we regard E itself as a matrix of high-dimensional vectors, it also reflects the relationships between items to some extent, and running k-means directly on E can likewise reveal the clustering structure of the vertices V. What the L and L' introduced by spectral clustering add is a physically meaningful way to partition G.

Furthermore, if the number of items n in E is very large, running k-means on it directly becomes expensive; we can first use PCA to reduce the dimensionality (again using the top eigenvalues and eigenvectors), as sketched below.
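A sketch of this alternative, reducing the rows of E with PCA before k-means; the similarity construction, number of components, and cluster count are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import rbf_kernel

# Toy similarity matrix E over n = 200 items (illustrative data).
X = np.random.default_rng(0).random((200, 30))
E = rbf_kernel(X, gamma=1.0)
np.fill_diagonal(E, 0.0)

# Reduce each item's row of E to a few principal components
# (the top eigenvalues/eigenvectors of the covariance), then cluster.
E_low = PCA(n_components=10, random_state=0).fit_transform(E)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(E_low)
print(labels[:20])
```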

The view above, which treats E as a matrix in a vector space, matches our intuition but lacks theoretical grounding. Introducing L (or L'), as described in Section 2, gives the computation a theoretical basis, and taking the first k eigenvectors is also equivalent to a dimensionality reduction of L (or L').

Spectral clustering is therefore graph partitioning with a theoretical basis that at the same time achieves dimensionality reduction.

Many of the figures are taken from Mining of Massive Datasets. I would like to thank my colleagues for their patient explanations.

Recommended paper: Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, Edward Y. Chang. Parallel Spectral Clustering in Distributed Systems.

Recommended source code: https://code.google.com/p/pspectralclustering/ (highly recommended)
