Understanding Spectral Clustering


In a previous post we introduced the K-means clustering method. K-means is simple and easy to understand; the two main design questions are how to define the distance measure (usually Euclidean distance) and how to choose the value of K. This time we introduce spectral clustering, which can be seen as an upgraded version of K-means. We will cover it from several angles: the drawbacks of K-means clustering, the basic idea of spectral clustering, and the steps of the spectral clustering algorithm.


So what's the problem with K-means? Why do we need to improve it?

    • K-means becomes difficult to compute as the dimensionality of the samples grows, because K-means works directly on the raw feature vectors in Euclidean space;
    • K-means only finds a locally optimal clustering: the sum of squared errors (SSE) it converges to is not necessarily the global minimum.


1. Basic idea of spectral clustering


Spectral clustering is a clustering method based on graph theory. Each sample is treated as a vertex, and the similarity between two samples is treated as the weight of the edge connecting them. Partitioning the sample set into K clusters is therefore equivalent to a graph-partitioning problem.

Picture a graph in which each vertex is a sample (in practice, each sample is a feature vector); say there are 7 samples in total. The weight of each edge is the similarity between the two samples it connects. We then want to split the graph so that the total weight of the edges running between different groups is as low as possible (low similarity between clusters), while the total weight of the edges inside each group is as high as possible (high similarity within clusters).


For example, if the graph consists of two tightly connected groups joined only by a few low-weight edges in the middle, removing those middle edges yields two clusters with low similarity between them and high similarity within each of them.
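To make this concrete, here is a minimal sketch on a made-up 6-vertex weighted adjacency matrix (all weights invented for illustration; two groups of three with high within-group and low between-group similarity). It measures the total weight a candidate split would remove:

```python
import numpy as np

# Hypothetical symmetric adjacency matrix for 6 samples: two groups of 3.
# Within-group similarities are high (~0.9), between-group ones low (~0.1).
W = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.1, 0.8, 0.9, 0.0],
])

def cut_weight(W, group_a, group_b):
    """Total weight of the edges running between the two groups."""
    return W[np.ix_(group_a, group_b)].sum()

# Splitting between {0,1,2} and {3,4,5} removes very little edge weight,
# so it is a good partition of this graph.
print(cut_weight(W, [0, 1, 2], [3, 4, 5]))  # 0.4
print(cut_weight(W, [0, 1, 3], [2, 4, 5]))  # 3.6 (a much worse split)
```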



2. Algorithm steps


The spectral clustering algorithm consists of the following main steps:


1) Compute the adjacency matrix E of the graph and the Laplacian matrix L.


Given the original features of the samples, we need to compute the pairwise similarity between every two samples in order to construct the adjacency matrix E. A Gaussian kernel function is commonly used to compute the similarity:

$$E_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$

where $x_i$ and $x_j$ are the original feature vectors of the two samples and $\sigma$ is a bandwidth parameter. Note that the diagonal elements of the adjacency matrix are set to zero. This gives us the adjacency matrix E.

The Laplacian matrix is L = D - E, where D is the degree matrix of the graph. Degree is a concept from graph theory: the degree of vertex i is the sum of row i of E (equivalently, column i, since E is symmetric), so D is the diagonal matrix with $D_{ii} = \sum_j E_{ij}$. The analysis that follows is based on the matrix L.
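A minimal sketch of this step (the dataset X and the bandwidth sigma below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 4))  # 7 samples with 4-dimensional features (made up)
sigma = 1.0                  # Gaussian kernel bandwidth (a tuning parameter)

# Pairwise squared Euclidean distances between all samples.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

# Adjacency matrix E via the Gaussian kernel; zero out the diagonal.
E = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(E, 0.0)

# Degree matrix D (row sums on the diagonal) and Laplacian L = D - E.
D = np.diag(E.sum(axis=1))
L = D - E
```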


2) The partitioning criterion


The choice of partitioning criterion is at the core of any clustering technique. Since spectral clustering treats clustering as a graph-theory problem, its partitioning criterion is defined in terms of the weights of the edges.


Guideline 1: minimize the sum of the weights of the edges that are removed by the split.

$$\mathrm{cut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} W(A_i, \bar{A_i})$$

where $W(A_i, \bar{A_i})$ denotes the total weight of the edges between cluster $A_i$ and the rest of the graph. Intuitively, this criterion minimizes the similarity between clusters after segmentation. One problem, however, is that it does not take the size of the clusters into account, so it tends to split off a single sample (or a very small group) as its own cluster. To avoid this problem, the following guideline was introduced.
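To see this failure mode concretely, here is a minimal sketch on a made-up 5-vertex graph (all weights invented for illustration). Plain min-cut prefers slicing off the single weakly attached vertex, while the size-penalized RatioCut criterion introduced next prefers the balanced split:

```python
import numpy as np

# Made-up 5-vertex graph: {0,1} and {2,3,4} are the "natural" clusters;
# vertex 4 is attached to the rest by a single edge of weight 0.6.
W = np.array([
    [0.0, 1.0, 0.2, 0.2, 0.0],
    [1.0, 0.0, 0.2, 0.2, 0.0],
    [0.2, 0.2, 0.0, 1.0, 0.0],
    [0.2, 0.2, 1.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.6, 0.0],
])
n = len(W)

def cut(A):
    """Weight of the edges between vertex set A and its complement."""
    B = [i for i in range(n) if i not in A]
    return W[np.ix_(A, B)].sum()

def ratio_cut(A):
    """2-way RatioCut of {A, complement}, matching guideline 2's formula."""
    return 0.5 * (cut(A) / len(A) + cut(A) / (n - len(A)))

# Plain min-cut prefers slicing off the single vertex 4 ...
print(cut([4]), cut([0, 1]))              # 0.6 < 0.8
# ... while RatioCut's size penalty favors the balanced split.
print(ratio_cut([4]), ratio_cut([0, 1]))  # 0.375 > 0.333
```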


Guideline 2: take the size of each cluster into account.

$$\mathrm{RatioCut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A_i})}{|A_i|}$$

Guideline 2 improves on guideline 1 in much the same way that the gain ratio in C4.5 improves on the information gain of ID3. In RatioCut, if a cluster contains few vertices, the corresponding term becomes large; in a minimization problem this acts as a penalty, discouraging clusters that are too small. Now we only need to minimize RatioCut and the segmentation is complete. Unfortunately, this is an NP-hard problem. To solve it in polynomial time, a transformation is needed. The conversion uses the properties of the Laplacian matrix L mentioned above: if the partition is encoded as an indicator matrix H (with $H_{ij} = 1/\sqrt{|A_j|}$ when vertex i belongs to cluster $A_j$, and 0 otherwise), RatioCut can be rewritten as $\operatorname{Tr}(H^{\top} L H)$, and relaxing H to take arbitrary real values finally yields the following problem:

$$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}(H^{\top} L H) \quad \text{subject to} \quad H^{\top} H = I$$

This transformation is similar to that of the objective function in linear discriminant analysis (LDA). The relaxed problem can be solved by eigendecomposition: we decompose the Laplacian matrix L, take the eigenvectors corresponding to the k smallest eigenvalues, and stack these k eigenvectors column-wise into a new sample feature matrix of dimension N × k.
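A minimal sketch of the eigendecomposition step (the Laplacian below is built from a random stand-in adjacency matrix just so the snippet runs on its own; in practice, use the E and L from step 1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in Laplacian: a random symmetric adjacency matrix E, then L = D - E.
E = rng.random((7, 7))
E = (E + E.T) / 2
np.fill_diagonal(E, 0.0)
L = np.diag(E.sum(axis=1)) - E

k = 2  # desired number of clusters

# L is symmetric, so eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(L)

# The k eigenvectors with the smallest eigenvalues form the new
# N x k feature matrix (the spectral embedding of the samples).
H = eigvecs[:, :k]
print(H.shape)  # (7, 2)
```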


Next, each row of this N × k matrix is treated as a point in k-dimensional space: every sample is now represented by only k dimensions, and we run K-means clustering on these rows. Whichever cluster a row is assigned to, the original sample belongs to that cluster.
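Putting the steps together, here is an end-to-end sketch under illustrative assumptions (synthetic two-blob data, a made-up sigma, and scikit-learn's KMeans for the final step):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic data: two well-separated blobs of 10 samples each.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(10, 2)),
    rng.normal(loc=3.0, scale=0.3, size=(10, 2)),
])
k, sigma = 2, 1.0

# Step 1: Gaussian-kernel adjacency matrix E and Laplacian L = D - E.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
E = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(E, 0.0)
L = np.diag(E.sum(axis=1)) - E

# Step 2: spectral embedding from the k smallest eigenvectors of L.
_, eigvecs = np.linalg.eigh(L)
H = eigvecs[:, :k]

# Step 3: K-means on the rows of H; row i's label is sample i's cluster.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(H)
print(labels)
```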


Therefore, the clustering stage of spectral clustering requires only the similarity matrix between the data points, whereas K-means clusters directly on the original feature vectors of the samples; this is the key difference between the two.




