Understanding Spectral Clustering


In a previous post we introduced the K-means clustering method. K-means is simple and easy to understand; the two main design questions are how to define the distance measure (usually Euclidean distance) and how to choose the value of K. This time we introduce spectral clustering, which can be seen as an upgraded version of K-means. We will cover it from several angles: the drawbacks of K-means clustering, the basic idea of spectral clustering, and the steps of the spectral clustering algorithm.


So what's the problem with K-means? Why do we need to improve it?

    • K-means becomes difficult to compute as the dimensionality of the samples grows, because K-means works directly on the raw feature vectors in Euclidean space;
    • K-means only finds a locally optimal clustering: the sum of squared errors (SSE) it converges to is not necessarily the global minimum.


1. Basic idea of spectral clustering


Spectral clustering is a clustering method based on graph theory. Each sample is treated as a vertex, and the similarity between two samples is treated as the weight of the edge connecting them. Partitioning the sample set into K clusters is therefore equivalent to a graph-partitioning problem.

Picture a graph in which each vertex is a sample (in practice, each sample is a feature vector); say there are 7 samples in total. The weight of each edge is the similarity between the two samples it connects. We then want to split the graph so that the total weight of the edges running between different groups is as low as possible (low similarity between clusters), while the total weight of the edges inside each group is as high as possible (high similarity within clusters).


For example, if the graph consists of two tightly connected groups joined only by a few low-weight edges in the middle, removing those middle edges yields two clusters with low similarity between them and high similarity within each of them.
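To make this concrete, here is a minimal sketch on a made-up 6-vertex weighted adjacency matrix (all weights invented for illustration; two groups of three with high within-group and low between-group similarity). It measures the total weight a candidate split would remove:

```python
import numpy as np

# Hypothetical symmetric adjacency matrix for 6 samples: two groups of 3.
# Within-group similarities are high (~0.9), between-group ones low (~0.1).
W = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.1, 0.8, 0.9, 0.0],
])

def cut_weight(W, group_a, group_b):
    """Total weight of the edges running between the two groups."""
    return W[np.ix_(group_a, group_b)].sum()

# Splitting between {0,1,2} and {3,4,5} removes very little edge weight,
# so it is a good partition of this graph.
print(cut_weight(W, [0, 1, 2], [3, 4, 5]))  # 0.4
print(cut_weight(W, [0, 1, 3], [2, 4, 5]))  # 3.6 (a much worse split)
```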



2. Algorithm steps


The spectral clustering algorithm consists of the following main steps:


1) Compute the adjacency matrix E of the graph and the Laplacian matrix L.


Given the original features of the samples, we need to compute the pairwise similarity between every two samples in order to construct the adjacency matrix E. A Gaussian kernel function is commonly used to compute the similarity:

$$E_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$

where $x_i$ and $x_j$ are the original feature vectors of the two samples and $\sigma$ is a bandwidth parameter. Note that the diagonal elements of the adjacency matrix are set to zero. This gives us the adjacency matrix E.

The Laplacian matrix is L = D - E, where D is the degree matrix of the graph. Degree is a concept from graph theory: the degree of vertex i is the sum of row i of E (equivalently, column i, since E is symmetric), so D is the diagonal matrix with $D_{ii} = \sum_j E_{ij}$. The analysis that follows is based on the matrix L.
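A minimal sketch of this step (the dataset X and the bandwidth sigma below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 4))  # 7 samples with 4-dimensional features (made up)
sigma = 1.0                  # Gaussian kernel bandwidth (a tuning parameter)

# Pairwise squared Euclidean distances between all samples.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

# Adjacency matrix E via the Gaussian kernel; zero out the diagonal.
E = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(E, 0.0)

# Degree matrix D (row sums on the diagonal) and Laplacian L = D - E.
D = np.diag(E.sum(axis=1))
L = D - E
```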


2) The partitioning criterion


The choice of partitioning criterion is at the core of any clustering technique. Since spectral clustering treats clustering as a graph-theory problem, its partitioning criterion is defined in terms of the weights of the edges.


Guideline 1: minimize the sum of the weights of the edges that are removed by the split.

$$\mathrm{cut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} W(A_i, \bar{A_i})$$

where $W(A_i, \bar{A_i})$ denotes the total weight of the edges between cluster $A_i$ and the rest of the graph. Intuitively, this criterion minimizes the similarity between clusters after segmentation. One problem, however, is that it does not take the size of the clusters into account, so it tends to split off a single sample (or a very small group) as its own cluster. To avoid this problem, the following guideline was introduced.
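To see this failure mode concretely, here is a minimal sketch on a made-up 5-vertex graph (all weights invented for illustration). Plain min-cut prefers slicing off the single weakly attached vertex, while the size-penalized RatioCut criterion introduced next prefers the balanced split:

```python
import numpy as np

# Made-up 5-vertex graph: {0,1} and {2,3,4} are the "natural" clusters;
# vertex 4 is attached to the rest by a single edge of weight 0.6.
W = np.array([
    [0.0, 1.0, 0.2, 0.2, 0.0],
    [1.0, 0.0, 0.2, 0.2, 0.0],
    [0.2, 0.2, 0.0, 1.0, 0.0],
    [0.2, 0.2, 1.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.6, 0.0],
])
n = len(W)

def cut(A):
    """Weight of the edges between vertex set A and its complement."""
    B = [i for i in range(n) if i not in A]
    return W[np.ix_(A, B)].sum()

def ratio_cut(A):
    """2-way RatioCut of {A, complement}, matching guideline 2's formula."""
    return 0.5 * (cut(A) / len(A) + cut(A) / (n - len(A)))

# Plain min-cut prefers slicing off the single vertex 4 ...
print(cut([4]), cut([0, 1]))              # 0.6 < 0.8
# ... while RatioCut's size penalty favors the balanced split.
print(ratio_cut([4]), ratio_cut([0, 1]))  # 0.375 > 0.333
```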


Guideline 2: take the size of each cluster into account.

$$\mathrm{RatioCut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A_i})}{|A_i|}$$

Guideline 2 improves on guideline 1 in much the same way that the gain ratio in C4.5 improves on the information gain of ID3. In RatioCut, if a cluster contains few vertices, the corresponding term becomes large; in a minimization problem this acts as a penalty, discouraging clusters that are too small. Now we only need to minimize RatioCut and the segmentation is complete. Unfortunately, this is an NP-hard problem. To solve it in polynomial time, a transformation is needed. The conversion uses the properties of the Laplacian matrix L mentioned above: if the partition is encoded as an indicator matrix H (with $H_{ij} = 1/\sqrt{|A_j|}$ when vertex i belongs to cluster $A_j$, and 0 otherwise), RatioCut can be rewritten as $\operatorname{Tr}(H^{\top} L H)$, and relaxing H to take arbitrary real values finally yields the following problem:

$$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}(H^{\top} L H) \quad \text{subject to} \quad H^{\top} H = I$$

This transformation is similar to that of the objective function in linear discriminant analysis (LDA). The relaxed problem can be solved by eigendecomposition: we decompose the Laplacian matrix L, take the eigenvectors corresponding to the k smallest eigenvalues, and stack these k eigenvectors column-wise into a new sample feature matrix of dimension N × k.
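A minimal sketch of the eigendecomposition step (the Laplacian below is built from a random stand-in adjacency matrix just so the snippet runs on its own; in practice, use the E and L from step 1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in Laplacian: a random symmetric adjacency matrix E, then L = D - E.
E = rng.random((7, 7))
E = (E + E.T) / 2
np.fill_diagonal(E, 0.0)
L = np.diag(E.sum(axis=1)) - E

k = 2  # desired number of clusters

# L is symmetric, so eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(L)

# The k eigenvectors with the smallest eigenvalues form the new
# N x k feature matrix (the spectral embedding of the samples).
H = eigvecs[:, :k]
print(H.shape)  # (7, 2)
```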


Next, each row of this N × k matrix is treated as a point in k-dimensional space: every sample is now represented by only k dimensions, and we run K-means clustering on these rows. Whichever cluster a row is assigned to, the original sample belongs to that cluster.
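Putting the steps together, here is an end-to-end sketch under illustrative assumptions (synthetic two-blob data, a made-up sigma, and scikit-learn's KMeans for the final step):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic data: two well-separated blobs of 10 samples each.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(10, 2)),
    rng.normal(loc=3.0, scale=0.3, size=(10, 2)),
])
k, sigma = 2, 1.0

# Step 1: Gaussian-kernel adjacency matrix E and Laplacian L = D - E.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
E = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(E, 0.0)
L = np.diag(E.sum(axis=1)) - E

# Step 2: spectral embedding from the k smallest eigenvectors of L.
_, eigvecs = np.linalg.eigh(L)
H = eigvecs[:, :k]

# Step 3: K-means on the rows of H; row i's label is sample i's cluster.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(H)
print(labels)
```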


Therefore, the clustering stage of spectral clustering requires only the similarity matrix between the data points, whereas K-means clusters directly on the original feature vectors of the samples; this is the key difference between the two.




