Random Talk Clustering (4): Spectral clustering

If K-means and GMM are the classical popular clustering algorithms, then spectral clustering, the topic of this post, can be considered the modern popular one; its usual Chinese name is 谱聚类. Because of subtle differences in the matrix used, spectral clustering is really a "class" of algorithms rather than a single one.

Spectral clustering has many advantages over traditional clustering methods (such as K-means):

    • Like K-medoids, spectral clustering only requires a similarity matrix between the data points, rather than requiring the data to be vectors in an N-dimensional Euclidean space as K-means does.
    • Because it grasps the main structure and ignores the minor details, it is more robust than traditional clustering algorithms: it is not so sensitive to irregular, noisy data, and its performance is better. Many experiments have confirmed this; in fact, in comparisons of various modern clustering algorithms, K-means usually only serves as the baseline.
    • Its computational complexity is smaller than that of K-means, especially when running on very high-dimensional data such as text data or ordinary image data.

Something that suddenly claims to require less than K-means, produce better results than K-means, and run faster than K-means really does make one suspect it is snake oil. Well, mule or horse, let us take it out for a walk and see. However, the K-medoids article already ran the K-medoids algorithm, and the final result was just a single accuracy number that cannot be drawn into a chart or anything, which looks rather boring; and K-means is too slow to run. So I will be a little lazy here and simply quote the results from a paper.

The results come from the paper Document Clustering Using Locality Preserving Indexing. That paper is actually about yet another clustering method (which I may talk about another time, if there is a chance); in its experiments, however, it also reports results for K-means and spectral clustering on two datasets, excerpted as follows:

            TDT2                    Reuters-21578
  k     K-means      SC         K-means      SC
  2      0.989     0.998         0.871     0.923
  3      0.974     0.996         0.775     0.816
  4      0.959     0.996         0.732     0.793
 ...       ...       ...           ...       ...
  9      0.852     0.984         0.553     0.625
 10      0.835     0.979         0.545     0.615

Here TDT2 and Reuters-21578 are two widely used standard text datasets. Although results obtained on different datasets cannot be compared directly, on the same dataset the comparison between K-means and SC (spectral clustering) is clear at a glance. In the experiment, documents from k categories (k ranging from 2 to 10) were extracted, and the clustering accuracies are listed in the table above (I was too lazy to list them all). As you can see, spectral clustering beats K-means across the board here.

Such a powerful algorithm, wearing an enigmatic name like "spectral clustering": if its model is not immensely complicated and all-encompassing, then surely it must be some mountain sect's treasured, never-to-be-divulged secret manual? Not really. Spectral clustering is not complicated, either in its model or in its implementation; it only requires the ability to compute eigenvalues and eigenvectors of a matrix, which is a very basic operation that any library claiming to support linear algebra should provide. As for "secret manuals" on spectral clustering, they are all over the street; pick up any copy from a stall and open it, and you can see the whole picture of the spectral clustering algorithm:

    1. Based on the data, construct a Graph: each node of the Graph corresponds to a data point, similar points are connected by edges, and the weight of an edge represents the similarity between the two data points. Write this Graph in the form of an adjacency matrix, denoted $W$. One of the laziest ways of doing this is to directly use the similarity matrix we built earlier in the K-medoids article (one possible construction is sketched right after this list).
    2. Add up the elements of each column of $W$ to get $N$ numbers and put them on the diagonal (all other entries are 0), forming an $N \times N$ matrix denoted $D$, and let $L = D - W$.
    3. Compute the first $k$ eigenvalues of $L$ (in this article, unless otherwise specified, "first $k$" means ordering the eigenvalues from small to large) and the corresponding eigenvectors.
    4. Arrange these $k$ eigen (column) vectors side by side to form an $N \times k$ matrix, regard each of its rows as a vector in $k$-dimensional space, and cluster them with the K-means algorithm. The category each row is assigned to in the clustering result is the category of the corresponding node in the original Graph, i.e., of the original data point.
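To make step 1 concrete, here is a minimal MATLAB sketch of one common way to build such a $W$ from a data matrix: a Gaussian kernel on pairwise distances, with weak similarities thresholded to zero to keep the Graph local. The helper name build_graph and the parameters sigma and thresh are my own illustration, not part of the original recipe.

function W = build_graph(X, sigma, thresh)
    % X is an N x d data matrix, one point per row (illustrative helper).
    sq = sum(X.^2, 2);
    dist2 = bsxfun(@plus, sq, sq') - 2 * (X * X');  % squared pairwise distances
    dist2 = max(dist2, 0);                          % guard against tiny negatives
    W = exp(-dist2 / (2 * sigma^2));                % Gaussian (RBF) similarities
    W(W < thresh) = 0;                              % keep only "local" connections
    W = W - diag(diag(W));                          % no self-loops
end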

In just a few such steps, the data undergo some strange transformations, and then K-means is quietly called behind the scenes. At this point you could already take it out to the streets to show off. However, if you still feel it is not quite trustworthy, you may as well keep reading; we will now talk about the rationale behind those "strange transformations" in spectral clustering.

In fact, if you are familiar with Dimensionality Reduction, you have probably already seen it: spectral clustering is nothing but the process of reducing dimensions with Laplacian Eigenmaps and then doing K-means, which sounds far more mundane. But why reduce dimensions in exactly this way? In fact, the whole model can also be derived from another angle, so let us wander off topic for a moment first.

In image processing (I seem to recall having mentioned my loathing for this field before?) one of the problems is to segment an image (region segmentation), that is, to group similar pixels into regions; for example, we generally want the person (foreground) and the background of a photograph to be divided into different regions. In the image-processing field there are many automatic or semi-automatic algorithms for this problem, and many of the methods are closely related to clustering. For example, when we talked about Vector Quantization we used K-means to cluster together pixels of similar colour, but that is not really segmentation, because if we consider only colour similarity, pixels that are far apart in the picture may also be clustered into the same class, and we would not normally call such "free-floating" pixels a "region". That problem is, however, easy to fix: simply add location information to the clustering features (for example, instead of using just the three values R, G, B to represent a pixel, add two new values x, y).

On the other hand, one frequently studied problem is Graph Cut: simply put, cutting some edges of a Graph so that it breaks into several independent sub-graphs; the sum of the weights of the severed edges is called the Cut value. If we use all the pixels of a picture to form a Graph, connect nodes that are similar to each other (for example, in colour and position), and let the edge weights express similarity, then dividing the image into several regions is actually equivalent to dividing the Graph into several sub-graphs, and we can require that the Cut value be minimal, that is, that the total weight of the severed edges be as small as possible. Intuitively, edges with large weights are then not severed, which means that the more similar points stay in the same sub-graph, while points with little relation to each other are separated. We can regard such a segmentation as a good one.

In fact, leaving aside the problem of image segmentation, Minimum Cut itself is a widely studied problem among Graph Cut related problems, and mature algorithms exist for solving it. It is just that the plain minimum cut is usually not particularly useful here; many times it simply splits off the single pixel that is most weakly connected to the others. Instead, we usually prefer the split regions to be of relatively even size, rather than some huge chunks plus a few almost isolated points. To this end, many alternative objectives have been proposed, such as RatioCut, Normalized Cut, and so on.

However, before we continue, let us first fix the notation, because it is difficult to be precise in words alone. The Graph is represented by its adjacency matrix, denoted $W$, whose entry $w_{ij}$ is the weight of the edge between node $i$ and node $j$; if two nodes are not connected, the weight is zero. Let $A$ and $B$ be two disjoint subsets of the Graph's node set $V$; then the cut between the two can be formally defined as:

$$\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$$

Consider the simplest case first: if a Graph is divided into two parts, then Minimum Cut means minimizing $\mathrm{cut}(A, \bar{A})$ (where $\bar{A}$ denotes the complement of $A$). However, since this often results in isolated nodes being split off, we have RatioCut:

$$\mathrm{RatioCut}(A, \bar{A}) = \frac{\mathrm{cut}(A, \bar{A})}{|A|} + \frac{\mathrm{cut}(A, \bar{A})}{|\bar{A}|}$$

and NormalizedCut:

$$\mathrm{NCut}(A, \bar{A}) = \frac{\mathrm{cut}(A, \bar{A})}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A, \bar{A})}{\mathrm{vol}(\bar{A})}$$

Here $|A|$ denotes the number of nodes in $A$, and $\mathrm{vol}(A) = \sum_{i \in A} d_i$ is the sum of the degrees (total edge weight) of the nodes in $A$. Both can be regarded as measures of "size"; by putting such terms in the denominator, we can effectively prevent isolated points from being split off and obtain partitions that are relatively even in size. In fact, Jianbo Shi's PAMI paper Normalized Cuts and Image Segmentation uses NormalizedCut for image segmentation.
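As a small illustration of these definitions (my own sketch, not from the original text), the cut value, RatioCut and NCut of a two-way partition can be computed directly from $W$; the function name cut_values and the use of a logical index vector are assumptions made for the example:

function [c, rcut, ncut] = cut_values(W, A)
    % W: symmetric adjacency matrix; A: logical vector marking one side.
    B = ~A;                               % the complement of A
    c = sum(sum(W(A, B)));                % cut(A, A-bar): weight of severed edges
    rcut = c / nnz(A) + c / nnz(B);       % RatioCut: normalize by cluster sizes
    d = sum(W, 2);                        % node degrees
    ncut = c / sum(d(A)) + c / sum(d(B)); % NCut: normalize by volumes
end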

The reason for bringing up RatioCut and NormalizedCut is that they are actually very closely related to the spectral clustering discussed here. Look at RatioCut: simple as it looks, minimizing it is an NP-hard problem, which is inconvenient to solve, so in order to find a solution, let us first transform it a little.

Let $V$ denote the set of all nodes of the Graph, and first define an $n$-dimensional vector $f$:

$$f_i = \begin{cases} \sqrt{|\bar{A}| / |A|} & \text{if } v_i \in A \\ -\sqrt{|A| / |\bar{A}|} & \text{if } v_i \in \bar{A} \end{cases}$$

Now recall the matrix $L = D - W$ that we defined at the very beginning. It actually has a name, the Graph Laplacian; however, as we will see later, there are actually several similar matrices that all go by this name:

Usually, every author just calls "his" matrix the graph Laplacian.

This is understandable; it is rather like how every vendor nowadays claims that its own technology is "cloud computing". This matrix $L$ has the following property: for any vector $f$,

$$f^T L f = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (f_i - f_j)^2$$

This holds for an arbitrary vector $f$ and is easy to prove: just expand according to the definition of $L = D - W$. Substituting the particular $f$ we just defined, we obtain $f^T L f = |V| \cdot \mathrm{RatioCut}(A, \bar{A})$.
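For completeness, the "expand the definition" step, which the post leaves to the reader, is the following short computation, using $d_i = \sum_j w_{ij}$:

$$
\begin{aligned}
f^T L f &= f^T D f - f^T W f = \sum_i d_i f_i^2 - \sum_{i,j} w_{ij} f_i f_j \\
        &= \tfrac{1}{2} \Big( \sum_i d_i f_i^2 - 2 \sum_{i,j} w_{ij} f_i f_j + \sum_j d_j f_j^2 \Big)
         = \tfrac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2 .
\end{aligned}
$$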

In addition, if we let $\mathbf{1}$ denote the vector whose elements are all 1, direct expansion easily gives $f^T \mathbf{1} = \sum_i f_i = 0$, i.e., $f \perp \mathbf{1}$, as well as $\|f\| = \sqrt{n}$. Since $|V|$ is a constant, minimizing RatioCut is equivalent to minimizing $f^T L f$; of course, remember to also keep the additional conditions $\|f\| = \sqrt{n}$ and $f \perp \mathbf{1}$.

Once the problem has been transformed into this form, it becomes manageable, because there is a thing called the Rayleigh quotient:

$$R(A, x) = \frac{x^T A x}{x^T x}$$

Its maximum and minimum values are equal to the largest and the smallest eigenvalue of the matrix $A$, respectively, and the extrema are attained when $x$ equals the corresponding eigenvector. Since $f^T f = n$ is a constant, minimizing $f^T L f$ is actually equivalent to minimizing $R(L, f)$. However, because the smallest eigenvalue of $L$ is zero and its corresponding eigenvector is exactly $\mathbf{1}$ (we only consider the case where the Graph is connected here), it does not satisfy the condition $f \perp \mathbf{1}$, so we take the second-smallest eigenvalue and its corresponding eigenvector $v$ instead.

At this point, we seem to have effortlessly solved the NP-hard problem posed above. In fact, we played a trick: the original problem is NP-hard because the elements of the vector $f$ may only take one of two particular values, making it a discrete problem, whereas the elements of the eigenvector $v$ we compute can be arbitrary real numbers; that is, we relaxed the constraints of the original problem. So how do we recover a solution of the original problem? The simplest way is to look at whether each element of $v$ is greater than or less than zero and map it to one of the two discrete values accordingly, but we can also take a slightly more sophisticated approach and use K-means to cluster the elements into two classes.
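Here is a minimal MATLAB sketch of this two-class procedure (my own illustration, not the author's code; the function name two_way_cut and the use of a full eig, which is only sensible for small graphs, are assumptions):

function labels = two_way_cut(W)
    % Two-way partition from the relaxed RatioCut solution (illustrative sketch).
    D = diag(sum(W));
    L = D - W;
    [V, E] = eig(L);                 % full eigendecomposition; fine for small graphs
    [dummy, order] = sort(diag(E));  % sort eigenvalues from small to large
    v2 = V(:, order(2));             % eigenvector of the second-smallest eigenvalue
    labels = v2 > 0;                 % simplest rounding: split by sign
    % alternatively: labels = (kmeans(v2, 2) == 1);
end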

By now, the shadow of spectral clustering has already appeared: compute eigenvalues, then run K-means on the eigenvectors. In fact, extending from the two-class problem to the k-class problem (I will not write out the mathematical derivation in detail), we arrive at exactly the same steps as the spectral clustering given earlier: compute the eigenvalues and take the first k smallest ones, arrange the corresponding eigenvectors side by side, and then run K-means on the rows. It matches exactly!

In a similar way, NormalizedCut can also be shown to be equivalent to spectral clustering, but this time I will not go into that much detail. If you are interested (including in other forms of the Graph Laplacian, and in the relationship between spectral clustering and random walks), you can read this tutorial: A Tutorial on Spectral Clustering.

To lighten the mood, I have decided to post a simple MATLAB implementation of spectral clustering:

function idx = spectral_clustering(W, k)
    D = diag(sum(W));                            % degree matrix
    L = D - W;                                   % graph Laplacian
    opt = struct('issym', true, 'isreal', true);
    [V, dummy] = eigs(L, D, k, 'SM', opt);       % eigenvectors of the k smallest eigenvalues
    idx = kmeans(V, k);                          % cluster the rows of V
end
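A toy call might look like the following (purely illustrative: the data, the build_graph helper sketched earlier, and the parameter values are my own assumptions; also note that L is singular by construction, so eigs with 'SM' may complain in practice, which is part of why the update below calls this function executable pseudo-code):

X = [randn(50, 2); randn(50, 2) + 5];   % two well-separated 2-D blobs
W = build_graph(X, 2, 1e-3);            % Gaussian-kernel graph from the earlier sketch
idx = spectral_clustering(W, 2);        % ideally recovers the two blobs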

Finally, let us come back and look at the advantages of spectral clustering listed at the beginning of this article:

    • It requires only the similarity matrix of the data. This is obvious, because all the information spectral clustering needs is contained in $W$. However, $W$ is generally not exactly equal to the original similarity matrix; recall that $W$ is the adjacency matrix representation of the Graph we constructed, and when constructing the Graph we usually strengthen "local" connectivity to make clustering easier, i.e., we mainly care about connecting similar points to each other. For example, we can set a threshold and, if the similarity of two points is below it, regard them as not connected at all. Another way to construct the Graph is to connect each node to the n points most similar to it.
    • It grasps the main structure and ignores the minor details, so its performance is better than that of traditional K-means. In fact, spectral clustering uses the elements of the eigenvectors as a new representation of the original data and runs K-means on this "better representation". This "better representation" is exactly the result of dimensionality reduction with Laplacian Eigenmaps; if there is a chance to talk about Dimensionality Reduction next time, I will go into detail. And the goal of dimensionality reduction is precisely to "grasp the main structure and ignore the minor details".
    • Its computational complexity is smaller than that of K-means, especially on high-dimensional data. For example, text data is usually stored as a very high-dimensional (e.g., thousands or tens of thousands of dimensions) sparse matrix; there are very efficient methods for computing the eigenvalues and eigenvectors of a sparse matrix, and the result is a set of k-dimensional vectors (usually k is not very large), so the subsequent K-means on this low-dimensional data is computationally very cheap. Running K-means directly on the raw data is a different story: although the initial data matrix is sparse, K-means has a centroid-computation step, which takes the mean of many vectors, and the mean of many sparse vectors is not necessarily sparse; in fact, on text data the centroid vectors are often very dense. Computing distances between vectors then becomes very expensive, which directly makes plain K-means on the raw data enormously slow, whereas spectral clustering, all its other steps included, finishes faster.

Having said all this, it may seem a bit chaotic, but I will stop here. Finally, a remark in passing: the name "spectral clustering" comes from spectral theory, that is, the theory of analysing problems via eigendecomposition.

Update 2011.11.23: Many readers have asked me about the code, so let me address the two main questions here:

    1. On the relationship between the dimensionality of the LE (Laplacian Eigenmap) embedding and the number of clusters used in K-means: the code above takes the two to be the same, but they are not required to be equal. The earliest spectral clustering analysed the two-class case: reduce to 1 dimension, then split into two classes by thresholding. For the k-class case, the natural analogue is to reduce to k-1 dimensions, which is also consistent with LDA. Since the eigenvectors of the Laplacian matrix include an all-ones vector (corresponding to eigenvalue 0), one can compute the first k eigenvectors and then remove the one corresponding to eigenvalue 0. In practice, however, the two need not be kept consistent; that is, the embedding dimension can be treated as a parameter and tuned to whatever gives good results (a sketch of this variant follows after this list).
    2. About the sample code: note that unless I explicitly say that I am releasing a piece of software or a package, the code here is mostly for illustration. In order to show only the main parts of the algorithm, it often omits implementation details and should be regarded as "executable pseudo-code"; it is not recommended to run experiments and the like directly with the code posted here (the same goes for the code in other related posts). Unless you want to experiment with specific implementations and improvements yourself, you can directly use one of the ready-made, dedicated packages available online; for spectral clustering, for example, you can consider this package.
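The variant hinted at in point 1 might look like the following sketch (my own illustration, not the author's code; the function name spectral_clustering_m and the parameter m are assumptions): embed into m dimensions after discarding the eigenvalue-0 direction, then cluster into k classes.

function idx = spectral_clustering_m(W, k, m)
    % Embedding dimension m and cluster count k decoupled (illustrative sketch).
    D = diag(sum(W));
    L = D - W;
    opt = struct('issym', true, 'isreal', true);
    [V, E] = eigs(L, D, m + 1, 'SM', opt);  % m+1 smallest eigenpairs
    [dummy, order] = sort(diag(E));         % ascending eigenvalues
    V = V(:, order(2:end));                 % drop the eigenvalue-0 direction, keep m
    idx = kmeans(V, k);
end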
