From the Laplacian Matrix to Spectral Clustering



0 Introduction

On the morning of November 1, Zou Bo covered clustering (with slides) in the 7th session of the machine learning class, and the part on spectral clustering aroused my interest. He started from basic concepts: unit vectors, orthogonality of two vectors, and the eigenvalues and eigenvectors of a square matrix; then described similarity graphs and the Laplacian matrix; and finally discussed the objective function of spectral clustering and its algorithm flow.

After class, I decided to write a blog post, going from the Laplacian matrix to spectral clustering, to record what I learned. If anything is lacking or wrong, please feel free to point it out. Thanks.


1. Matrix Basics

Before talking about spectral clustering, it is necessary to understand basic matrix knowledge.

1.0 Twelve mathematical notes for understanding matrices

If your concept of the matrix is still vague, I recommend the "Understanding the Matrix" series written by Meng Yan, a Chinese author, which throws out a lot of interesting ideas. I took some notes during an earlier reading, as shown below:

"1. In short: matrices are the descriptions of transformations in linear spaces, while similarity matrices are different descriptions of the same linear transformation. So what is space? In essence, "space is a collection of objects that hold motion, while transformation defines the motion of the corresponding space" by Meng Yan. After a base is selected in a linear space, the vector depicts the motion of an object, and the motion is multiplied by a matrix and a vector. However, what is the foundation? Coordinate system.

2. With bases, the statement in (1) should be: a matrix is the description of a transformation in a linear space, and similar matrices are different descriptions of the same linear transformation under different bases (coordinate systems). Two questions arise here: What exactly is a transformation? And what is a basis (coordinate system)? The so-called transformation refers to moving from one point (element/object) of a space to another point (element/object); a matrix is what describes a linear transformation. And the basis? As we already know, a matrix is nothing more than a thing used to describe a linear transformation in a linear space. A linear transformation is like a noun, and a matrix is like an adjective describing it: just as a good-looking person can be described with several different adjectives such as 'handsome' or 'pretty', the same linear transformation can be described by several different matrices, and which matrix describes it is determined by the basis (coordinate system).

3. A basis, like a coordinate system or a camera angle, is only a point of view, and a point of view does not change the thing itself. For a given linear transformation, you can select one basis and get one matrix that describes it, or select another basis and get a different matrix that describes the same transformation. The matrix is only a description of the linear transformation, not the linear transformation itself; the analogy is taking photos of the same person from different angles.

4. Previously we used matrices to describe linear transformations. However, a matrix can be used not only to describe a linear transformation but also to describe a basis (coordinate system/angle). The former is easy to understand: a transformation matrix moves a point in a linear space to another point. But what does it mean to say that a matrix describes a basis (transforms one coordinate system into another)? In fact, transforming a point and transforming the coordinate system are the same thing!
(@Kanerjing collar: a matrix can also describe differential and integral transforms. The key is what the basis represents. With a coordinate basis it is a coordinate transformation; if the basis is a wavelet or Fourier basis, the matrix can describe a wavelet transform or a Fourier transform.)

5. A matrix is the description of a linear motion (transformation), and multiplying a matrix by a vector is the process of carrying that motion (transformation) out. The same transformation is represented by different matrices in different coordinate systems, but its essence (for example, its eigenvalues) is the same. Motion is relative: transforming the object is equivalent to transforming the coordinate system. For example, to change the point (1, 1) into (2, 3), one can either move the point itself, or change the unit length on the x axis to 1/2 of the original and the unit length on the y axis to 1/3 of the original; both achieve the goal.

6. Ma = b. Viewed as point motion, the vector a is transformed, as described by the matrix M, into the vector b. Viewed as a change of coordinate system, there is a single fixed vector: it measures as a in the M coordinate system and as b in the I coordinate system (I is the identity matrix, with 1s on the main diagonal and 0s elsewhere). In essence, moving the point is equivalent to transforming the coordinate system. Why? As described in (5), the same transformation is represented by different matrices in different coordinate systems, but is essentially the same.

7. In (6), the I of Ib is called the unit coordinate system; it is actually the Cartesian coordinate system we usually speak of. So Ma = Ib says: the vector that measures as a in the M coordinate system and the vector that measures as b in the I coordinate system are essentially the same vector. Matrix multiplication is therefore a kind of identity recognition. And what is a vector? Place the thing to be measured in a coordinate system, and arrange the measurement results (the projections of the vector onto each coordinate axis) in order; that is a vector.

8. b is Ib in the I coordinate system, and a is Ma in the M coordinate system. Thus in the matrix product MN, N is measured in the M coordinate system, while M itself is measured in the I coordinate system. So Ma = Ib says that a, measured in the M coordinate system, becomes b when carried over to the I coordinate system. For example, a vector that measures as (x, y) in a coordinate system whose x-axis unit length is 2 and whose y-axis unit length is 3 measures as (2x, 3y) in the Cartesian coordinate system, where every unit length is 1.

9. What is an inverse matrix? Again consider Ma = Ib. Earlier we saw that the coordinate-system change M -> I corresponds to the change of measurement a -> b. But how does M become I? By multiplying M by M's inverse matrix, since $M^{-1}M = I$. For example, for the coordinate system $M = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}$, multiplying by $M^{-1} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/3 \end{pmatrix}$ changes the length of the x-axis measurement unit to 1/2 of the original and the length of the y-axis measurement unit to 1/3 of the original, turning M into the Cartesian coordinate system I. That is, applying a transformation to a coordinate system amounts to multiplying by the transformation matrix."
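To make notes 5-9 concrete, here is a minimal numpy sketch (my own illustration, not part of the quoted notes). It treats M = diag(2, 3) as a coordinate system: the vector that measures as a = (1, 1) in the M system measures as b = (2, 3) in the Cartesian system I, and multiplying by the inverse matrix undoes this.

```python
import numpy as np

# M as a coordinate system: x-axis unit length 2, y-axis unit length 3
M = np.diag([2.0, 3.0])

a = np.array([1.0, 1.0])   # the vector measured in the M coordinate system
b = M @ a                  # the same vector measured in the I coordinate system
print(b)                   # [2. 3.]  -- the point (1, 1) "becomes" (2, 3)

# The inverse matrix turns the M coordinate system back into the Cartesian one
M_inv = np.linalg.inv(M)   # diag(1/2, 1/3)
print(M_inv @ M)           # the identity matrix I
print(M_inv @ b)           # [1. 1.]  -- back to a
```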
1.1 A bunch of basic concepts

According to Wikipedia, the n-th order identity matrix $I_n$ is a square matrix whose main-diagonal elements are all 1 and whose remaining elements are all 0. It is written $I_n$; when the order can be ignored or is determined by context, it can be abbreviated to I (or E). For example, here are some identity matrices:

$$I_1 = \begin{pmatrix} 1 \end{pmatrix}, \quad I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

The columns of the identity matrix are unit vectors; each is also an eigenvector of the identity matrix, with eigenvalue 1. So 1 is its unique eigenvalue, with multiplicity n. It follows that the determinant of the identity matrix is 1 and its trace is n.

And what is a unit vector? In mathematics, a unit vector in a normed vector space is a vector of length 1. In Euclidean space, the dot product of two unit vectors is the cosine of the angle between them (because their lengths are both 1). The normalized vector (unit vector) of a non-zero vector u is the unit vector parallel to u, written:

$$\hat{u} = \frac{u}{\|u\|}$$

where $\|u\|$ is the norm (length) of u.

And what is the dot product? The dot product is also called the inner product. The dot product of two vectors $a = [a_1, a_2, \ldots, a_n]$ and $b = [b_1, b_2, \ldots, b_n]$ is defined as:

$$a \cdot b = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$

Here, Σ is the summation symbol. For example, the dot product of the two 3-dimensional vectors [1, 3, -5] and [4, -2, -1] is:

$$[1, 3, -5] \cdot [4, -2, -1] = 1 \times 4 + 3 \times (-2) + (-5) \times (-1) = 4 - 6 + 5 = 3$$

Using matrix multiplication, and treating (column) vectors as n × 1 matrices, the dot product can also be written as:

$$a \cdot b = a^{\mathrm{T}} b$$

where $a^{\mathrm{T}}$ denotes the transpose of a. With the preceding example, multiplying a 1 × 3 matrix (that is, a row vector) by a 3 × 1 matrix (a column vector) gives the result (the advantage of matrix multiplication is that a 1 × 1 matrix is just a scalar):

$$\begin{pmatrix} 1 & 3 & -5 \end{pmatrix} \begin{pmatrix} 4 \\ -2 \\ -1 \end{pmatrix} = 3$$

In addition to the algebraic definition above, the dot product has another, geometric definition. In Euclidean space, the dot product can be intuitively defined as:

$$a \cdot b = |a| \, |b| \cos\theta$$

where |·| denotes the modulus (length) of a vector and θ is the angle between the two vectors. According to this definition, the dot product of two perpendicular vectors is always zero. If a and b are both unit vectors (length 1), their dot product is the cosine of their angle.

And what is orthogonality? It is the generalization of perpendicularity. If the inner product (dot product) of two vectors in an inner product space is 0, they are called orthogonal, which is the analogue of the two vectors being perpendicular. In other words, whenever the angle between vectors can be defined, orthogonality can be intuitively understood as perpendicularity. An orthogonal matrix is a square matrix (a matrix with the same number of rows and columns) with real entries whose rows and columns are orthogonal unit vectors, i.e. $Q^{\mathrm{T}} Q = Q Q^{\mathrm{T}} = I$.
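A minimal numpy check of these definitions (my own sketch, not part of the original text): the algebraic and geometric dot products agree, normalization yields a unit vector, and a rotation matrix is orthogonal.

```python
import numpy as np

a = np.array([1.0, 3.0, -5.0])
b = np.array([4.0, -2.0, -1.0])

# Algebraic definition: sum of elementwise products
print(np.dot(a, b))                                       # 3.0

# Normalizing a gives a parallel unit vector
a_hat = a / np.linalg.norm(a)
print(np.linalg.norm(a_hat))                              # 1.0

# Geometric definition agrees: |a||b|cos(theta)
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.linalg.norm(a) * np.linalg.norm(b) * cos_theta)  # 3.0

# A rotation matrix is orthogonal: Q^T Q = I
theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q.T @ Q, np.eye(2)))                    # True
```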

If a number λ and a non-zero vector v satisfy

$$A v = \lambda v$$

then v is an eigenvector of A and λ is its corresponding eigenvalue. In other words, what A does in the direction of v is simply to stretch or shrink it a little (rather than perform some irregular multi-dimensional transformation), and λ indicates the proportion of the stretch in that direction. To put it simply, the vector is made longer or shorter, but its direction remains unchanged. The trace of a matrix is the sum of its diagonal elements, and it also equals the sum of its eigenvalues. For more matrix-related concepts, see Wikipedia or Matrix Analysis and Applications.
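As a quick illustration (again my own sketch), numpy can verify the definition Av = λv and the fact that the trace equals the sum of the eigenvalues:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])    # a symmetric matrix

# eigh returns the eigenvalues/eigenvectors of a symmetric matrix
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)                          # [1. 3.]

# Each column v satisfies A v = lambda v
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))  # True, True

# The trace equals the sum of the eigenvalues
print(np.trace(A), eigvals.sum())       # 4.0 4.0
```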

2. The Laplacian matrix

2.1 Definition of the Laplacian matrix

The Laplacian matrix, also known as the Kirchhoff matrix, is a matrix representation of a graph. Given a graph G with n vertices, its Laplacian matrix is defined as:

$$L = D - W$$

where D is the degree matrix of the graph and W is its adjacency matrix. For example, given a simple graph (figure omitted here):

First convert the graph into the form of an adjacency matrix W, in which $w_{ij}$ is the weight of the edge between vertices i and j (for an unweighted graph, 1 if they are connected and 0 otherwise).

Then add up the elements of each row (equivalently, each column, since W is symmetric) to get each vertex's degree, and place these sums on the diagonal (zeros everywhere else) to form a diagonal matrix, which is called the degree matrix D.

According to the definition above, the Laplacian matrix is then obtained as L = D - W.
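Since the original figure is not reproduced here, the following sketch (my own) uses the unweighted 6-vertex example graph from Wikipedia's article on the Laplacian matrix as a stand-in, and builds W, D, and L exactly as described above:

```python
import numpy as np

# Adjacency matrix W of a small undirected graph (Wikipedia's 6-vertex example):
# edges 1-2, 1-5, 2-3, 2-5, 3-4, 4-5, 4-6 (1-indexed)
W = np.array([
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
], dtype=float)

D = np.diag(W.sum(axis=1))   # degree matrix: row sums on the diagonal
L = D - W                    # Laplacian matrix

print(L)
print(L.sum(axis=1))         # every row of L sums to zero
```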

2.2 Properties of the Laplacian matrix

The Laplacian matrix L has the following properties:
  • L is a symmetric positive semi-definite matrix;
  • The smallest eigenvalue of L is 0, and the corresponding eigenvector is the constant one vector $\mathbb{1}$ (don't forget the earlier definition of eigenvalues and eigenvectors: if a number λ and a non-zero vector v satisfy $Lv = \lambda v$, then v is an eigenvector and λ is its corresponding eigenvalue);
  • L has n non-negative real eigenvalues $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$;
  • For any vector $f \in \mathbb{R}^n$, the following formula holds:

$$f^{\mathrm{T}} L f = \frac{1}{2}\sum_{i,j=1}^{n} w_{ij} (f_i - f_j)^2$$

where $w_{ij}$ is the weight of the edge between vertices i and j, and $d_i = \sum_j w_{ij}$ is the degree of vertex i (the quantities $d_i$ and W(A, B) will appear again later). The last property is proved as follows:

$$f^{\mathrm{T}} L f = f^{\mathrm{T}} D f - f^{\mathrm{T}} W f = \sum_{i=1}^{n} d_i f_i^2 - \sum_{i,j=1}^{n} f_i f_j w_{ij} = \frac{1}{2}\left(\sum_{i=1}^{n} d_i f_i^2 - 2\sum_{i,j=1}^{n} f_i f_j w_{ij} + \sum_{j=1}^{n} d_j f_j^2\right) = \frac{1}{2}\sum_{i,j=1}^{n} w_{ij} (f_i - f_j)^2$$
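A quick numeric check of these properties (my own sketch, on an arbitrary small weighted graph):

```python
import numpy as np

# A tiny weighted graph to check f^T L f = 1/2 * sum_ij w_ij (f_i - f_j)^2
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
D = np.diag(W.sum(axis=1))
L = D - W

# The smallest eigenvalue is 0, with the constant one vector as eigenvector
print(np.allclose(L @ np.ones(3), 0.0))   # True

rng = np.random.default_rng(0)
f = rng.standard_normal(3)

lhs = f @ L @ f
rhs = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)
print(np.isclose(lhs, rhs))               # True
```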

For more on the properties of the Laplacian matrix, see reference 6: "A Tutorial on Spectral Clustering".

3. Spectral clustering

The intuitive interpretation of spectral clustering is to divide samples into different groups according to their similarity. If we regard the samples as vertices and the similarities between samples as weighted edges, this converts the clustering problem into a graph-partitioning problem: find a way to partition the graph such that the weights of edges between different groups are as low as possible (meaning the similarity between groups is as low as possible) and the weights of edges within a group are as high as possible (meaning the similarity within a group is as high as possible). In short, spectral clustering seeks a partition of the graph into several groups with low weights between groups and high weights within groups, thereby achieving the goal of clustering.

3.1 Related definitions

To better convert the spectral clustering problem into a graph-theoretic problem, the following concepts are defined:
  • An undirected graph G = (V, E).
  • The degree $d_i = \sum_j w_{ij}$ of each vertex, the degrees forming the (diagonal) degree matrix D.
  • The sum of the weights of all edges between a group A and a group B, defined as follows:

    $$W(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$$

    where $w_{ij}$ is the weight of the edge from node i to node j; if two nodes are not connected, the weight is zero.
  • The similarity matrix. As we said before, we want to minimize the weight of the cut, because the smaller the weighted sum of the cut edges, the lower the similarity between the two sides and the farther apart they are. The weight matrix W is by default taken to be the similarity matrix, so minimizing the cut is equivalent to cutting the edges with small similarity, that is, cutting the graph where it is least similar. The similarity matrix is obtained by applying a transformation to the distances between samples; typically the Gaussian kernel function (also called the radial basis function kernel) is used to compute similarity, so that the larger the distance, the smaller the similarity (see the sketch after this list):

    $$s_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
  • The indicator vector $h_j$ of a subgraph $A_j$, defined as follows:

    $$h_j = (h_{1j}, \ldots, h_{nj})^{\mathrm{T}}, \qquad h_{ij} = \begin{cases} 1/\sqrt{|A_j|}, & v_i \in A_j \\ 0, & \text{otherwise} \end{cases}$$
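A minimal sketch of building such a Gaussian similarity matrix from raw sample vectors (my own illustration; the bandwidth sigma is a free parameter):

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Similarity matrix s_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Toy data: two well-separated clumps in the plane
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
S = gaussian_similarity(X, sigma=1.0)
print(S.round(3))   # near 1 within a clump, near 0 across clumps
```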

3.2 The objective function

How to cut the graph therefore becomes the key to the problem. In other words, how should we cut so as to obtain the optimal result? Take image segmentation as an example: all the pixels of an image form a graph, nodes are connected according to their similarity (in color, position, and so on), and the weight on an edge indicates the similarity. To divide the image into several regions (groups), we require the Cut value of the partition, i.e. the sum of the weights of the cut edges, to be the smallest, so that edges with relatively large weights are not cut off. Only in this way can similar points be kept in the same subgraph while points with few mutual connections are separated. To make the Cut value of the partition smallest, spectral clustering minimizes the following objective function:

$$\mathrm{Cut}(A_1, A_2, \ldots, A_k) = \frac{1}{2}\sum_{i=1}^{k} W(A_i, \bar{A}_i)$$

where k indicates that the graph is divided into k groups, $A_i$ is the i-th group, $\bar{A}_i$ is the complement of $A_i$, and W(A, B) is the sum of the weights of all edges between groups A and B (in other words, if the graph is to be divided into k groups, the cost is the sum of the weights of the edges removed by the partition).

To minimize the sum of the cut edge weights, we minimize the above objective function. But in many cases, minimizing Cut directly leads to poor segmentations: this formula typically splits the graph into a single vertex on one side and the remaining n-1 vertices on the other. As the example figure shows (omitted here), the minimum cut is obviously not the best cut; putting {A, B, C, H} on one side and {D, E, F, G} on the other is likely to be the best cut.

In order for each class to have a reasonable size, the objective function should make each of the groups $A_1, A_2, \ldots, A_k$ large enough. The improved objective function is:

$$\mathrm{RatioCut}(A_1, \ldots, A_k) = \frac{1}{2}\sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{|A_i|}$$

where |A| indicates the number of vertices contained in group A.

Or:

$$\mathrm{Ncut}(A_1, \ldots, A_k) = \frac{1}{2}\sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{\mathrm{vol}(A_i)}$$

where $\mathrm{vol}(A) = \sum_{i \in A} d_i$ is the sum of the degrees of the vertices in A.
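To make these objective functions concrete, the following sketch (my own; the two-triangle graph is an arbitrary illustrative choice) evaluates Cut, RatioCut, and Ncut for two different 2-way partitions:

```python
import numpy as np

def W_between(W, A, B):
    """W(A, B): total weight of edges from nodes in A to nodes in B."""
    return W[np.ix_(A, B)].sum()

def cut_scores(W, groups):
    d = W.sum(axis=1)                    # vertex degrees
    n = W.shape[0]
    cut = ratiocut = ncut = 0.0
    for A in groups:
        comp = [i for i in range(n) if i not in A]
        w = W_between(W, A, comp)
        cut += 0.5 * w
        ratiocut += 0.5 * w / len(A)
        ncut += 0.5 * w / d[A].sum()     # vol(A) = sum of degrees in A
    return cut, ratiocut, ncut

# Two triangles {0,1,2} and {3,4,5} joined by the single bridge edge 2-3
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

print(cut_scores(W, [[0, 1, 2], [3, 4, 5]]))   # balanced cut: low scores
print(cut_scores(W, [[0], [1, 2, 3, 4, 5]]))   # lone vertex: much higher RatioCut
```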

Next, let us focus on the RatioCut function.

Consider the simplest case, dividing the graph into k = 2 groups A and $\bar{A}$. The objective function is:

$$\min_A \ \mathrm{RatioCut}(A, \bar{A})$$

Define the vector $f = (f_1, \ldots, f_n)^{\mathrm{T}}$ by:

$$f_i = \begin{cases} \sqrt{|\bar{A}|/|A|}, & v_i \in A \\ -\sqrt{|A|/|\bar{A}|}, & v_i \in \bar{A} \end{cases}$$

According to the property of the Laplacian matrix obtained earlier,

$$f^{\mathrm{T}} L f = \frac{1}{2}\sum_{i,j=1}^{n} w_{ij} (f_i - f_j)^2$$

Now, substituting the definition of f into this formula, we can draw a very interesting conclusion! The derivation is as follows:

$$f^{\mathrm{T}} L f = \frac{1}{2}\sum_{i \in A,\, j \in \bar{A}} w_{ij}\left(\sqrt{\frac{|\bar{A}|}{|A|}} + \sqrt{\frac{|A|}{|\bar{A}|}}\right)^{\!2} + \frac{1}{2}\sum_{i \in \bar{A},\, j \in A} w_{ij}\left(-\sqrt{\frac{|A|}{|\bar{A}|}} - \sqrt{\frac{|\bar{A}|}{|A|}}\right)^{\!2}$$

$$= W(A, \bar{A})\left(\frac{|\bar{A}|}{|A|} + \frac{|A|}{|\bar{A}|} + 2\right) = W(A, \bar{A})\left(\frac{|A| + |\bar{A}|}{|A|} + \frac{|A| + |\bar{A}|}{|\bar{A}|}\right) = 2\,|V| \cdot \mathrm{RatioCut}(A, \bar{A})$$

Yes: we have derived RatioCut from $f^{\mathrm{T}} L f$. In other words, the Laplacian matrix L and the objective function RatioCut that we want to optimize are closely connected. Further, since $2|V|$ is a constant, minimizing RatioCut is equivalent to minimizing $f^{\mathrm{T}} L f$.

At the same time, f satisfies:

$$\sum_{i=1}^{n} f_i = 0 \quad (\text{i.e. } f \perp \mathbb{1}), \qquad \|f\| = \sqrt{n}$$

So minimizing RatioCut is equivalent to minimizing $f^{\mathrm{T}} L f$ under these constraints, and we know from the properties of the Laplacian matrix obtained earlier that its smallest eigenvalue is 0 and the corresponding eigenvector is exactly the constant one vector $\mathbb{1}$ (which the constraint $f \perp \mathbb{1}$ rules out). Therefore our objective function can be written as:

$$\min_f \ f^{\mathrm{T}} L f \qquad \text{s.t.} \quad f \perp \mathbb{1}, \ \|f\| = \sqrt{n}$$

Here, by the Rayleigh-Ritz theorem, the solution of this relaxed problem is the eigenvector of L corresponding to the second-smallest eigenvalue.

So far, although solving the original discrete problem is very hard, RatioCut has been cleverly converted into an eigenvalue (eigenvector) problem of the Laplacian matrix: the discrete clustering problem is relaxed into a continuous problem over eigenvectors, and the eigenvectors with the smallest eigenvalues correspond to the best partitions of the graph. All that remains is to discretize the relaxed solution, that is, to divide the entries of the eigenvectors into groups to obtain the corresponding categories. You have to admit it is clever!
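For the k = 2 case just derived, here is a minimal sketch (my own) that recovers the partition from the sign pattern of the eigenvector belonging to the second-smallest eigenvalue, on a small graph of two triangles joined by one edge:

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the single bridge edge 2-3
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

L = np.diag(W.sum(axis=1)) - W

# eigh returns eigenvalues in ascending order; eigvals[0] is ~0
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]             # eigenvector of the 2nd-smallest eigenvalue

labels = (fiedler > 0).astype(int)  # sign pattern = 2-way partition
print(labels)                       # e.g. [0 0 0 1 1 1]: cuts the bridge edge
```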

3.3 Spectral clustering algorithm flow

In summary, the algorithm flow of spectral clustering is as follows:
  • Build the similarity matrix W of the sample data (for example with the Gaussian kernel) and the degree matrix D, and compute the Laplacian matrix L = D - W.
  • Compute the eigenvectors corresponding to the k smallest eigenvalues of L.
  • Stack these k eigenvectors as the columns of an n × k matrix, and regard each of its n rows as a k-dimensional sample.
  • Run k-means on these n samples; the resulting labels give the spectral clustering of the original samples.

As you may have seen, the basic idea of spectral clustering is to perform an eigendecomposition of the Laplacian matrix built from the similarity matrix of the sample data (which amounts to the dimensionality reduction of Laplacian eigenmaps), and then run k-means clustering on the resulting eigenvectors.
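Putting everything together, here is a compact end-to-end sketch of unnormalized spectral clustering in numpy plus scikit-learn's KMeans (my own illustrative implementation of the flow above, not code from the original post; sigma and k are free parameters):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Unnormalized spectral clustering: Gaussian similarity -> L = D - W
    -> k smallest eigenvectors -> k-means on the rows."""
    # Similarity (adjacency) matrix via the Gaussian kernel
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Laplacian matrix
    D = np.diag(W.sum(axis=1))
    L = D - W

    # Eigenvectors of the k smallest eigenvalues (eigh sorts ascending)
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]            # n x k: row i is the embedding of sample i

    # k-means on the spectral embedding
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

# Toy usage: two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
print(spectral_clustering(X, k=2, sigma=1.0))
```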


4. References and recommendations

  • Ulrike von Luxburg, "A Tutorial on Spectral Clustering", Statistics and Computing, 17(4), 2007 (reference 6 cited above).
Who was Laplace, after whom the Laplacian matrix is named?

In the course of studying celestial mechanics, Laplace created and developed many mathematical methods. The Laplace transform, the Laplace theorem, and the Laplace equation are named after him and are widely used across science and technology.

Laplace was a French mathematician and astronomer, and a member of the French Academy of Sciences. He was the principal founder of celestial mechanics, one of the founders of cosmogony (the theory of celestial evolution), and the founder of analytic probability theory; he can thus be called a pioneer of applied mathematics.

In 1773 he solved a well-known problem: explaining why Jupiter's orbit appeared to be continually shrinking while Saturn's kept expanding. Laplace used mathematical methods to prove the invariance of planetary mean motions, that is, that the sizes of planetary orbits only change periodically; the result holds up to the third powers of the orbital eccentricities and inclinations. This is the famous Laplace theorem.
From 1784 to 1785 he showed that the gravitational attraction of a body on any external particle can be expressed through a potential function, and that this potential function satisfies a partial differential equation: the famous Laplace equation.
In 1786 he proved that the eccentricities and inclinations of the planetary orbits remain small and self-correcting, i.e. that the perturbations are conservative and periodic and do not accumulate indefinitely. Laplace also noticed that the mean motions of three of Jupiter's major satellites Z1, Z2, Z3 obey the relation Z1 - 3 × Z2 + 2 × Z3 = 0, and similarly that the mean motions of four of Saturn's satellites Y1, Y2, Y3, Y4 obey 5 × Y1 - 10 × Y2 + Y3 + 4 × Y4 = 0. He showed that these commensurabilities are maintained by the satellites' mutual gravitational attraction, and from this the concept of libration developed.
