A geometric interpretation of the covariance matrix
http://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/
Translation: http://demo.netfoucs.com/u010182633/article/details/45937051
Introduction
In this article, we provide a visual, geometric interpretation of the covariance matrix by exploring the relationship between linear transformations and the covariance of the resulting data. Most textbooks explain the shape of data based on the concept of the covariance matrix. Instead, we take the reverse approach and explain the concept of the covariance matrix based on the shape of the data.
In the article "Why is the sample variance divided by N-1?", we discussed the concept of variance and provided a well-known derivation and proof of the estimator for the sample variance. In that article, Figure 1 was used to show that the standard deviation (the square root of the variance) provides a measure of how much the data is spread out in the feature space.
We found that an unbiased estimate of the sample variance can be obtained with the following formula:

$$\sigma_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$$
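As a quick numerical check (not part of the original article), the following NumPy sketch estimates the unbiased sample variance and the standard deviation of one-dimensional data; `ddof=1` selects the 1/(N-1) normalization used above:

```python
# Sketch: unbiased sample variance and standard deviation with NumPy.
import numpy as np

rng = np.random.default_rng(0)
x = 5.0 + 2.0 * rng.normal(size=1000)   # samples with mean 5 and standard deviation 2

var_unbiased = x.var(ddof=1)            # 1/(N-1) * sum((x_i - mean)^2)
std = np.sqrt(var_unbiased)             # standard deviation
print(var_unbiased, std)                # roughly 4 and 2
```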
However, the variance can only be used to explain the spread of the data in the directions parallel to the axes of the feature space. Consider the two-dimensional feature space shown in Figure 2:
For this data, we can calculate the variance in the x-direction and the variance in the y-direction. However, the horizontal spread and the vertical spread of the data do not explain the clear diagonal correlation. Figure 2 clearly shows that, on average, if the x-value of a data point increases, then the y-value also increases, resulting in a positive correlation. This correlation can be captured by extending the notion of variance to what is called the "covariance" of the data:

$$\sigma(x, y) = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})$$
For 2D data, we obtain the values σ(x,x), σ(y,y), σ(x,y) and σ(y,x), which can be summarized in a matrix called the covariance matrix:

$$\Sigma = \begin{pmatrix} \sigma(x,x) & \sigma(x,y) \\ \sigma(y,x) & \sigma(y,y) \end{pmatrix}$$
If x is positively correlated with y, then y is also positively correlated with x; in other words, σ(x,y) = σ(y,x). Therefore, the covariance matrix is always a symmetric matrix with the variances on its diagonal and the covariances off-diagonal. Two-dimensional normally distributed data is explained completely by its mean and its 2x2 covariance matrix. Similarly, a 3x3 covariance matrix captures the spread of three-dimensional data, and an NxN covariance matrix captures the spread of N-dimensional data.
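To make this concrete, here is a small sketch (the data and numbers are illustrative, not those of Figure 2) that estimates the 2x2 covariance matrix of positively correlated data and confirms its symmetry:

```python
# Sketch: covariance matrix of correlated 2-D data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 0.8 * x + 0.3 * rng.normal(size=2000)   # y tends to increase with x
D = np.vstack([x, y])                        # 2 x N data matrix, one variable per row

Sigma = np.cov(D)                    # uses the 1/(N-1) estimator by default
print(Sigma)                         # variances on the diagonal, covariances off-diagonal
print(np.allclose(Sigma, Sigma.T))   # True: the covariance matrix is symmetric
```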
Figure 3 shows how the overall shape of the data defines the covariance matrix:
Eigenvalue decomposition of covariance matrices
In the next section, we will discuss how the covariance matrix can be interpreted as a linear operator that transforms white data into the data we observed. However, before diving into the technical details, it is important to gain an intuitive understanding of how eigenvectors and eigenvalues uniquely define the covariance matrix, and therefore the shape of the data.
As we saw in Figure 3, the covariance matrix defines both the spread (variance) and the orientation (covariance) of our data. So, if we would like to represent the covariance matrix with a vector and its magnitude, we should simply try to find the vector that points into the direction of the largest spread of the data, and whose magnitude equals the spread (variance) in that direction.
If we define this vector as $\vec{v}$, then the projection of our data $D$ onto this vector is $\vec{v}^{\,T} D$, and the variance of the projected data is $\vec{v}^{\,T} \Sigma \vec{v}$. Since we are looking for the vector $\vec{v}$ that points into the direction of the largest variance, we should choose its components such that $\vec{v}^{\,T} \Sigma \vec{v}$, the variance of the projected data, is as large as possible. Maximizing any function of the form $\vec{v}^{\,T} \Sigma \vec{v}$, where $\vec{v}$ is a normalized unit vector, can be formulated as a so-called Rayleigh quotient. The maximum of such a Rayleigh quotient is obtained by setting $\vec{v}$ equal to the largest eigenvector of the matrix $\Sigma$.
In other words, the largest eigenvector of the covariance matrix always points into the direction of the largest variance of the data, and the magnitude of this vector equals the corresponding eigenvalue. The second largest eigenvector is always orthogonal to the largest eigenvector and points into the direction of the second largest spread of the data.
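The following sketch illustrates this claim numerically (the covariance matrix used here is an arbitrary example): the eigenvalue belonging to the largest eigenvector equals the variance of the data projected onto that eigenvector:

```python
# Sketch: the largest eigenvector points in the direction of the largest spread,
# and its eigenvalue equals the variance of the projected data.
import numpy as np

rng = np.random.default_rng(1)
D = rng.multivariate_normal([0, 0], [[9.0, 3.0], [3.0, 2.0]], size=5000).T  # 2 x N

Sigma = np.cov(D)
eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues of a symmetric matrix
v_max = eigvecs[:, -1]                     # unit eigenvector of the largest eigenvalue

proj = v_max @ D                           # project the data onto v_max
print(eigvals[-1], np.var(proj, ddof=1))   # the two values agree
```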
Now, let's take a look at some examples. In the article on eigenvalues and eigenvectors (http://blog.csdn.net/u010182633/article/details/45921929), we saw that a linear transformation matrix $T$ is completely defined by its eigenvectors and eigenvalues. Applied to the covariance matrix, this means that:

$$\Sigma \vec{v} = \lambda \vec{v}$$

where $\vec{v}$ is an eigenvector of $\Sigma$ and $\lambda$ is the corresponding eigenvalue.
If the covariance matrix of our data is a diagonal matrix, such that the covariances are zero, this means that the variances must be equal to the eigenvalues $\lambda$. This is illustrated in Figure 4, where the eigenvectors are shown in green and magenta, and where the eigenvalues clearly equal the variance components of the covariance matrix.
However, if the covariance matrix is not diagonal, such that the covariances are not zero, the situation is a little more complicated. The eigenvalues still represent the magnitude of the variance in the direction of the largest spread of the data, and the variance components of the covariance matrix still represent the magnitude of the variance in the x-axis and y-axis directions. But since the data is not axis-aligned, these values are no longer the same, as shown in Figure 5.
By comparing Figure 5 with Figure 4, it becomes clear that the eigenvalues represent the variance of the data along the eigenvector directions, whereas the variance components of the covariance matrix represent the spread along the axes. If there are no covariances, both values are equal.
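A minimal numerical illustration of this point (with made-up covariance matrices rather than the data of Figures 4 and 5): for a diagonal covariance matrix the eigenvalues coincide with the diagonal variances, while for a non-diagonal one they do not:

```python
# Sketch: eigenvalues versus diagonal variance components.
import numpy as np

Sigma_diag = np.array([[9.0, 0.0],
                       [0.0, 1.0]])    # axis-aligned data (zero covariance)
Sigma_full = np.array([[6.0, 3.5],
                       [3.5, 4.0]])    # correlated data (non-zero covariance)

for Sigma in (Sigma_diag, Sigma_full):
    print(np.diag(Sigma), np.linalg.eigvalsh(Sigma))
# diagonal case: variances [9. 1.] and eigenvalues [1. 9.] are the same values
# full case:     variances [6. 4.] but eigenvalues ~[1.4 8.6]
```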
Covariance matrix as a linear transformation
Now, let's forget about covariance matrices for a moment. Each of the examples in Figure 3 can simply be considered a linearly transformed instance of the data shown in Figure 6:
Let the data shown in Figure 6 be $D$; then each of the examples shown in Figure 3 can be obtained by linearly transforming $D$:

$$D' = T D$$
where $T$ is a transformation matrix consisting of a rotation matrix $R$ and a scaling matrix $S$:

$$T = R S \qquad (6)$$
These matrices are defined as follows:

$$R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad S = \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix}$$
where $\theta$ is the rotation angle, and $s_x$ and $s_y$ are the scaling factors in the x and y directions, respectively.
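As a sketch (variable names chosen here for illustration), $R$, $S$ and $T = RS$ can be constructed as follows:

```python
# Sketch: building the transformation T = R S from an angle and two scaling factors.
import numpy as np

def transformation(theta, s_x, s_y):
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # rotation by theta
    S = np.diag([s_x, s_y])                           # scaling along x and y
    return R @ S                                      # T = R S

T = transformation(np.deg2rad(45), 4.0, 1.0)
print(T)
```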
In the following paragraphs, we will discuss the relationship between the covariance matrix $\Sigma$ and the linear transformation matrix $T = RS$.
Let's start with unscaled (scale equals 1) and unrotated data. In statistics, this is often referred to as "white data" because its samples are drawn from a standard normal distribution and therefore correspond to white (uncorrelated) noise:
The covariance matrix of this "white" data equals the identity matrix, such that the variances and standard deviations equal 1 and the covariance equals zero:

$$\Sigma = \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
Now let's scale the data in the x-direction with a factor 4:

$$D' = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix} D$$
The data $D'$ now looks as follows:
The covariance matrix of $D'$ is now:

$$\Sigma' = \begin{pmatrix} 16 & 0 \\ 0 & 1 \end{pmatrix}$$
Thus, the covariance matrix $\Sigma'$ of $D'$ is related to the linear transformation matrix $T$ that was applied to the original data: $D' = T D$, where

$$T = \sqrt{\Sigma'} = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix} \qquad (12)$$
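The scaling-only case is easy to reproduce with a short sketch (random seed and sample size are arbitrary): white data is stretched by a factor 4 in x, and the sample covariance matrix comes out close to diag(16, 1), whose square root is the applied transformation:

```python
# Sketch: scaling white data by 4 in the x-direction.
import numpy as np

rng = np.random.default_rng(3)
D = rng.normal(size=(2, 10000))          # white data: covariance close to the identity
T = np.array([[4.0, 0.0],
              [0.0, 1.0]])               # scale x by 4, leave y unchanged
D_prime = T @ D

Sigma_prime = np.cov(D_prime)
print(Sigma_prime)                       # approximately [[16, 0], [0, 1]]
print(np.sqrt(np.diag(Sigma_prime)))     # approximately [4, 1], i.e. T in this axis-aligned case
```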
However, although equation (12) holds when the data is scaled in the x and y directions, the question arises whether it still holds when a rotation is applied. To investigate the relationship between the linear transformation matrix $T$ and the covariance matrix in the general case, we will try to decompose the covariance matrix into the product of rotation and scaling matrices.
As we saw earlier, we can represent the covariance matrix by its eigenvectors and eigenvalues:

$$\Sigma \vec{v} = \lambda \vec{v} \qquad (13)$$
Equation (13) holds for each eigenvector-eigenvalue pair of the matrix $\Sigma$. In the 2D case, we obtain two eigenvectors and two eigenvalues. The system of two equations defined by equation (13) can be represented efficiently using matrix notation:

$$\Sigma V = V L \qquad (14)$$
where $V$ is the matrix whose columns are the eigenvectors of $\Sigma$, and $L$ is the diagonal matrix whose non-zero elements are the corresponding eigenvalues.
This means that we can represent the covariance matrix as a function of its eigenvectors and eigenvalues:

$$\Sigma = V L V^{-1} \qquad (15)$$
Equation (15) is called the eigendecomposition of the covariance matrix and can be obtained using a singular value decomposition algorithm. The eigenvectors represent the directions of the largest variance of the data, whereas the eigenvalues represent the magnitude of this variance in those directions. In other words, $V$ represents a rotation matrix, while $\sqrt{L}$ represents a scaling matrix. The covariance matrix can therefore be decomposed further as:

$$\Sigma = R S S R^{-1} \qquad (16)$$

where $R = V$ is a rotation matrix and $S = \sqrt{L}$ is a scaling matrix.
In equation (6), we defined a linear transformation $T = RS$. Since $S$ is a diagonal scaling matrix, $S = S^{T}$. Furthermore, since $R$ is an orthogonal matrix, $R^{-1} = R^{T}$. Therefore, the covariance matrix can be written as:

$$\Sigma = R S S R^{-1} = R S S^{T} R^{T} = (RS)(RS)^{T} = T T^{T}$$
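This chain of identities is easy to verify numerically. The sketch below (using an arbitrary symmetric covariance matrix) checks equation (15) and the factorization $\Sigma = R S S R^{T} = T T^{T}$; note that `eigh` returns an orthogonal $V$ that may include a reflection, which does not affect the check:

```python
# Sketch: checking Sigma = V L V^(-1) = R S S R^T = T T^T.
import numpy as np

Sigma = np.array([[6.0, 3.5],
                  [3.5, 4.0]])

L_vals, V = np.linalg.eigh(Sigma)       # Sigma V = V L, eigenvalues ascending
L = np.diag(L_vals)
R = V                                   # orthogonal (rotation, possibly with a reflection)
S = np.sqrt(L)                          # scaling matrix, S = sqrt(L)
T = R @ S

print(np.allclose(Sigma, V @ L @ np.linalg.inv(V)))   # eigendecomposition, equation (15)
print(np.allclose(Sigma, R @ S @ S @ R.T))            # Sigma = R S S R^T
print(np.allclose(Sigma, T @ T.T))                    # Sigma = T T^T
```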
In other words, if we apply the linear transformation defined by $T = RS$ to the original white data shown in Figure 7, we obtain the rotated and scaled data $D'$ with covariance matrix $T T^{T} = \Sigma' = R S S R^{-1}$. This is illustrated in Figure 10:
The colored arrows in Figure 10 represent the eigenvectors. The largest eigenvector, i.e. the eigenvector with the largest corresponding eigenvalue, always points into the direction of the largest variance of the data and thereby defines its orientation. The second eigenvector is always orthogonal to the largest eigenvector due to the orthogonality of the rotation matrix.
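Putting everything together, the sketch below (angle, scaling factors and seed chosen purely for illustration) applies $T = RS$ to white data and then recovers both the orientation and the squared scaling factors from the eigendecomposition of the resulting covariance matrix:

```python
# Sketch: transforming white data with T = R S and recovering orientation and scale.
import numpy as np

rng = np.random.default_rng(4)
white = rng.normal(size=(2, 20000))         # white data, covariance close to the identity

theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([4.0, 1.0])
D_prime = R @ S @ white                     # rotated and scaled data

Sigma_prime = np.cov(D_prime)
eigvals, eigvecs = np.linalg.eigh(Sigma_prime)
v_max = eigvecs[:, -1]                      # largest eigenvector
angle = np.degrees(np.arctan2(v_max[1], v_max[0]))
print(angle % 180)                          # close to 30: the orientation of the data
print(eigvals)                              # close to [1, 16]: the squared scaling factors
```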
Summary
In this article, we showed that the covariance matrix of observed data is directly related to a linear transformation of white, uncorrelated data. This linear transformation is completely defined by the eigenvectors and eigenvalues of the data. While the eigenvectors represent the rotation matrix, the eigenvalues correspond to the square of the scaling factor in each dimension.