A geometric interpretation of the covariance matrix

Contents

    • 1 Introduction
    • 2 Eigendecomposition of a covariance matrix
    • 3 Covariance matrix as a linear transformation
    • 4 Conclusion
Introduction

In this article, we provide an intuitive, geometric interpretation of the covariance matrix by exploring the relation between linear transformations and the resulting data covariance. Most textbooks explain the shape of the data based on the concept of covariance matrices. Instead, we take a backwards approach and explain the concept of covariance matrices based on the shape of the data.

In a previous article, we discussed the concept of variance and provided a derivation and proof of the well-known formula to estimate the sample variance. Figure 1 shows that the standard deviation, as the square root of the variance, provides a measure of how much the data is spread across the feature space.

Figure 1. Gaussian density function. For normally distributed data, 68% of the samples fall within the interval defined by the mean plus and minus the standard deviation.

We showed that an unbiased estimator of the sample variance can be obtained by:

(1)   $\sigma_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
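
As a quick numerical check, here is a minimal sketch, assuming Python with NumPy (neither appears in the original article); the sample size and distribution parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=1000)  # standard deviation 2, so variance ~4

# Unbiased sample variance: divide by N - 1, matching equation (1)
var_manual = np.sum((x - x.mean()) ** 2) / (len(x) - 1)
print(var_manual)            # close to 4
print(np.var(x, ddof=1))     # identical: ddof=1 selects the N - 1 denominator
```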

However, variance can only be used to explain the spread of the data in the directions parallel to the axes of the feature space. Consider the 2D feature space shown by Figure 2:

Figure 2. The diagonal spread of the data is captured by the covariance.

For this data, we could calculate the variance $\sigma(x,x)$ in the x-direction and the variance $\sigma(y,y)$ in the y-direction. However, the horizontal spread and the vertical spread of the data do not explain the clear diagonal correlation. Figure 2 clearly shows that, on average, if the x-value of a data point increases, then the y-value also increases, resulting in a positive correlation. This correlation can be captured by extending the notion of variance to what is called the 'covariance' of the data:

(2)   $\sigma(x,y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$

For 2D data, we thus obtain $\sigma(x,x)$, $\sigma(y,y)$, $\sigma(x,y)$, and $\sigma(y,x)$. These four values can be summarized in a matrix, called the covariance matrix:

(3)   $\Sigma = \begin{bmatrix} \sigma(x,x) & \sigma(x,y) \\ \sigma(y,x) & \sigma(y,y) \end{bmatrix}$

If x is positively correlated with y, then y is also positively correlated with x. In other words, we can state that $\sigma(x,y) = \sigma(y,x)$. Therefore, the covariance matrix is always a symmetric matrix with the variances on its diagonal and the covariances off-diagonal. Two-dimensional normally distributed data is explained completely by its mean and its $2 \times 2$ covariance matrix. Similarly, a $3 \times 3$ covariance matrix is used to capture the spread of three-dimensional data, and an $N \times N$ covariance matrix captures the spread of N-dimensional data.
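
To make this concrete, the following sketch (Python/NumPy assumed, with an arbitrary correlation of my own choosing) builds the covariance matrix of equation (3) from correlated 2D samples and confirms its symmetry:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.5, size=500)   # y positively correlated with x

# np.cov expects one variable per row and uses the N - 1 denominator by default
Sigma = np.cov(np.vstack([x, y]))
print(Sigma)                            # variances on the diagonal, covariances off it
print(np.allclose(Sigma, Sigma.T))      # True: sigma(x, y) == sigma(y, x)
```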

Figure 3 illustrates how the overall shape of the data defines the covariance matrix:

Figure 3. The covariance matrix defines the shape of the data. Diagonal spread is captured by the covariance, while axis-aligned spread is captured by the variance.

Eigendecomposition of a covariance matrix

In the next section, we'll discuss how the covariance matrix can be interpreted as a linear operator that transforms white data into the data we observed. However, before diving into the technical details, it's important to gain an intuitive understanding of how eigenvectors and eigenvalues uniquely define the covariance matrix, and therefore the shape of our data.

As we saw in Figure 3, the covariance matrix defines both the spread (variance) and the orientation (covariance) of our data. So, if we would like to represent the covariance matrix with a vector and its magnitude, we should simply try to find the vector that points into the direction of the largest spread of the data, and whose magnitude equals the spread (variance) in this direction.

If we define this vector as $\vec{v}$, then the projection of our data $D$ onto this vector is obtained as $\vec{v}^{\top} D$, and the variance of the projected data is $\vec{v}^{\top} \Sigma \vec{v}$. Since we are looking for the vector $\vec{v}$ that points into the direction of the largest variance, we should choose its components such that the variance $\vec{v}^{\top} \Sigma \vec{v}$ of the projected data is as large as possible. Maximizing any function of the form $\vec{v}^{\top} \Sigma \vec{v}$ with respect to $\vec{v}$, where $\vec{v}$ is a normalized unit vector, can be formulated as a so-called Rayleigh quotient. The maximum of such a Rayleigh quotient is obtained by setting $\vec{v}$ equal to the largest eigenvector of the matrix $\Sigma$.

In other words, the largest eigenvector of the covariance matrix always points into the direction of the largest variance of the data, and the magnitude of this vector equals the corresponding eigenvalue. The second largest eigenvector is always orthogonal to the largest eigenvector, and points into the direction of the second largest spread of the data.
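
The Rayleigh-quotient claim is easy to verify numerically. In this sketch (NumPy assumed, example matrix of my own choosing), no random unit vector achieves a larger projected variance $\vec{v}^{\top} \Sigma \vec{v}$ than the eigenvector belonging to the largest eigenvalue:

```python
import numpy as np

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])          # an example covariance matrix

# eigh handles symmetric matrices; eigenvalues are returned in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma)
v_max = eigvecs[:, -1]                  # unit eigenvector of the largest eigenvalue

# Try many random unit vectors; none should beat the largest eigenvector
rng = np.random.default_rng(2)
best = 0.0
for _ in range(10_000):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    best = max(best, v @ Sigma @ v)

print(v_max @ Sigma @ v_max)            # equals the largest eigenvalue
print(best <= eigvals[-1] + 1e-12)      # True
```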

Now let's have a look at some examples. In an earlier article we saw that a linear transformation matrix is completely defined by its eigenvectors and eigenvalues. Applied to the covariance matrix, this means that:

(4)   $\Sigma \vec{v} = \lambda \vec{v}$

where $\vec{v}$ is an eigenvector of $\Sigma$, and $\lambda$ is the corresponding eigenvalue.

If the covariance matrix of our data is a diagonal matrix, such that the covariances are zero, then this means that the variances must be equal to the eigenvalues. This is illustrated by Figure 4, where the eigenvectors are shown in green and magenta, and where the eigenvalues clearly equal the variance components of the covariance matrix.

Figure 4. Eigenvectors of a covariance matrix

However, if the covariance matrix is not diagonal, such that the covariances are not zero, then the situation is a little more complicated. The eigenvalues still represent the variance magnitude in the direction of the largest spread of the data, and the variance components of the covariance matrix still represent the variance magnitude in the direction of the x-axis and y-axis. But since the data is not axis-aligned, these values are no longer the same, as shown by Figure 5.

Figure 5. Eigenvalues versus Variance

By comparing Figure 5 with Figure 4, it becomes clear that the eigenvalues represent the variance of the data along the eigenvector directions, whereas the variance components of the covariance matrix represent the spread along the axes. If there are no covariances, then both values are equal.
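
The following sketch makes the Figure 4 versus Figure 5 comparison numerical (NumPy assumed; both matrices are illustrative values of my own choosing): for a diagonal covariance matrix the eigenvalues equal the diagonal variances, while for a correlated one they do not:

```python
import numpy as np

diag_Sigma = np.diag([4.0, 1.0])            # zero covariance, as in Figure 4
corr_Sigma = np.array([[4.0, 1.5],
                       [1.5, 1.0]])         # nonzero covariance, as in Figure 5

for S in (diag_Sigma, corr_Sigma):
    print("diagonal variances:", np.diag(S),
          "eigenvalues:", np.linalg.eigvalsh(S))
# For the diagonal matrix the two lists contain the same values (up to ordering);
# for the correlated one the eigenvalues differ from the diagonal entries.
```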

Covariance matrix as a linear transformation

Now let's forget about covariance matrices for a moment. Each of the examples in Figure 3 can simply be considered to be a linearly transformed instance of the data shown in Figure 6:

Figure 6. Data with a unit covariance matrix is called white data.

Let the data shown by Figure 6 be $D$; then each of the examples shown by Figure 3 can be obtained by linearly transforming $D$:

(5)   $D' = T \, D$

where $T$ is a transformation matrix consisting of a rotation matrix $R$ and a scaling matrix $S$:

(6)   $T = R \, S$

These matrices are defined as:

(7)   $R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$

where $\theta$ is the rotation angle, and:

(8)   $S = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$

where $s_x$ and $s_y$ are the scaling factors in the x direction and the y direction, respectively.
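
In code form (NumPy assumed), the rotation and scaling matrices of equations (7) and (8), and the combined transformation of equation (6), could be built as follows; the angle and scaling factors are picked arbitrarily for illustration:

```python
import numpy as np

theta = np.deg2rad(30)                  # illustrative rotation angle, equation (7)
sx, sy = 4.0, 1.0                       # illustrative scaling factors, equation (8)

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([sx, sy])

T = R @ S                               # equation (6): first scale, then rotate
print(T)
```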

In the following paragraphs, we'll discuss the relation between the covariance matrix $\Sigma$ and the linear transformation matrix $T$.

Let's start with unscaled (scale equals 1) and unrotated data. In statistics this is often referred to as "white data", because its samples are drawn from a standard normal distribution and therefore correspond to white (uncorrelated) noise:

Figure 7. White data is data with a unit covariance matrix.

The covariance matrix of this "white" data equals the identity matrix, such that the variances and standard deviations equal 1 and the covariance equals zero:

(9)   $\Sigma = \begin{bmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

Now let's scale the data in the x-direction with a factor 4:

(10)   $D' = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix} D$

The data now looks as follows:

Figure 8. Variance in the x-direction results in a horizontal scaling.

The covariance matrix $\Sigma'$ of $D'$ is now:

(11)   $\Sigma' = \begin{bmatrix} 16 & 0 \\ 0 & 1 \end{bmatrix}$

Thus, the covariance matrix $\Sigma'$ of the resulting data $D'$ is related to the linear transformation $T$ that is applied to the original data as follows: $D' = T \, D$, where

(12)   $T = \sqrt{\Sigma'} = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}$
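
A quick numerical check of this scaling case (NumPy sketch; seed and sample size are arbitrary): white data stretched by a factor 4 in x should produce a covariance matrix close to diag(16, 1), whose square root recovers $T$:

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.standard_normal((2, 5000))      # white data: sample covariance near identity

T = np.diag([4.0, 1.0])                 # scale the x-direction by 4, equation (10)
D_prime = T @ D

Sigma_prime = np.cov(D_prime)
print(Sigma_prime)                      # approximately [[16, 0], [0, 1]], equation (11)
print(np.sqrt(np.diag(Sigma_prime)))    # approximately [4, 1]: recovers T, equation (12)
```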

However, although equation (12) holds when the data is scaled in the x and y directions, the question arises whether it also holds when a rotation is applied. To investigate the relation between the linear transformation matrix $T$ and the covariance matrix $\Sigma'$ in the general case, we will therefore try to decompose the covariance matrix into the product of rotation and scaling matrices.

As we saw earlier, we can represent the covariance matrix by its eigenvectors and eigenvalues:

(13)   $\Sigma \vec{v} = \lambda \vec{v}$

where $\vec{v}$ is an eigenvector of $\Sigma$, and $\lambda$ is the corresponding eigenvalue.

Equation (13) holds for each eigenvector-eigenvalue pair of matrix $\Sigma$. In the 2D case, we obtain two eigenvectors and two eigenvalues. The system of two equations defined by equation (13) can be represented efficiently using matrix notation:

(14)   $\Sigma V = V L$

where $V$ is the matrix whose columns are the eigenvectors of $\Sigma$, and $L$ is the diagonal matrix whose non-zero elements are the corresponding eigenvalues.

This means that we can represent the covariance matrix as a function of its eigenvectors and eigenvalues:

(15)   $\Sigma = V L V^{-1}$

Equation (15) is called the eigendecomposition of the covariance matrix, and can be obtained using a Singular Value Decomposition algorithm. Whereas the eigenvectors represent the directions of the largest variance of the data, the eigenvalues represent the magnitude of this variance in those directions. In other words, $V$ represents a rotation matrix, while $\sqrt{L}$ represents a scaling matrix. The covariance matrix can thus be decomposed further as:

(16)   $\Sigma = R S S R^{-1}$

where $R = V$ is a rotation matrix and $S = \sqrt{L}$ is a scaling matrix.
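
As a sanity check (NumPy sketch; the covariance matrix is an illustrative example of my own choosing), both the eigendecomposition of equation (15) and the further factorization of equation (16) reconstruct the original matrix:

```python
import numpy as np

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])           # an example covariance matrix

L_vals, V = np.linalg.eigh(Sigma)        # Sigma V = V L, equation (14)
L = np.diag(L_vals)

print(np.allclose(Sigma, V @ L @ np.linalg.inv(V)))      # equation (15): True

R, S = V, np.diag(np.sqrt(L_vals))       # R = V, S = sqrt(L)
print(np.allclose(Sigma, R @ S @ S @ np.linalg.inv(R)))  # equation (16): True
```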

In equation (6) we defined a linear transformation $T = R S$. Since $S$ is a diagonal scaling matrix, $S = S^{\top}$. Furthermore, since $R$ is an orthogonal matrix, $R^{-1} = R^{\top}$. Therefore, $T^{\top} = (R S)^{\top} = S^{\top} R^{\top} = S R^{-1}$. The covariance matrix can thus be written as:

(17)   $\Sigma = R S S R^{-1} = (R S)(S R^{-1}) = T T^{\top}$

In other words, if we apply the linear transformation defined by $T = R S$ to the original white data shown by Figure 7, we obtain the rotated and scaled data $D'$ with covariance matrix $\Sigma' = T T^{\top} = R S S R^{-1}$. This is illustrated by Figure 10:

Figure 10. The covariance matrix represents a linear transformation of the original data.

The colored arrows in Figure 10 represent the eigenvectors. The largest eigenvector, i.e. the eigenvector with the largest corresponding eigenvalue, always points in the direction of the largest variance of the data, and thereby defines its orientation. Subsequent eigenvectors are always orthogonal to the largest eigenvector due to the orthogonality of rotation matrices.
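
Putting it all together, this final sketch (NumPy, with an arbitrary angle and scaling of my own choosing) transforms white data with $T = R S$ and confirms that the sample covariance of the result matches $\Sigma = T T^{\top}$ from equation (17):

```python
import numpy as np

rng = np.random.default_rng(4)
D = rng.standard_normal((2, 20000))      # white data, as in Figure 7

theta = np.deg2rad(30)                   # illustrative rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([4.0, 1.0])                  # illustrative scaling factors
T = R @ S

D_prime = T @ D                          # equation (5)

print(np.cov(D_prime))                   # sample covariance of the transformed data
print(T @ T.T)                           # Sigma = T T^T, equation (17): the two match
```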

Conclusion

In this article we showed that the covariance matrix of observed data is directly related to a linear transformation of white, uncorrelated data. This linear transformation is completely defined by the eigenvectors and eigenvalues of the data. While the eigenvectors represent the rotation matrix, the eigenvalues correspond to the square of the scaling factors in each dimension.

If you're new to this blog, don't forget to subscribe, or follow me on Twitter!
