Analysis of Covariance Matrix

Source: Internet
Author: User
Introduction

In pattern recognition, we often encounter the concept of covariance matrix, which has never been understood before. Now we have encountered this broken thing and decided to understand it. Below are the notes we will take and share them with you.

Mean Value:650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M02/4A/B3/wKiom1QmtIOwkXTCAAAROok9TGU151.jpg "style =" float: none; "Title =" image1.png "alt =" wkiom1qmtiowkxtcaw.ook9tgu151.jpg "/>

Standard Deviation:650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M00/4A/B5/wKioL1QmtKqC5nkmAAAeWc5tHQ0929.jpg "style =" float: none; "Title =" image2.png "alt =" wkiol1qmtkqc5nkmaaaewc5thq0929.jpg "/>

Variance:650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M00/4A/B5/wKioL1QmtKuzKYGGAAAZ9xGQ38E204.jpg "style =" float: none; "Title =" image3.png "alt =" wkiol1qmtkuzkyggaaaz9xgq38e204.jpg "/>

The mean value describes the average value of the sample in the sample set.,Mean level.

The standard deviation describes the average distance from each sample point to the mean of the sample set.,Reflects the degree of deviation from the average value. It reflects the data stability.

650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M00/4A/B3/wKiom1QmtIOh8RSnAADYrAEp4v0482.jpg "style =" float: none; "Title =" image4.png "alt =" wkiom1qmtioh8rsnaadyraep4v0482.jpg "/>

The reason for dividing n-1 rather than N is that we can better approximate the population standard deviation with a smaller sample set, that is, the so-called "unbiased estimation" in statistics ".

For example, to check the lifetime of a bulb, we need to check the mean and variance of two bulb. The mean value can only represent the average lifetime, but it cannot reflect the stability of the bulb, stability must be measured by variance.

 

 

Here, X is only a one-dimensional feature. We can think of X as the age feature in the person (name, age, height) feature vector, X does not represent the person sample. If it represents a person, what is the average value of age and height?

Why is covariance required?

Mean and variance are generally used to describe one-dimensional data. However, in real life, we often encounter data sets with multidimensional data.

In multi-dimensional situations, the processing needsE (X)AndD (x)In addition, the relationship between dimensions needs to be discussed. covariance is used to measure two random variables.(Features)Correlation statistics(When selecting features, we need to select features with low correlation, such as the Nb algorithm). variance is a special case of covariance, that is, when two variables are the same. The covariance between two real number random variables X and Y whose expected values are E [X] and e [y] is defined:


650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M01/4A/B5/wKioL1QmtKvw9DcFAAArBhVE6Co051.jpg "style =" float: none; "Title =" image6.png "alt =" wkiol1qmtkvw9dcfaaarbhve6co051.jpg "/>

650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M01/4A/B3/wKiom1QmtIOCMD5wAAAn3v4-A_A351.jpg "style =" float: none; "Title =" image7.png "alt =" wKiom1QmtIOCMD5wAAAn3v4-A_A351.jpg "/>

Covariance can only deal with two-dimensional problems. When there are more dimensions, the covariance matrix should be used:

Covariance Matrix

Definition:


650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M01/4A/B5/wKioL1QmtbHhzoamAAAg3PMESYo491.jpg "Title =" image8.png "alt =" wkiol1qmtbhhzoamaaag3pmesyo491.jpg "/>

Assume that the dataset has

650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M00/4A/B3/wKiom1QmtIXAouSoAAALm0cp6dE841.jpg "style =" float: none; "Title =" image9.png "alt =" wkiom1qmtixaousoaaalm0cp6de841.jpg "/>, the covariance matrix is

C = Cov (x, x), Cov (x, y), Cov (x, z)
Cov (Y, x), Cov (Y, Y), Cov (Y, Z)

Cov (z, x), Cov (z, Y), Cov (z, Z)

Visible,The covariance matrix is a symmetric matrix, and the diagonal is the variance of each dimension.

 

The covariance matrix calculates the covariance between different dimensions, rather than between different samples.

MATLAB covariance example

A 10*3-dimensional integer matrix is randomly generated as a sample set. 10 represents the number of samples, and 3 represents the sample dimension.
Sample = fix (RAND (10, 3) * 50)
Result:



650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M02/4A/B5/wKioL1QmtK3iJpT2AABoPqNd_zk111.jpg "style =" float: none; "Title =" image11.png "alt =" wkiol1qmtk3ijpt2aabopqnd_zk111.jpg "/>

We have also stressed that,The covariance matrix is used to calculate the covariance between different dimensions (different features ).Always keep this in mind.Each row of the sample matrix is a sample, and each column is a dimension. Therefore, we need to calculate the mean value by column.. For ease of description, we first assign values to the data of the three dimensions:

650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M02/4A/B3/wKiom1QmtIbjUOt6AABBp53mD5k385.jpg "style =" float: none; "Title =" image12.png "alt =" wkiom1qmtibjuot6aabbp53md5k385.jpg "/>

  1. 1. Calculate the covariance between the two three:
    Covariance between dim1 and dim2, dim1 and dim3, dim2 and dim3:
    ● Sum (dim1-mean (dim1). * (dim2-mean (dim2)/(SIZE (sample, 1)-1) %
    78
    ● Sum (dim1-mean (dim1). * (dim3-mean (dim3)/(SIZE (sample, 1)-1) %-120.2444
    ● Sum (dim2-mean (dim2). * (dim3-mean (dim3)/(SIZE (sample, 1)-1) %-126.9444

  2. 2. Calculate the variance of each corner of the diagonal line
    STD (dim1) ^ 2% get: 301.1556
    STD (dim2) ^ 2% get: 268.9444
    STD (dim3) ^ 2% get: 216.0111
    In this way, we obtain all the data required to calculate the covariance matrix.

  3. 3. Call the Matlab-provided cov function for verification:

Cov (sample)

Result:

650) This. width = 650; "src =" http://s3.51cto.com/wyfs02/M01/4A/B5/wKioL1QmtK2yqI6-AABsXyajRmA590.jpg "style =" float: none; "Title =" image13.png "alt =" wKioL1QmtK2yqI6-AABsXyajRmA590.jpg "/>

Keep following the calculation result, which indicates that the calculation result is correct.

 

Finally, understandCovariance Matrix
The key is to remember that it calculates the covariance between different dimensions, rather than between different samples. To obtain a sample matrix, we must first determine whether a row is a sample or a dimension.

★Meaning: Relationship between Random Variables



This article from "QQ" blog, please be sure to keep this source http://qianqing13579.blog.51cto.com/5255432/1558863

Analysis of Covariance Matrix

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.