A First Look at PCA for Data Dimensionality Reduction


What PCA does is de-noise the data and remove redundancy; in essence, this amounts to diagonalizing the covariance matrix.

I. Preliminaries

1.1 Covariance Analysis

For a general distribution, quantities such as E(X) can be written down directly, but when you are actually handed concrete numerical data and asked to compute the covariance matrix from the formula, it is easy to get stuck. There is not much material about this online, so here is an example of how a covariance matrix is computed.

Using MATLAB to calculate this example:

Z = [1,2; 3,6; 4,2; 5,2]
cov(Z)

ans =
    2.9167   -0.3333
   -0.3333    4.0000

As you can see, in the process of computing the covariance MATLAB divides the sums by 3, that is, by the number of samples minus 1. So MATLAB's formula for the covariance is:
Covariance(i, j) = sum( (elements of column i − mean of column i) .* (elements of column j − mean of column j) ) / (number of samples − 1)
The following gives an example with 4-dimensional, 3-sample data; note that a 4-dimensional sample has nothing to do with the two-variable X, Y notation. For 4 dimensions the formula is applied directly, column by column, so there is no need for the potentially confusing X, Y expressions.

Note: when the covariance is positive, X and Y are positively correlated; when it is negative, X and Y are negatively correlated; when the covariance is 0, X and Y are uncorrelated.

Cov(X, X) is the variance of X, and Cov(X, Y) is the covariance. From the above it can be seen that the covariance matrix is square, and its size equals the dimensionality of the samples.

1.2 Covariance Implementation
for i = 1:size(A,2)
    for j = 1:size(A,2)
        C(i,j) = sum((A(:,i) - mean(A(:,i))) .* (A(:,j) - mean(A(:,j)))) / (size(A,1) - 1);
    end
end

Verify the following:

>> a = [-1 1 2; -2 3 1; 4 0 3]
a =
    -1     1     2
    -2     3     1
     4     0     3
>> cov(a)
ans =
   10.3333   -4.1667    3.0000
   -4.1667    2.3333   -1.5000
    3.0000   -1.5000    1.0000
>> for i = 1:size(a,2)
       for j = 1:size(a,2)
           C(i,j) = sum((a(:,i) - mean(a(:,i))) .* (a(:,j) - mean(a(:,j)))) / (size(a,1) - 1);
       end
   end
>> C
C =
   10.3333   -4.1667    3.0000
   -4.1667    2.3333   -1.5000
    3.0000   -1.5000    1.0000
1.3 Diagonalization of Matrices

Take the n largest eigenvalues of the covariance matrix; projecting X onto the corresponding n eigenvectors reduces the data to n dimensions.
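To make this concrete, here is a minimal MATLAB sketch of the idea (the data matrix A and the choice n = 2 are my own illustrations, not from the original text):

A = [-1 1 2; -2 3 1; 4 0 3];               % samples in rows, dimensions in columns
C = cov(A);                                % covariance matrix of the data
[V, D] = eig(C);                           % diagonalization: C*V = V*D, D diagonal
[evs, idx] = sort(diag(D), 'descend');     % eigenvalues from largest to smallest
n = 2;
W = V(:, idx(1:n));                        % eigenvectors of the n largest eigenvalues
B = A - repmat(mean(A,1), size(A,1), 1);   % centered data
Y = B * W;                                 % data reduced to n dimensions
% cov(Y) equals (up to rounding) diag(evs(1:n)): the new dimensions are uncorrelated.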

1.4 The eig Function

E = eig(A): returns all the eigenvalues of matrix A as the vector E.
[V,D] = eig(A): returns all the eigenvalues of matrix A on the diagonal of the matrix D, with the corresponding eigenvectors of A as the columns of V.
[V,D] = eig(A,'nobalance'): similar to the second form, but the second form first applies a balancing (similarity) transformation to A before computing its eigenvalues and eigenvectors, whereas this form works on the matrix A directly.
E = eig(A,B): returns the n generalized eigenvalues of the n-by-n square matrices A and B as the vector E.
[V,D] = eig(A,B): returns the n generalized eigenvalues of the square matrices A and B on the diagonal of the n-by-n diagonal matrix D, and returns the corresponding eigenvectors as the columns of the full-rank n-by-n matrix V, satisfying A*V = B*V*D.

>> a = [1 2 3; 4 5 6; 7 8 9]
a =
     1     2     3
     4     5     6
     7     8     9
>> [b, c] = eig(a)
b =
   -0.2320   -0.7858    0.4082
   -0.5253   -0.0868   -0.8165
   -0.8187    0.6123    0.4082
c =
   16.1168         0         0
         0   -1.1168         0
         0         0   -0.0000

Experimental verification: the covariance matrix of the standardized x equals the correlation coefficient matrix of x.

In the program below, the deviations from the mean are taken column by column, the same convention used for the covariance in PCA.

>> x = a;
>> [p, n] = size(x);
for j = 1:n
    mju(j)   = mean(x(:,j));
    sigma(j) = sqrt(cov(x(:,j)));
end
for i = 1:p
    for j = 1:n
        y(i,j) = (x(i,j) - mju(j)) / sigma(j);
    end
end
sigmay = cov(y);                    % covariance matrix of the standardized x
[t, lambda] = eig(sigmay);          % its eigenvalues and eigenvectors
disp('eigenvalues (from small to large):'); disp(lambda);
disp('eigenvectors:'); disp(t);

eigenvalues (from small to large):
   -0.0000         0         0
         0         0         0
         0         0    3.0000
eigenvectors:
    0.4082    0.7071    0.5774
    0.4082   -0.7071    0.5774
   -0.8165         0    0.5774

>> r = corrcoef(x);                 % correlation coefficient matrix of x
[tr, lambdar] = eig(r);             % its eigenvalues and eigenvectors
disp('eigenvalues (from small to large):'); disp(lambdar);
disp('eigenvectors:'); disp(tr);

eigenvalues (from small to large):
   -0.0000         0         0
         0         0         0
         0         0    3.0000
eigenvectors:
    0.4082    0.7071    0.5774
    0.4082   -0.7071    0.5774
   -0.8165         0    0.5774

Contribution rate and cumulative contribution rate.

% variance contribution rate and cumulative variance contribution rate
xsum = sum(sum(lambda,2),1);
for i = 1:n
    fai(i) = lambda(i,i) / xsum;
end
for i = 1:n
    psai(i) = sum(sum(lambda(1:i,1:i),2),1) / xsum;
end
disp('variance contribution rate:');            disp(fai);
disp('cumulative variance contribution rate:'); disp(psai);
% comprehensive evaluation ... omitted

Question: the first call to eig seemed to return the eigenvalues sorted from large to small, while the second returned them from small to large. I did not expect that; what is going on?
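One likely explanation (my note, not from the original author): eig makes no guarantee about the order of the returned eigenvalues; for symmetric matrices such as a covariance or correlation matrix they usually come back in ascending order, while for a general matrix the order can be arbitrary. If a fixed order is needed, sort them explicitly, for example:

[V, D] = eig(C);                            % C is the (symmetric) covariance matrix
[evs, idx] = sort(diag(D), 'descend');      % eigenvalues from largest to smallest
V = V(:, idx);                              % reorder the eigenvectors to match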

1.5 The std Function

std(X) computes the standard deviation of X. X can be a single row or a matrix with multiple rows; if there is only one row, the result is the standard deviation of that row, and if there are multiple rows, the standard deviation of each column is computed. std(X, a) also computes the standard deviation of X, where a can be 0 or 1: with a = 0 the behaviour is the same as before, and with a = 1 the sum of squares is divided by N instead of N-1. (The usual formula for the sample standard deviation divides by N-1.) In std(X, a, b), a again indicates whether to divide by N or N-1: a = 0 divides by N-1, a = 1 divides by N.

b is the dimension along which to operate. For example, for

1 2 3 4
4 5 6 1

b = 1 computes the standard deviation down each column (one value per column), b = 2 computes it along each row (one value per row), and for a three-dimensional array b = 3 operates along the third dimension.
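A small sketch of these options on the matrix above (the values in the comments are worked out by hand and rounded):

X = [1 2 3 4; 4 5 6 1];
std(X)         % per column, divide by N-1:  [2.1213  2.1213  2.1213  2.1213]
std(X, 1)      % per column, divide by N:    [1.5000  1.5000  1.5000  1.5000]
std(X, 0, 1)   % same as std(X): operates down each column
std(X, 0, 2)   % per row, divide by N-1:     [1.2910; 2.1602]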

1.6 The zscore Function

The z-score standardization method transforms attribute A as: new data = (original data − mean) / standard deviation. In MATLAB this is done with the zscore function: if X is the data before standardization, then Y = zscore(X) is the standardized data (a small sketch follows the list of characteristics below).

  Characteristics:
(1) The sample mean is 0 and the variance is 1;
(2) The range of the result is not fixed; after processing, the maximum and minimum values of each indicator are generally different;
(3) It is not applicable to indicators whose values are constant;
(4) It is not applicable to evaluation methods that require the standardized data to be greater than 0 (such as the geometric weighted average method).
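A minimal sketch showing that zscore is just the column-wise mean-and-standard-deviation standardization (zscore is a Statistics Toolbox function; the example matrix is illustrative):

X  = [1 2 3; 4 5 6; 7 8 9];
Y1 = zscore(X);                                            % standardized data
Y2 = (X - repmat(mean(X), size(X,1), 1)) ./ repmat(std(X), size(X,1), 1);
% Y1 equals Y2; every column of Y1 has mean 0 and standard deviation 1.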

II. PCA Analysis

The following explanation of PCA dimensionality reduction is taken from Baidu Encyclopedia and is quite clear:
For a training set of 100 samples (object templates) with 10-dimensional features, we can build a 100*10 matrix as the sample matrix. Computing the covariance matrix of this sample gives a 10*10 covariance matrix, from which we can obtain 10 eigenvalues and their eigenvectors. Sorting by eigenvalue size and taking the eigenvectors corresponding to the first four eigenvalues, we form a 10*4 matrix; this is the feature (projection) matrix we need. Multiplying the 100*10 sample matrix by this 10*4 feature matrix gives a new 100*4 sample matrix after dimensionality reduction, so the dimensionality of each sample's feature vector is reduced.
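A sketch of exactly this recipe in MATLAB (the data is random and only for illustration; note that the data is centered before projecting, as described in section 2.3 below):

X  = randn(100, 10);                        % 100 samples, 10-dimensional features
C  = cov(X);                                % 10*10 covariance matrix
[V, D] = eig(C);                            % 10 eigenvalues and eigenvectors
[evs, idx] = sort(diag(D), 'descend');      % sort eigenvalues from large to small
W  = V(:, idx(1:4));                        % 10*4 feature (projection) matrix
Xc = X - repmat(mean(X,1), 100, 1);         % center each feature
Y  = Xc * W;                                % 100*4 sample matrix after reduction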

2.1 Basic Analysis

When multiple observations are made on the same individuals, multiple random variables x1, x2, ..., xp are inevitably involved; these variables are correlated with one another, which makes them difficult to summarize. In this case we use principal component analysis to condense the main aspects of a large amount of information. We would like one or a few better composite indicators to summarize the information, and we hope these composite indicators are independent of each other, so that each represents a distinct aspect of the data.
Besides reliability and validity, any metric must also adequately reflect the variation between individuals. If an indicator takes similar values for all individuals, it cannot be used to distinguish between them. From this point of view, the greater the variation of an indicator across individuals, the better. Therefore we take "large variation" as the criterion of "good" when looking for composite indicators.

PCA (Principal Component Analysis) is not only used for dimensionality reduction of high-dimensional data; more importantly, after reducing the dimensionality to remove noise, it helps us find patterns in the data.

PCA replaces the original n features with a smaller number m of new features. The new features are linear combinations of the old ones; they maximize the sample variance and are constructed to be as uncorrelated with each other as possible. The mapping from the old features to the new features captures the inherent variability of the data.

2.2 Understanding of PCA

Simply put, PCA is a general-purpose dimensionality reduction tool. When dealing with high-dimensional data, we usually reduce its dimensionality during a "preprocessing" stage so as to lower the complexity of the subsequent computation, and PCA is the tool that does exactly this.

Essentially, PCA projects high-dimensional data into a lower-dimensional space through a linear transformation, but this projection is not arbitrary; it follows one guideline: find the projection that best represents the original data. How should "best represents the original data" be understood? We hope the data is not distorted after dimensionality reduction, that is, what PCA removes can only be noise or redundant data. The noise and redundancy here can be understood as follows:

Noise: we often speak of "noise pollution", meaning that "noise" interferes with the real sound we want to hear. Similarly, suppose a principal dimension A of the sample, the one that can best represent the original data, is "what we really want to hear". Its "energy" (that is, the variance of that dimension; why variance is called energy will be explained later) should originally be large, but because dimension A is correlated with other dimensions, that energy is weakened. Through PCA we want to reduce the correlation between dimension A and the other dimensions as much as possible, restoring the energy dimension A should have, so that we can "hear it more clearly".

Redundancy: redundancy means something superfluous; whether it is there or not makes no difference, it merely takes up space. Similarly, if a dimension of the sample varies little across all samples (in the extreme case, it takes the same value for every sample), which is to say its variance is close to 0, then it clearly contributes nothing to distinguishing different samples. Such a dimension is redundant; the data is the same without it, so PCA should remove these dimensions.

So the purpose of PCA is "de-noising" and "removing redundancy". The purpose of "de-noising" is to make the correlations between the remaining dimensions as small as possible, and the purpose of "removing redundancy" is to keep the dimensions whose "energy", that is variance, is as large as possible.

First, what data structure lets us represent both the correlations between different dimensions and the variance of each dimension? Naturally, the covariance matrix. The covariance matrix measures relationships between dimensions, not between samples. The elements on its main diagonal are the variances (that is, the energies) of the individual dimensions, and the off-diagonal elements are the covariances (that is, the correlations) between pairs of dimensions. The covariance matrix thus contains everything we need.

Consider "de-noising" first: we want the correlations between the different dimensions to be as small as possible, that is, we want the off-diagonal elements of the covariance matrix to be essentially zero. The way to achieve this hardly needs saying; linear algebra makes it clear: matrix diagonalization. The resulting diagonal matrix contains the eigenvalues of the covariance matrix, and these play two roles: first, they are the new variances of the individual dimensions, and second, they are the energy each dimension should own (the notion of energy goes together with the eigenvalues). This is why we called "variance" "energy" earlier. The second point may raise doubts, but note that after diagonalization the correlations between the remaining dimensions have been reduced to their weakest, the dimensions are no longer disturbed by "noise", and so their energy should be larger than before. Having finished "de-noising", we still have to "remove redundancy". In the diagonalized covariance matrix, the smaller new variances on the diagonal correspond exactly to the dimensions to be removed. So we keep only the dimensions whose energies (eigenvalues) are large and discard the rest. The essence of PCA really is diagonalizing the covariance matrix.

2.3 PCA Process


1. Center the features: subtract from the data of each dimension the mean of that dimension. Here a "dimension" means a feature (or attribute); after this step the mean of every dimension is 0. Subtracting each column's mean from that column gives the matrix B.
2. Compute the covariance matrix C of B.
3. Compute the eigenvalues and eigenvectors of the covariance matrix C.
4. Select the eigenvectors corresponding to the largest eigenvalues to obtain the new, reduced data set (a sketch of these four steps is given below).
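A minimal sketch of these four steps (the example data and the choice k = 1 are my own, purely illustrative):

X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0];     % samples in rows
k = 1;                                                  % number of components to keep
B = X - repmat(mean(X,1), size(X,1), 1);                % 1. center each feature
C = cov(B);                                             % 2. covariance matrix of B
[V, D] = eig(C);                                        % 3. eigenvalues and eigenvectors of C
[evs, idx] = sort(diag(D), 'descend');
W = V(:, idx(1:k));                                     % 4. eigenvectors of the k largest eigenvalues
Y = B * W;                                              % new data set after reduction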
III. PCA Implementation

Here I only give some personal understanding in plain language, without formal terminology. Contribution rate: the contribution of each dimension of the data to distinguishing the data set as a whole. The dimension with the largest contribution rate is obviously the first principal component, the next largest is the second principal component, and so on.

[coef, score, latent, t2] = princomp(X): latent holds the eigenvalues of the covariance matrix; score is the representation of the original matrix X in the principal component space (the principal component scores); coef is the matrix whose columns are the eigenvectors of the covariance matrix of X, i.e. the transformation (projection) matrix. Multiplying your data by it, X*coef(:,1:n), gives the new data you want, where n is the number of dimensions to keep (note that princomp centers X first, so strictly score = (X - mean(X)) * coef).

1. latent: a column vector consisting of the eigenvalues of the covariance matrix of X, not the eigenvalues of X itself, i.e. latent = sort(eig(cov(X)), 'descend');

2. score here refers to the coordinate values of the original data in the newly generated principal component space. zscore(X), by contrast, is the standardization function, i.e. the z-scores z = (X - mean(X)) ./ std(X); the two are not the same thing. (A sketch tying these together follows.)
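A sketch relating princomp's outputs to the manual computation (this assumes the Statistics Toolbox princomp; in newer MATLAB releases the equivalent function is pca, and the example data is illustrative):

X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0];     % illustrative data
[coef, score, latent] = princomp(X);
latent2 = sort(eig(cov(X)), 'descend');                 % same values as latent
Xc = X - repmat(mean(X,1), size(X,1), 1);               % princomp centers X internally
score2 = Xc * coef;                                     % same as score
n = 1;
Y = Xc * coef(:, 1:n);                                  % data reduced to n dimensions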

Reference documents:

http://blog.csdn.net/wangzhiqing3/article/details/12192663

http://www.cnblogs.com/cvlabs/archive/2010/05/08/1730319.html

http://www.cnblogs.com/zhangchaoyang/articles/2222048.html

http://blog.csdn.net/wangzhiqing3/article/details/12193131

http://blog.sciencenet.cn/blog-265205-544681.html

http://blog.sina.com.cn/s/blog_61c0518f0100f4mi.html

http://blog.csdn.net/s334wuchunfangi/article/details/8169928

http://blog.sina.com.cn/s/blog_6833a4df0100pwma.html

http://www.ilovematlab.cn/thread-54493-1-1.html
