[Comprehensive] PCA dimensionality reduction

Source: Internet
Author: User

http://blog.json.tw/using-matlab-implementing-pca-dimension-reduction

Given m records of data, each of which is n-dimensional, the data can be regarded as an m×n matrix. If the data is too large, it can be unwieldy to analyze, for example when these m records are used for machine learning.

The idea of PCA is to compute the covariance matrix of this m×n matrix, which is n×n, and then compute that matrix's n eigenvalues and their corresponding eigenvectors. The eigenvectors are arranged by eigenvalue from large to small, and the first k of them (k < n) are taken to form an n×k matrix. Multiplying the original data matrix by this matrix gives a matrix of size (m×n)(n×k) = m×k, which achieves the dimensionality reduction: the high-dimensional space is mapped down to a lower-dimensional one.
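
As a minimal MATLAB sketch of this procedure (the variable names and the random example data are illustrative, not from the source):

X = randn(100, 5);                              % example m-by-n data matrix
Xc = X - mean(X, 1);                            % center each feature (column)
C = cov(Xc);                                    % n-by-n covariance matrix
[V, D] = eig(C);                                % eigenvectors (columns of V), eigenvalues (diagonal of D)
[eigenvalue, idx] = sort(diag(D), 'descend');   % sort eigenvalues from large to small
V = V(:, idx);                                  % reorder the eigenvectors to match
k = 2;                                          % target dimension, k < n
W = V(:, 1:k);                                  % n-by-k projection matrix
Y = Xc * W;                                     % (m-by-n)(n-by-k) = m-by-k reduced data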

How should k be chosen, that is, how many dimensions should the data be reduced to? The lower the dimension, the less of the original space can be expressed; usually k is chosen so that at least 95% of the original space is preserved. The degree of preservation is computed as follows: when reducing to k dimensions, the fraction of the original space that is expressed is:

(sum of the first k eigenvalues) / (sum of all eigenvalues)

Suppose the eigenvalues are

517.7969, 67.4964, 12.4054, 0.2372

Then the fraction of the original space that can be expressed for k = 1, 2, 3 is

0.86597, 0.97886, 0.99961

(and for k = 4 it is of course 1).
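
These ratios follow directly from the eigenvalues; a small MATLAB check, assuming the four values listed above:

eigenvalue = [517.7969; 67.4964; 12.4054; 0.2372];
explained = cumsum(eigenvalue) / sum(eigenvalue);   % fraction of the original space for k = 1..4
% explained is approximately [0.8660; 0.9789; 0.9996; 1.0000]
k = find(explained >= 0.95, 1);                     % smallest k preserving at least 95%, here k = 2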

Given an m×n matrix named matrix, PCA dimensionality reduction in MATLAB looks like this:
[eigenVector,score,eigenvalue,tsquare] = princomp(matrix);

eigenVector and eigenvalue returned by princomp are already sorted in descending order of eigenvalue. To reduce to k dimensions, the transformation matrix is

transMatrix = eigenVector(:,1:k);

and the new, reduced matrix is matrix * transMatrix.
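
Putting the MATLAB example together as one runnable sketch (note that in recent MATLAB releases princomp has been superseded by pca, which likewise returns the coefficients sorted by descending eigenvalue; the random data and the choice k = 3 here are only illustrative):

matrix = randn(200, 6);                           % illustrative m-by-n data
[eigenVector, score, eigenvalue] = pca(matrix);   % principal directions, sorted by descending eigenvalue
k = 3;                                            % target dimension
transMatrix = eigenVector(:, 1:k);                % n-by-k transformation matrix
newMatrix = matrix * transMatrix;                 % m-by-k reduced matrix, as in the text
% score(:, 1:k) is the same projection computed from the mean-centered data,
% so it differs from newMatrix only by a constant row offset.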

http://blog.sina.com.cn/s/blog_616d4c030102vcz6.html

The principle of PCA is to project the original sample data into a new space. This is equivalent to the change of basis we learn in matrix analysis: mapping a set of vectors from one coordinate system to another. After this coordinate transformation, the new coordinate system no longer needs as many variables as the original; it is enough to keep the coordinates along the directions corresponding to the largest eigenvalues.

For example, suppose the original sample matrix is 30*1000000, i.e., we have 30 samples and each sample has 1,000,000 feature points. That is far too many features, so we need to reduce their dimensionality. First compute the covariance matrix of the original sample matrix, which here is 1000000*1000000. Of course this matrix is far too large, and in practice there are other ways to handle the computation; here we only describe the basic principle. From this 1000000*1000000 covariance matrix we compute the eigenvalues and eigenvectors, and keep the eigenvectors with the largest eigenvalues as the transformation matrix. For example, if the first 29 eigenvalues already account for more than 99% of the sum of all eigenvalues, then we only need the eigenvectors corresponding to those 29 eigenvalues. They form a 1000000*29 transformation matrix; multiplying the original samples by this matrix gives the coordinates of the original samples in the new feature space: 30*1000000 * 1000000*29 = 30*29, so the number of features per sample drops from 1,000,000 to 29.
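
The paragraph above only hints that "there are other ways" to handle such a huge covariance matrix. One standard alternative (an assumption here, not spelled out in the source) is to skip the covariance matrix entirely and take an economy-size SVD of the centered data matrix: its right singular vectors are the needed eigenvectors, and the squared singular values divided by (m - 1) are the eigenvalues. A scaled-down MATLAB sketch:

X = randn(30, 100000);                         % 30 samples with very many features (scaled down from 1,000,000)
Xc = X - mean(X, 1);                           % center each feature
[U, S, V] = svd(Xc, 'econ');                   % economy SVD: V is features-by-30, no giant covariance matrix
eigenvalue = diag(S).^2 / (size(X, 1) - 1);    % the eigenvalues the covariance matrix would give
k = 29;
Y = Xc * V(:, 1:k);                            % 30-by-29 reduced representation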

The following explanation of PCA dimensionality reduction from Baidu Encyclopedia is also quite clear:

Take a training set of 100 object templates, each characterized by 10 features, so the samples form a 100*10 matrix. Computing the covariance matrix of this sample gives a 10*10 covariance matrix, from which we can find 10 eigenvalues and 10 eigenvectors. Sorting by eigenvalue size, we take the eigenvectors corresponding to the first four eigenvalues and form a 10*4 matrix. This matrix is the feature (transformation) matrix we need; multiplying the 100*10 sample matrix by this 10*4 feature matrix gives a new 100*4 sample matrix after dimensionality reduction, so the dimensionality of each sample's features is reduced.
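
As a quick check of the shapes in this example (a sketch with random data standing in for the real object templates):

samples = randn(100, 10);              % 100 templates, 10 features each
[coeff, ~, latent] = pca(samples);     % 10 eigenvectors (columns of coeff), 10 eigenvalues (latent)
featureMatrix = coeff(:, 1:4);         % 10-by-4 matrix built from the 4 largest eigenvalues
reduced = samples * featureMatrix;     % 100-by-4 sample matrix after the reduction
size(reduced)                          % prints: 100  4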
