The PCA method in dimensionality reduction algorithms


1. Principal Component Analysis (PCA)
2. Linear Discriminant Analysis (LDA)

Research background
Introduction to basic concepts
Introduction to the classic methods
Summary discussion

Posing the problem

Geographic systems are complex systems with many elements, so multivariate problems are frequently encountered in geographic research. A large number of variables inevitably increases the difficulty and complexity of the analysis, and in many practical problems the variables are correlated with one another.

It is therefore natural to ask: building on this correlation, can we replace the many original variables with a smaller number of new variables, chosen so that the new variables retain as much as possible of the information carried by the original ones?

Motivations for dimensionality reduction
Samples in the original observation space carry a great deal of redundant information.
The high dimensionality of the samples causes the "curse of dimensionality" in classifier design.
Tasks such as data visualization, feature extraction, classification, and clustering demand it.


For example, in one study, principal component analysis replaced the original 17 variables with three new variables while retaining 97.4% of the information in the original data.
Linear dimensionality reduction
Reduces dimensionality through linear combinations of the original features.
Is essentially a projection of the data onto a low-dimensional subspace.
Linear methods are relatively simple and computationally cheap.

Representative methods
Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA)
Multidimensional Scaling (MDS)
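All three methods share the same algebraic form. As a minimal sketch (the symbols below are my notation, not the original slides'), a linear method maps each sample x to

```latex
y = W^{\top} x, \qquad x \in \mathbb{R}^{p},\; W \in \mathbb{R}^{p \times k},\; y \in \mathbb{R}^{k},\; k < p,
```

and the methods differ only in the criterion used to choose the projection matrix W.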
Principal Component Analysis (PCA) [Jolliffe, 1986]
Objective: find the projection subspace that best preserves the variance of the sample data.
Solution: perform an eigenvalue decomposition of the sample scatter matrix; the desired subspace is spanned by the eigenvectors with the largest eigenvalues.
PCA works very well on ellipsoidally distributed sample sets: the principal directions it learns are the axes of the ellipsoid (see the sketch below).
PCA is an unsupervised algorithm. It finds directions that represent all the samples well, but those directions are not necessarily the most favorable for classification.
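A minimal numerical sketch of this behavior, assuming NumPy (the data and variable names are mine, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(0)

# An "ellipsoidal" cloud: a 2-D Gaussian with axis lengths 5 and 1,
# rotated by 30 degrees.
angle = np.deg2rad(30.0)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
X = rng.normal(size=(1000, 2)) * np.array([5.0, 1.0])
X = X @ R.T

# Eigen-decompose the sample covariance (scatter) matrix.
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(C)        # ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The leading eigenvector recovers the long axis of the ellipse
# (up to sign).
print("learned principal axis:", eigvecs[:, 0])
print("true long axis:        ", R[:, 0])
print("variance along each axis:", eigvals)
```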

Linear Discriminant Analysis (LDA) [Fukunaga, 1991]
Objective: find the projection direction that separates the two classes, by maximizing the ratio of the squared difference between the projected class means to the total within-class scatter of the projected samples.
Solution: the original problem can be shown to reduce to a generalized eigenvalue problem involving the total within-class scatter matrix and the between-class scatter matrix (a sketch follows).
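For the two-class case this generalized eigenproblem has a closed-form solution. A minimal sketch, assuming NumPy and toy Gaussian data of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal(loc=[0.0, 0.0], size=(200, 2))  # class 0 samples
X1 = rng.normal(loc=[3.0, 1.0], size=(200, 2))  # class 1 samples

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

# Total within-class scatter matrix Sw = S0 + S1.
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

# For two classes, the generalized eigenproblem Sb w = lambda * Sw w
# is solved by w proportional to Sw^{-1} (m1 - m0).
w = np.linalg.solve(Sw, m1 - m0)
w /= np.linalg.norm(w)

# Along w the projected class means are maximally separated
# relative to the within-class spread.
print("projection direction:", w)
print("projected mean gap:", (m1 - m0) @ w)
```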

Comparison of linear dimensionality reduction methods: Principal Component Analysis (PCA) [Jolliffe, 1986] versus Linear Discriminant Analysis (LDA) [Fukunaga, 1991].

Deficiencies of linear dimensionality reduction methods

The original data may not be expressible as a simple linear combination of features.
For example, PCA cannot represent a helical (spiral) manifold; see the sketch below.
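A small demonstration of this limitation, assuming NumPy (the helix construction is mine): a helix is intrinsically one-dimensional, yet no one-dimensional linear subspace represents it well, so PCA's reconstruction error remains large.

```python
import numpy as np

# Sample a 3-D helix: a one-dimensional curve winding through space.
t = np.linspace(0, 6 * np.pi, 500)
X = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])

# PCA with a single component (the best 1-D linear subspace).
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / len(X)
eigvals, eigvecs = np.linalg.eigh(C)
w = eigvecs[:, -1]                     # leading principal direction

# Project onto w and reconstruct; the helix collapses onto a line.
X_rec = np.outer(Xc @ w, w) + X.mean(axis=0)
err = np.linalg.norm(X - X_rec) / np.linalg.norm(Xc)
print(f"relative reconstruction error with 1 component: {err:.2f}")
```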

I. Basic principles of principal component analysis

Suppose there are n geographic samples and each sample has p variables; together they form a geographic data matrix of order n×p.

When p is large, studying the problem directly in p-dimensional space is cumbersome. To overcome this difficulty, dimensionality reduction is applied: the many original variable indicators are replaced with a few comprehensive indicators that reflect, as far as possible, the information carried by the original variables while being mutually uncorrelated.
Definition: let x1, x2, ..., xp denote the original variable indicators and z1, z2, ..., zm (m ≤ p) the new variable indicators, each of which is a linear combination of the original ones: zi = li1 x1 + li2 x2 + ... + lip xp (i = 1, 2, ..., m).

Principles for determining the coefficients lij:
① zi and zj (i ≠ j; i, j = 1, 2, ..., m) are mutually uncorrelated;
② z1 has the largest variance among all linear combinations of x1, x2, ..., xp; z2 has the largest variance among all linear combinations of x1, x2, ..., xp that are uncorrelated with z1;
......
zm has the largest variance among all linear combinations of x1, x2, ..., xp that are uncorrelated with z1, z2, ..., zm-1.

The new variable indicators z1, z2, ..., zm defined this way are called the first, second, ..., m-th principal components of the original variable indicators x1, x2, ..., xp.
From the above analysis it can be seen that the essence of principal component analysis is to determine the loading lij (i = 1, 2, ..., m; j = 1, 2, ..., p) of each original variable xj on each principal component zi.
Mathematically, the loading vectors are exactly the eigenvectors corresponding to the m largest eigenvalues of the correlation matrix, as the following one-step sketch indicates.
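Sketch of that step (standard reasoning, with R denoting the correlation matrix of x1, ..., xp and l1 = (l11, ..., l1p)ᵀ the first loading vector; the Lagrange-multiplier detail is omitted):

```latex
\max_{l_1}\ \operatorname{Var}(z_1) = l_1^{\top} R\, l_1
\quad \text{s.t.} \quad l_1^{\top} l_1 = 1
\;\Longrightarrow\; R\, l_1 = \lambda\, l_1 ,
```

so the optimal l1 is the eigenvector of R with the largest eigenvalue, and Var(z1) = λ. Imposing the uncorrelatedness constraints and repeating the argument yields l2, ..., lm in turn.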

1) Construct the variable matrix X of order p×n: one row per variable (attribute field), one column per sample.

2) Standardize (at minimum, zero-center) each row of X.

3) Compute the covariance matrix C = (1/n) X Xᵀ.

4) Compute the eigenvalues of C and the corresponding eigenvectors.

5) Arrange the eigenvectors as rows of a matrix, ordered from top to bottom by decreasing eigenvalue, and take the first k rows to form the matrix P.

6) Y = PX is the data after reduction to k dimensions (see the sketch below).
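A direct NumPy sketch of these six steps (function and variable names are mine):

```python
import numpy as np

def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Reduce a p x n matrix (rows = variables, columns = samples)
    to k x n via eigendecomposition of the covariance matrix."""
    p, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)      # step 2: center each row
    C = Xc @ Xc.T / n                           # step 3: covariance, p x p
    eigvals, eigvecs = np.linalg.eigh(C)        # step 4: ascending order
    order = np.argsort(eigvals)[::-1]           # step 5: sort descending,
    P = eigvecs[:, order].T[:k]                 #   keep first k rows
    return P @ Xc                               # step 6: Y = PX

# Toy usage: 5 variables, 100 samples, reduced to 2 dimensions.
X = np.random.default_rng(2).normal(size=(5, 100))
Y = pca_reduce(X, k=2)
print(Y.shape)  # (2, 100)
```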
