Introduction to PCA (2)

Source: Internet
Author: User

Transferred from ice1020502

First of all, setting aside the existing formal treatments: if I restate PCA in my own words, the result may be a bit rough.

The main purpose of PCA is to reduce dimensionality. Three questions are involved: What is dimensionality reduction? What is the criterion for dimensionality reduction? How is dimensionality reduction achieved?

Next we will discuss these three questions in sequence.

(1) What is dimensionality reduction?

The concept of a dimension is not explained here. Dimensionality reduction can be understood simply as follows: the larger the data dimension, the more complex the computation, so high-dimensional data needs to be expressed in a lower dimension to meet various requirements. (As to why dimensionality reduction is required, the motivation differs between fields; in pattern recognition, the main reason is to avoid problems such as the "curse of dimensionality".) For example, in a two-dimensional space the data dimension is two: the x and y coordinate values in the coordinate system. For various reasons, we may need to reduce this two-dimensional data to one dimension, that is, to project each point in the two-dimensional coordinate system onto a single axis (a line). This raises the remaining two questions, the criterion and the method of dimensionality reduction: what kind of axis should the original data be projected onto, and how do we find that axis?
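As a concrete illustration (not from the original text), here is a minimal NumPy sketch of projecting two-dimensional points onto a one-dimensional axis; the points and the candidate axis are made up for the example:

```python
import numpy as np

# Hypothetical 2-D data: five points stored as columns of a 2 x 5 matrix.
points = np.array([[2.5, 0.5, 2.2, 1.9, 3.1],
                   [2.4, 0.7, 2.9, 2.2, 3.0]])

# A candidate projection axis: a unit vector along the 45-degree line.
axis = np.array([1.0, 1.0]) / np.sqrt(2)

# Projecting each point onto the axis yields a single coordinate per point,
# so the 2-D data is now expressed in one dimension.
projected = axis @ points
print(projected.shape)  # (5,)
```

Which axis to choose is exactly the criterion question discussed next.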

(2) What is the criterion for dimensionality reduction?

Using the example above, this question is equivalent to: what kind of axis meets our requirements? The short answer is the direction of largest variance, i.e., the direction of the eigenvector corresponding to the largest eigenvalue. When two-dimensional data is reduced to one dimension, the farther a data point is from the projection axis, the greater the noise; the goal of dimensionality reduction is to minimize the overall noise and find the direction of maximum signal-to-noise ratio, where the signal-to-noise ratio can be defined as a ratio of variances. For example, for an elliptical set of data points, the projection axis after dimensionality reduction is the long axis of the ellipse. (I feel I have spent a lot of effort illustrating this simple phenomenon, yet the problem still does not seem any simpler.)
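To make the variance criterion concrete, here is a small sketch with synthetic, made-up data: an elliptical point cloud stretched along the 45-degree line, comparing the variance of projections onto its long and short axes:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic elliptical cloud: large spread along the 45-degree "long axis",
# small spread along the perpendicular "short axis".
long_axis = np.array([1.0, 1.0]) / np.sqrt(2)
short_axis = np.array([1.0, -1.0]) / np.sqrt(2)
data = (np.outer(long_axis, 3.0 * rng.normal(size=1000))
        + np.outer(short_axis, 0.3 * rng.normal(size=1000)))

# The variance of the projected coordinates measures the "signal" retained
# along each direction; the long axis keeps far more of it.
var_long = np.var(long_axis @ data)
var_short = np.var(short_axis @ data)
print(var_long > var_short)  # True: the long axis is the PCA direction
```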

(3) How is dimensionality reduction achieved?

Assuming we have accepted that the directions of the largest eigenvalues are the main projection directions, the main task of dimensionality reduction is to find the eigenvectors corresponding to the eigenvalues sorted from largest to smallest; these eigenvectors form a transformation matrix that projects the original data into the reduced-dimensional space. Let us take this as given for now. As for finding the eigenvalues, in practice one can either compute them directly or use singular value decomposition (SVD); that is not discussed here.
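A minimal sketch of this eigenvector route, assuming the data is arranged with dimensions as rows and samples as columns; the function name `pca_eig` and the toy data are my own:

```python
import numpy as np

def pca_eig(X, k):
    """Sketch: project m x n data X (m dims, n samples) onto its
    top-k principal directions via the covariance eigendecomposition."""
    Xc = X - X.mean(axis=1, keepdims=True)      # center each dimension
    cov = Xc @ Xc.T / (X.shape[1] - 1)          # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort largest -> smallest
    W = eigvecs[:, order[:k]]                   # m x k transformation matrix
    return W.T @ Xc                             # k x n reduced data

# Toy example: two perfectly correlated dimensions collapse to one
# dimension with no information lost.
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [0.0, 2.0, 4.0, 6.0]])
Y = pca_eig(X, 1)
print(Y.shape)  # (1, 4)
```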

I realize this may not be very clear; treat it as my own memo.

Secondly, according to other references, the PCA procedure can be summarized as follows:

(1) Organize the data into an m × n matrix, where m is the dimension and n is the number of samples.

(2) Subtract the mean from each dimension.

(3) Compute the principal directions, either via SVD or as the eigenvectors of the covariance matrix.
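The three steps above can be sketched via the SVD route, again assuming the m × n layout (dimensions as rows, samples as columns); the function name `pca_svd` is made up for illustration:

```python
import numpy as np

def pca_svd(X, k):
    """Sketch of steps (1)-(3): X is an m x n matrix (m dims, n samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)             # step (2): subtract mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # step (3): SVD
    # Columns of U are the principal directions, already ordered by
    # singular value (largest first); keep the first k and project.
    return U[:, :k].T @ Xc

# The same kind of correlated toy data reduces cleanly to one dimension.
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [0.0, 2.0, 4.0, 6.0]])
Y = pca_svd(X, 1)
print(Y.shape)  # (1, 4)
```

The left singular vectors of the centered data are exactly the eigenvectors of its covariance matrix, which is why both routes give the same projection (up to sign).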

In fact, the steps are very simple; the main thing is to understand why they work. I will stop here and add details later.
