Principle Analysis of the Principal Component Analysis (PCA) Algorithm
A discussion of how to understand the principal component analysis (PCA) algorithm.
Principal Component Analysis (PCA): dimensionality reduction.
Multiple variables are combined through a linear transformation (linear combination) to obtain a smaller number of important variables.
This follows the principle of minimizing the loss of information.
Principal component: the set of linear coefficients, i.e., the projection direction.
Usually there is some correlation between the variables, i.e., their information partly overlaps; PCA removes this redundancy.
Basic idea: move the origin of the coordinate axes to the center of the data, then rotate the axes so that the variance of the data along the C1 axis is maximized, i.e., the projections of all n data points are most spread out in that direction, which means the most information is preserved. C1 is the first principal component.
Second principal component C2: find a direction C2 whose covariance (correlation coefficient) with C1 is 0, so that its information does not overlap with C1's, and along which the variance of the data is as large as possible.
Continue in the same way to find the third principal component, the fourth principal component, and so on, up to the p-th. p random variables have p principal components.
The eigenvalues and eigenvectors are obtained from the covariance matrix.
Eigenvectors (e.g., the "eigenfaces" used in face recognition).
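As a minimal sketch of this idea (assuming NumPy and a small synthetic 2-D dataset; none of the variable names below come from the original text), the projection directions can be obtained as eigenvectors of the sample covariance matrix:

```python
import numpy as np

# Hypothetical synthetic 2-D data with correlated variables (illustrative only).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.5], [1.5, 1.0]],
                            size=200)

# Step 1: move the origin to the center of the data.
X_centered = X - X.mean(axis=0)

# Step 2: eigendecomposition of the sample covariance matrix.
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order

# Step 3: sort the directions by decreasing variance (eigenvalue).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

c1, c2 = eigvecs[:, 0], eigvecs[:, 1]           # first and second projection directions
print("variance along C1:", eigvals[0])
print("variance along C2:", eigvals[1])
print("C1 . C2 (should be ~0, orthogonal):", float(np.dot(c1, c2)))
```

The eigenvector with the largest eigenvalue plays the role of C1 above, and the next one the role of C2.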
- PCA is a linear transformation. It maps the data to a new coordinate system such that the largest variance of any projection of the data lies along the first coordinate (called the first principal component), the second largest variance along the second coordinate (the second principal component), and so on. Principal component analysis is often used to reduce the dimensionality of a dataset while preserving the features that contribute most to its variance.
- Principal component analysis (PCA) is a statistical dimensionality reduction method that transforms an original random vector with correlated components into a new random vector with uncorrelated components. Algebraically, this amounts to transforming the covariance matrix of the original random vector into a diagonal matrix by an orthogonal transformation; geometrically, it amounts to rotating the original coordinate system into a new orthogonal coordinate system whose axes point in the p orthogonal directions along which the sample points are most spread out. The multidimensional variable system can then be reduced in dimension so that it is converted to a lower-dimensional variable system with high accuracy, and by constructing a suitable value function, the low-dimensional system can be further transformed into a one-dimensional system.
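The claim that the orthogonal transformation makes the new components uncorrelated (i.e., diagonalizes the covariance matrix) can be checked numerically. The sketch below uses synthetic data and NumPy and is purely illustrative:

```python
import numpy as np

# Hypothetical data: 3 correlated variables built from a random mixing matrix.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(500, 3)) @ A.T

Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
_, V = np.linalg.eigh(cov)      # columns of V form an orthogonal matrix of eigenvectors

# Orthogonal transformation: express the centered data in the eigenvector basis.
Y = Xc @ V

# The covariance matrix of the transformed data is (numerically) diagonal,
# i.e. the new components are uncorrelated.
print(np.round(np.cov(Y, rowvar=False), 6))
```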
Definition of principal component analysis:
The nature of principal component analysis:
Selection of the number of principal components
With p random variables there are p principal components. Because the total variance does not increase, the variances of the leading composite variables C1, C2, ... are larger, while the variances of Cp, Cp-1, and the other trailing composite variables are small. Strictly speaking, only the first few composite variables are called principal (major) components; the last few are "minor" components. In practice the first few are always kept and the last few are ignored. The number of principal components retained depends on the percentage of the total variance accounted for by the sum of their variances (the cumulative contribution rate), which indicates how much of the information the leading principal components summarize. In practice, a rough percentage threshold can be set to decide how many principal components to keep; if retaining one more principal component increases the cumulative variance only slightly, it is not kept.
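One common way to apply this rule in code is to compute each component's variance contribution rate and keep the smallest number of components whose cumulative rate passes a chosen threshold. The 85% cutoff below is an arbitrary illustrative value, not one prescribed by the text, and the data are synthetic:

```python
import numpy as np

def choose_num_components(X, threshold=0.85):
    """Smallest k whose cumulative variance contribution rate reaches `threshold`."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]   # descending
    contribution = eigvals / eigvals.sum()        # variance contribution rate per component
    cumulative = np.cumsum(contribution)          # cumulative contribution rate
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return k, cumulative

# Hypothetical data: 5 observed variables driven by 3 underlying factors plus noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 5)) + 0.1 * rng.normal(size=(300, 5))
k, cumulative = choose_num_components(X)
print("keep", k, "principal components; cumulative rates:", np.round(cumulative, 3))
```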
Main roles of principal component analysis
- Dimensionality reduction.
- Select the first two principal components (or some pair of principal components) and, using the principal component scores, plot the distribution of the n samples in the two-dimensional plane. The plot shows where each sample lies with respect to the principal components, so the samples can be classified, and outliers far from the majority of the sample points can be spotted directly from the graph (see the sketch after this list).
- Use principal component analysis to screen variables: with little computation, a subset of variables can be selected that comes close to the effect of choosing the optimal variable subset.
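The sketch below illustrates the visualization use described above, assuming synthetic data, NumPy, and Matplotlib; the dataset and all names are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical dataset: 100 samples described by 6 correlated variables.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 6)) + 0.2 * rng.normal(size=(100, 6))

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
scores = Xc @ eigvecs[:, order[:2]]     # scores on the first two principal components

# Plot the n samples in the plane of the first two principal components;
# points far from the bulk of the data stand out as candidate outliers.
plt.scatter(scores[:, 0], scores[:, 1], s=15)
plt.xlabel("First principal component score")
plt.ylabel("Second principal component score")
plt.title("Samples in the plane of the first two principal components")
plt.show()
```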
Calculation steps for principal component analysis
The final composite value is the weighted sum of the m retained principal components, where the weights are the variance contribution rates of the respective principal components.
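Continuing the same NumPy sketch (the function name, data, and choice of m below are illustrative assumptions, not part of the original text), the composite score can be computed as follows:

```python
import numpy as np

def composite_score(X, m):
    """Weighted sum of the first m principal component scores,
    with the variance contribution rates as weights."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = Xc @ eigvecs[:, :m]                 # scores on the first m principal components
    weights = eigvals[:m] / eigvals.sum()        # variance contribution rate of each component
    return scores @ weights                      # one composite value per sample

# Hypothetical example: score 50 samples described by 4 variables using 2 components.
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
print(composite_score(X, m=2)[:5])
```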