The necessity of dimensionality reduction
1. Multicollinearity: predictor variables are correlated with one another. Multicollinearity makes the solution space unstable, which can lead to incoherent results.
2. High-dimensional space is inherently sparse. In a one-dimensional standard normal distribution, about 68% of the probability mass falls within one standard deviation of the mean; in 10 dimensions, only about 0.02% falls within the same Euclidean distance of the mean (a quick numerical check appears after this list).
3. Too many variables hinder the discovery of patterns in the data.
4. Analyzing variables one at a time may overlook latent relationships among them.
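Item 2's figure is easy to sanity-check numerically. Here is a small Monte Carlo sketch in C++ (written for this note, not taken from any of the quoted posts): it estimates the probability that a standard normal sample lies within Euclidean distance one of the mean, in 1 dimension versus 10.

#include <cmath>
#include <cstdio>
#include <random>

// Estimate the fraction of standard-normal samples whose Euclidean norm
// is at most 1, i.e. the mass within "one standard deviation" of the mean.
double fractionWithinUnitBall(int dim, int trials, std::mt19937& rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    int hits = 0;
    for (int t = 0; t < trials; ++t) {
        double squaredNorm = 0.0;
        for (int d = 0; d < dim; ++d) {
            double x = gauss(rng);
            squaredNorm += x * x;
        }
        if (squaredNorm <= 1.0) ++hits;
    }
    return static_cast<double>(hits) / trials;
}

int main() {
    std::mt19937 rng(42);
    const int trials = 1000000;
    std::printf("1-D:  %.4f\n", fractionWithinUnitBall(1, trials, rng));   // about 0.68
    std::printf("10-D: %.6f\n", fractionWithinUnitBall(10, trials, rng));  // about 0.0002
}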
Second, the essence of PCA: diagonalization of the covariance matrix, i.e. the eigendecomposition of a symmetric matrix, whose eigenvectors u are orthogonal. When the data are too high-dimensional, dimensionality needs to be reduced. How? We must reduce the dimension while retaining as much of the information as possible; translated into mathematical terms, the variance of the data along each chosen direction should be as large as possible (variance is what measures the information kept), and the chosen directions should be uncorrelated with one another. A side benefit is that reducing dimensionality also helps curb overfitting.
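As a minimal illustration of this covariance-diagonalization view, here is a short sketch using OpenCV's documented cv::calcCovarMatrix and cv::eigen (the toy data values are placeholders, not from any of the quoted posts):

#include <opencv2/core.hpp>
#include <iostream>

int main() {
    // Rows are samples, columns are features (toy data for illustration).
    cv::Mat data = (cv::Mat_<double>(5, 2) <<
        2.5, 2.4,
        0.5, 0.7,
        2.2, 2.9,
        1.9, 2.2,
        3.1, 3.0);

    // Covariance matrix of the features, plus the per-feature mean.
    cv::Mat covar, mean;
    cv::calcCovarMatrix(data, covar, mean,
                        cv::COVAR_NORMAL | cv::COVAR_ROWS | cv::COVAR_SCALE);

    // Diagonalize: the eigenvectors of the symmetric covariance matrix are
    // the orthogonal principal directions; the eigenvalues are the variances
    // of the data along those directions.
    cv::Mat eigenvalues, eigenvectors;
    cv::eigen(covar, eigenvalues, eigenvectors);

    std::cout << "covariance:\n" << covar
              << "\neigenvalues:\n" << eigenvalues
              << "\neigenvectors (one per row):\n" << eigenvectors << std::endl;
}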
The idea of PCA: the n-dimensional features are mapped onto k dimensions (k < n), a set of brand-new orthogonal features called the principal components.
Its theoretical bases: maximum variance theory, minimum squared-error theory, and axis correlation theory.
PCA calculation process. Suppose we have 2-dimensional data like this: each row represents a sample and each column a feature; there are 10 samples, with two features per sample. The first step is to compute the averages of x and y separately and subtract each average from its column, centering the data.
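A worked version of these steps as a plain C++ sketch (the ten sample values below are illustrative stand-ins, since the original data table is not reproduced here):

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // 10 samples, 2 features; placeholder values in the spirit of the
    // classic 2-D PCA walkthrough.
    std::vector<double> x = {2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1};
    std::vector<double> y = {2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9};
    const int n = static_cast<int>(x.size());

    // Step 1: means of x and y, used to center the data.
    double mx = 0.0, my = 0.0;
    for (int i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;

    // Step 2: the 2x2 sample covariance matrix of the centered data.
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (int i = 0; i < n; ++i) {
        const double dx = x[i] - mx, dy = y[i] - my;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    sxx /= n - 1; syy /= n - 1; sxy /= n - 1;

    // Step 3: eigenvalues of a 2x2 symmetric matrix in closed form.
    const double tr = sxx + syy;
    const double det = sxx * syy - sxy * sxy;
    const double disc = std::sqrt(tr * tr / 4.0 - det);
    const double l1 = tr / 2.0 + disc, l2 = tr / 2.0 - disc;

    std::printf("mean = (%.3f, %.3f)\n", mx, my);
    std::printf("cov  = [%.3f %.3f; %.3f %.3f]\n", sxx, sxy, sxy, syy);
    std::printf("eigenvalues: %.4f, %.4f (PC1 keeps %.1f%% of the variance)\n",
                l1, l2, 100.0 * l1 / (l1 + l2));
}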
Reposted from ice1020502.
First of all, let me set aside all the existing formulations; if I restate the idea in my own words, it may come across as somewhat plainer.
The main purpose of PCA is to reduce dimensionality. Three questions are involved: What is dimensionality reduction? What is the criterion for dimensionality reduction? How is dimensionality reduction achieved?
Next we will discuss these three questions in sequence.
(1) What is dimensionality reduction?
Coeff holds the principal components (the projection directions). Score is the score matrix of the principal components, i.e., the representation of the original matrix X in the principal-component space: each row corresponds to a sample observation and each column to a principal component, so it has the same number of rows as X (equivalent to S in the program above). latent is the vector of eigenvalues of the covariance matrix of X (equivalent to E in the program). The relationship between them is Score = (X - mean) * Coeff.
This article introduces the principle of PCA in detail, drawing mainly on the PRML book. PCA, also called the Karhunen-Loève transform (KL transform) or the Hotelling transform, is an unsupervised learning method often used for dimensionality reduction of high-dimensional data: a linear transformation converts the original data into a set of linearly independent representations, which can be used to extract the main components of the data.
There is no doubt about the function that comes with MATLAB. princomp: principal component analysis (PCA).
[Coeff, Score, latent, tsquare] = princomp(X);
Input: X is the data, n*p, where n is the number of samples and p is the feature dimension.
Output: Coeff is the p*p projection matrix, the eigenvectors of the covariance matrix arranged as columns; Score is the data after projection; latent holds the eigenvalues (the variance of each component). Phenomenon: if the number of samples n is not larger than p, then Score(:, n:p) and latent(n:p) are zero. Why? Because n samples span at most n - 1 independent directions of variation after centering, so the remaining components carry no variance.
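For readers without MATLAB, here is a rough C++ analogue using OpenCV's cv::PCA; the mapping to princomp's outputs (Coeff as the transposed eigenvectors, Score as the projected data, latent as the eigenvalues) is my own gloss, not something from the quoted post:

#include <opencv2/core.hpp>
#include <iostream>

int main() {
    // X: n samples (rows) x p features (columns); toy values.
    cv::Mat X = (cv::Mat_<double>(4, 3) <<
        1.0, 2.0, 3.0,
        2.0, 1.0, 0.5,
        3.0, 4.0, 2.0,
        0.5, 1.5, 2.5);

    // Passing an empty Mat for the mean asks cv::PCA to compute it.
    cv::PCA pca(X, cv::Mat(), cv::PCA::DATA_AS_ROW);

    // princomp analogues: Coeff ~ pca.eigenvectors.t(),
    // latent ~ pca.eigenvalues, Score ~ pca.project(X) = (X - mean) * Coeff.
    cv::Mat score = pca.project(X);
    std::cout << "Coeff:\n" << pca.eigenvectors.t()
              << "\nlatent:\n" << pca.eigenvalues
              << "\nScore:\n" << score << std::endl;
}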
Someone asked me about this before. It is actually a very simple question: you can generate the statistics with a simple cascade_data script. But I later found that if you do not read the code carefully, it is very easy to run into problems. Here is a brief introduction to help newcomers.
First, the positive samples used to generate the PCA statistics (whatever the negative samples are) must be consistent with the positive samples used to train the model. That is to say, the sample paths and lists must match.
Regarding PCA and LDA as used in face recognition and other classification algorithms: there are many examples, but little code, especially C++ code. Therefore, I could only build the C++ version based on the MATLAB code. There are still some issues with the LDA algorithm; all of the core code will be provided within the next two weeks. In fact, PCA, LDA, and the like are just tools. With good tools, many more powerful functions can be built.
PCA. PCA, short for principal component analysis, is a common method of dimensionality reduction. PCA recombines the many original, mutually correlated indicators into a new set of uncorrelated composite indicators that replace the originals: the n-dimensional features are mapped onto k new orthogonal features. There are two common implementations of PCA: eigenvalue decomposition and SVD. Principle: in order to find the directions along which the data vary the most, we diagonalize the covariance matrix, as sketched below.
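A side-by-side sketch of the two routes in C++ with OpenCV (toy data; cv::eigen and cv::SVD::compute are the documented calls assumed here):

#include <opencv2/core.hpp>
#include <iostream>

int main() {
    cv::Mat X = (cv::Mat_<double>(5, 3) <<
        1.0, 2.0, 0.5,
        2.0, 1.5, 1.0,
        3.0, 3.5, 2.0,
        0.5, 1.0, 0.0,
        2.5, 2.0, 1.5);
    const int n = X.rows;

    // Center the data.
    cv::Mat mean;
    cv::reduce(X, mean, 0, cv::REDUCE_AVG);
    cv::Mat Xc = X - cv::repeat(mean, n, 1);

    // Route 1: eigendecomposition of the covariance matrix Xc' * Xc / (n - 1).
    cv::Mat cov = Xc.t() * Xc / (n - 1);
    cv::Mat evals, evecs;
    cv::eigen(cov, evals, evecs);   // rows of evecs are the principal axes

    // Route 2: SVD of the centered data. The right singular vectors (vt)
    // span the same axes, and squared singular values / (n - 1) equal the
    // eigenvalues from Route 1.
    cv::Mat w, u, vt;
    cv::SVD::compute(Xc, w, u, vt);
    cv::Mat evalsFromSvd;
    cv::pow(w, 2.0, evalsFromSvd);
    evalsFromSvd /= (n - 1);

    std::cout << "eigen route: " << evals.t() << "\n"
              << "SVD route:   " << evalsFromSvd.t() << std::endl;
}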
Over the past two days I looked into PCA dimensionality reduction and tested it with OpenCV, referring to [1] and [2]. The code is recorded here according to my understanding.
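The original code did not survive extraction; below is a minimal reconstruction sketch of what such an OpenCV PCA test can look like (random placeholder data stand in for the post's input, so its printed numbers will differ from the result quoted next):

#include <opencv2/core.hpp>
#include <iostream>

int main() {
    // Placeholder input: 10 samples x 7 features, filled with random values.
    cv::Mat data(10, 7, CV_32F);
    cv::randu(data, cv::Scalar(0), cv::Scalar(10));

    // PCA with samples stored as rows.
    cv::PCA pca(data, cv::Mat(), cv::PCA::DATA_AS_ROW);

    std::cout << "Eigenvalues:\n" << pca.eigenvalues << "\n"
              << "Eigenvectors:\n" << pca.eigenvectors << std::endl;
}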
Result:
Eigenvalues: [43.182041; 14.599923; 9.2121401; 4.0877957; 2.8236785; 0.88751495; 0.66496396]
Eigenvectors: [0.01278889, 0.03393811, -0.099844977, -0.13044992, 0.20732452, 0.96349025, -0.020049129; 0.15659945, 0.037932698, 0.12129638, 0.89324093, 0.39454412, 0.046447847, 0.06019...]
x.cov[1:4,1:4] = g1
x.cov[5:8,5:8] = g2
x.cov[9:10,9:10] = g3
x.cov[1:4,9:10] = g1g3
x.cov[9:10,1:4] = t(g1g3)
x.cov[5:8,9:10] = g2g3
x.cov[9:10,5:8] = t(g2g3)

b = spca(x.cov, 2, type='Gram', sparse='varnum', para=c(4,4), lambda=0)
b
The results of the population version, using the exact covariance matrix, are exactly as in the paper:
> b
Call: spca(x = x.cov, K = 2, para = c(4, 4), type = "Gram", sparse = "varnum", lambda = 0)

2 sparse PCs
Pct. of exp. var. : 40.9 39.5
Num. of non-zero loadings : 4 4
I found this article on the Internet; personally I find it very clear and worth learning from.
An explanation of the principles of the PCA algorithm
The PCA algorithm reduces the correlation between components; its disadvantage is that, because it is unsupervised, the reduced dimensions are not necessarily conducive to classifying the data.
First, the principles behind the algorithm: the meaning of an orthogonal basis, covariance, and the purpose of matrix diagonalization.
Using the program from the previous post and a small face image to test the PCA effect.
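The test code itself did not survive extraction either; the following sketch shows one plausible shape of such a test (file names, image count, and component count are placeholders), flattening a few face images into rows and reconstructing one of them from a handful of principal components:

#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Each training face image becomes one row of the data matrix.
    std::vector<cv::Mat> faces;
    for (int i = 0; i < 8; ++i) {
        cv::Mat img = cv::imread("face_" + std::to_string(i) + ".png",
                                 cv::IMREAD_GRAYSCALE);
        if (img.empty()) return 1;               // placeholder paths may not exist
        cv::Mat asFloat;
        img.convertTo(asFloat, CV_32F);
        faces.push_back(asFloat.reshape(1, 1));  // flatten to 1 x (w*h)
    }
    cv::Mat data;
    cv::vconcat(faces, data);

    // Keep a few principal components (the "eigenfaces").
    cv::PCA pca(data, cv::Mat(), cv::PCA::DATA_AS_ROW, 4);

    // Project one face into the subspace and reconstruct it; the error shows
    // how much of the face the kept components capture.
    cv::Mat coeffs = pca.project(data.row(0));
    cv::Mat approx = pca.backProject(coeffs);
    std::cout << "reconstruction error: "
              << cv::norm(data.row(0), approx) << std::endl;
}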
I found a post [1] that is suitable for beginners like me. I will study it again later, or translate it.
[1] http://www.cognotics.com/opencv/servo_2007_series/part_4/page_3.html
In face recognition, the dimension n of the data matrix is larger than the number of samples m (n > m).
Calculate the principal components of matrix A. By the principle of PCA, this means computing the eigenvalues and eigenvectors of the covariance matrix A'A of A. But A'A may be very large, so depending on the relative sizes one can compute the eigendecomposition of AA' instead of A'A: the nonzero eigenvalues of A'A and AA' are the same, and if v is an eigenvector of AA', then A'v is an eigenvector of A'A with the same eigenvalue.
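A sketch of this trick (the same device used in eigenfaces), assuming a mean-centered data matrix A with m rows and n columns, m much smaller than n; the data here are random placeholders:

#include <opencv2/core.hpp>
#include <iostream>

int main() {
    // A: m samples x n dimensions, m << n; random stand-in data, centered.
    const int m = 5, n = 1000;
    cv::Mat A(m, n, CV_64F);
    cv::randn(A, cv::Scalar(0.0), cv::Scalar(1.0));
    cv::Mat rowMean;
    cv::reduce(A, rowMean, 0, cv::REDUCE_AVG);
    A -= cv::repeat(rowMean, m, 1);

    // Instead of the huge n x n matrix A' * A, diagonalize the small
    // m x m matrix A * A'. Their nonzero eigenvalues coincide.
    cv::Mat gram = A * A.t();
    cv::Mat evals, V;
    cv::eigen(gram, evals, V);    // rows of V: eigenvectors of A * A'

    // Map back: if (A * A') v = lambda v, then (A' * A)(A' v) = lambda (A' v),
    // so u = A' v (normalized) is an eigenvector of the big covariance matrix.
    cv::Mat U = A.t() * V.t();    // columns are the unnormalized u vectors
    for (int j = 0; j < U.cols; ++j) {
        cv::normalize(U.col(j), U.col(j));
    }

    std::cout << "nonzero eigenvalues: " << evals.t() << std::endl;
}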
PCA dimensionality reduction by the maximum variance method (discussion welcome). On the basis of the previous article, we continue the discussion. First, the center point of the original space is obtained: the mean of all samples. Assuming u1 is a unit projection vector, the variance after projection is u1' S u1, where S is the sample covariance matrix. We want this variance to be as large as possible (i.e., the points after projection are spread out rather than bunched together, which achieves a good dimensionality-reduction effect). Maximizing it under the constraint u1' u1 = 1 with the Lagrange multiplier method shows that u1 must be the eigenvector of S with the largest eigenvalue, as written out below.
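The same derivation, restated compactly in LaTeX from the paragraph above:

\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
S = \frac{1}{m}\sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^{\top}

\max_{u_1} \; u_1^{\top} S u_1 \quad \text{subject to} \quad u_1^{\top} u_1 = 1

L(u_1, \lambda_1) = u_1^{\top} S u_1 + \lambda_1 \left(1 - u_1^{\top} u_1\right), \qquad
\frac{\partial L}{\partial u_1} = 2 S u_1 - 2 \lambda_1 u_1 = 0
\;\Rightarrow\; S u_1 = \lambda_1 u_1

Hence the projected variance equals u_1^{\top} S u_1 = \lambda_1, so it is maximized when u_1 is the eigenvector of S with the largest eigenvalue.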
# Standard deviation: its square equals the variance, i.e., the eigenvalue
# Proportion of Variance: the variance contribution rate of each component
# Cumulative Proportion: the cumulative variance contribution rate
# The results show that the cumulative contribution rate of the first two principal components has reached 96%, so the other two can be dropped to achieve dimensionality reduction.
So we obtain the expressions
z1 = -0.497 x'1 - 0.515 x'2 - 0.481 x'3 - 0.507 x'4
z2 = 0.543 x'1 - 0.210 x'2 - 0.725 x'3 - 0.368 x'4
# 4. Plot the ...
Definition
The idea of PCA is to map n-dimensional features onto k dimensions (k < n), a set of brand-new orthogonal features.
Background
In machine learning, the first step is data processing. In most machine learning courses, to simplify understanding, the first few lessons select only one or two features. This raises a problem: what should be done when there are many more features? In the analysis of regression problems, the gradient descent method is introduced.