A Learning Summary of PCA Algorithms

Source: Internet
Author: User

Source: http://blog.csdn.net/xizhibei

==================================

PCA, that is, principal components analysis, is a very good algorithm. According to the book:

Find the projection that best represents the original data in the least-squares sense.

In short: it is mainly used for feature dimensionality reduction.

In addition, this algorithm has a classic application: face recognition. There, we simply concatenate the rows of each preprocessed face image into one long feature vector, and then use the PCA algorithm to reduce the dimension.
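The preprocessing amounts to flattening each 2-D image into a row vector and stacking those rows into a data matrix. A minimal sketch with a synthetic "image" array (the shapes are illustrative, not from any real face dataset):

```python
import numpy as np

# A batch of 10 synthetic grayscale "face images", each 8x8 pixels
# (a real eigenface pipeline would load actual photos here)
images = np.arange(10 * 8 * 8, dtype=float).reshape(10, 8, 8)

# Concatenate the rows of each image into one long 64-dimensional
# feature vector, giving a 10x64 data matrix with one face per row
data = images.reshape(10, -1)
print(data.shape)
```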


The main idea of PCA is to find the principal axes of the data, which form a new coordinate system whose dimension can be lower than the original one. The data is then projected from the original coordinate system onto the new one, and this projection is the dimensionality reduction.




Then let's talk about the algorithm steps.

1. Compute the mean m of all samples and the scatter matrix S; the scatter matrix is the covariance matrix up to a constant scale factor;

2. Compute the eigenvalues of S and sort them in descending order;

3. Take the eigenvectors corresponding to the first n' eigenvalues to form the transformation matrix E = [e1, e2, ..., en'];

4. Finally, each original feature vector x can be converted to an n'-dimensional new feature vector y:

y = transpose(E) (x - m)
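The four steps above can be sketched in a few lines of NumPy (a minimal illustration on random data; the variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # 200 samples, 5-dimensional features

# Step 1: mean and scatter (covariance) matrix
m = X.mean(axis=0)
centered = X - m
S = centered.T @ centered / X.shape[0]

# Step 2: eigendecomposition; eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(S)

# Step 3: transformation matrix E from the eigenvectors of the n' largest eigenvalues
n_prime = 2
E = eigvecs[:, -n_prime:]

# Step 4: project each centered sample: y = E^T (x - m)
Y = centered @ E
print(Y.shape)
```

Note that the variance of each projected column equals the corresponding eigenvalue, which is exactly the "best representation in the least-squares sense" mentioned above.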


Finally, I had to implement it myself to really remember it. I did it with Python and NumPy; doing it in C would mean hunting for libraries, which is too much trouble. Since I am not familiar with NumPy, there may be errors below; corrections are welcome.

import numpy as np

mat = np.load("data.npy")                 # each row: a class label followed by the feature vector
data = np.matrix(mat[:, 1:])
avg = np.average(data, 0)
means = data - avg                        # center the data
N = data.shape[0]                         # N is the number of samples
tmp = np.transpose(means) * means / N     # scatter (covariance) matrix
D, V = np.linalg.eigh(tmp)                # D, V: eigenvalues and eigenvectors; eigh sorts the eigenvalues in ascending order automatically, worship numpy OTL
# print V
# print D
E = np.transpose(V[:, -100:])             # keep the eigenvectors of the 100 largest eigenvalues; in practice, choose the count by explained variance
Y = np.matrix(E) * np.transpose(means)    # the feature vectors after dimensionality reduction
np.save("final", Y)


In addition, it is worth mentioning the PCA implementation in OpenCV (omnipotent OpenCV, OTL):

void cvCalcPCA(const CvArr* data,     // input data
               CvArr* avg,            // mean vector (output)
               CvArr* eigenvalues,    // eigenvalues (output)
               CvArr* eigenvectors,   // eigenvectors (output)
               int flags);            // how the vectors are laid out in data, e.g. CV_PCA_DATA_AS_ROW
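For comparison, here is what that call computes, mirrored in plain NumPy for the CV_PCA_DATA_AS_ROW layout (each row of data is one sample). This is my own sketch, not OpenCV itself; note that OpenCV returns the eigenvalues in descending order and the eigenvectors as rows:

```python
import numpy as np

data = np.random.default_rng(1).normal(size=(50, 4))  # rows are samples (CV_PCA_DATA_AS_ROW)

avg = data.mean(axis=0)                               # the "avg" output
S = (data - avg).T @ (data - avg) / data.shape[0]
vals, vecs = np.linalg.eigh(S)                        # ascending order, eigenvectors as columns

# Match OpenCV's conventions: descending eigenvalues, eigenvectors as rows
eigenvalues = vals[::-1]
eigenvectors = vecs[:, ::-1].T
print(eigenvalues)
```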


Finally, let's talk about the disadvantage of PCA: it treats all samples (the whole set of feature vectors) as one population and finds the optimal linear projection with minimum mean-square error, so the class labels are ignored, and the projection directions it discards may contain exactly the important discriminative information.
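A tiny demonstration of this weakness on synthetic data of my own construction: two classes that differ only along a low-variance axis become inseparable after PCA to one dimension, because PCA keeps the high-variance (but uninformative) axis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Large variance along x (carries no class information);
# the classes are separated only along y, which has small variance
class0 = np.column_stack([rng.normal(0, 10, 100), rng.normal(-1, 0.1, 100)])
class1 = np.column_stack([rng.normal(0, 10, 100), rng.normal(+1, 0.1, 100)])
X = np.vstack([class0, class1])

centered = X - X.mean(axis=0)
S = centered.T @ centered / X.shape[0]
eigvals, eigvecs = np.linalg.eigh(S)

top = eigvecs[:, -1]             # PCA keeps the highest-variance direction: the x axis
proj = centered @ top            # 1-D projection

# The class means almost coincide in the kept direction,
# while the discarded direction is exactly what separated them
print(abs(proj[:100].mean() - proj[100:].mean()))
```

Supervised methods such as LDA address exactly this by using the class labels when choosing the projection.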


Well, one last thing -- okay, no, really the last one.

Strongly recommended: a very thorough article on the physical meaning of eigenvectors: http://blog.sina.com.cn/s/blog_49a1f42e0100fvdu.html





