A short study summary of the PCA algorithm

Source: Internet
Author: User

Source of this article: http://blog.csdn.net/xizhibei

=============================

PCA, also known as principal components analysis, is a very useful algorithm. According to the book, it means:

Finding the projection that best represents the original data in the least-mean-square sense.

My own way of putting it: it is mainly used for dimensionality reduction of features.

The algorithm also has a classic application: face recognition. Briefly, the rows of each preprocessed face image are joined end to end into a single feature vector, and the PCA algorithm is then used to reduce its dimension.
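As a rough illustration of that flattening step, the sketch below turns a stack of grayscale images into one feature vector per image. The image size and the random pixel values are made-up stand-ins for real face pictures, not data from the original post.

```python
import numpy as np

# Hypothetical stand-in for a face dataset: 10 grayscale images of 8 x 6 pixels.
rng = np.random.default_rng(0)
images = rng.random((10, 8, 6))

# Join the rows of each image end to end, giving one feature vector per image.
features = images.reshape(len(images), -1)

print(features.shape)  # (10, 48)
```

Each row of `features` could then be fed to PCA like any other feature vector.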


The main idea of PCA is to find the principal axes of the data and use them to form a new coordinate system, whose dimension can be lower than the original one. The data is then projected from the original coordinate system onto the new one, and this projection is the dimensionality reduction step.
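A minimal sketch of this idea on made-up 2-D data, assuming NumPy: the points lie roughly along the line y = x, so the principal axis should come out close to that diagonal, and projecting onto it reduces each point to a single number. The data and variable names are illustrative only.

```python
import numpy as np

# Made-up 2-D points that lie roughly along the line y = x.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
points = np.column_stack([t, t + 0.1 * rng.normal(size=200)])

# Center the data and find the axes of its covariance matrix.
centered = points - points.mean(axis=0)
cov = centered.T @ centered / len(points)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order

# The principal axis is the eigenvector with the largest eigenvalue.
principal_axis = eigenvectors[:, -1]

# Projecting onto that single axis reduces each 2-D point to one number.
projected = centered @ principal_axis

print(principal_axis)   # roughly along the diagonal; the sign is arbitrary
print(projected.shape)  # (200,)
```

The variance of `projected` equals the largest eigenvalue, which is why keeping the axes with the largest eigenvalues loses the least information in the mean-square sense.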


I won't go through the derivation here. I recommend this tutorial, which explains it in detail: http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf


The steps of the algorithm are:

1. Calculate the mean m and the scatter matrix S of all samples; the scatter matrix is the covariance matrix up to a constant factor;

2. Calculate the eigenvalues of S and sort them from largest to smallest;

3. Select the eigenvectors corresponding to the first n' eigenvalues to form a transformation matrix E = [e1, e2, ..., en'];

4. Finally, each original n-dimensional feature vector x can be converted to a new n'-dimensional feature vector y:

y = transpose(E) (x - m)
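Written out as formulas, the four steps look roughly as follows. The symbols N (the number of samples), x_i (the i-th sample) and lambda_k (the k-th eigenvalue) are notation introduced here for illustration; the rest follows the steps above.

```latex
\begin{aligned}
m &= \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad
S = \sum_{i=1}^{N} (x_i - m)(x_i - m)^{\mathsf{T}} \\
S e_k &= \lambda_k e_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \\
E &= [e_1, e_2, \dots, e_{n'}], \qquad y = E^{\mathsf{T}} (x - m)
\end{aligned}
```

Dividing S by N gives the covariance matrix; this rescales the eigenvalues but leaves the eigenvectors, and therefore E, unchanged.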


Finally, you have to do it by hand for it to really stick. I did it in Python with NumPy; doing it in C would be far too much trouble. I am not very familiar with NumPy, so the following may contain mistakes, and corrections are welcome.

    import numpy as np

    mat = np.load("data.npy")   # each row: a class label followed by a feature vector
    data = mat[:, 1:].astype(float)
    avg = data.mean(axis=0)
    centered = data - avg
    N = data.shape[0]           # N is the number of samples
    cov = centered.T @ centered / N
    # D holds the eigenvalues; the columns of V are the matching eigenvectors.
    # Note: np.linalg.eig does not guarantee any ordering. eigh (for symmetric
    # matrices) returns ascending eigenvalues, so reverse to get largest first.
    D, V = np.linalg.eigh(cov)
    D, V = D[::-1], V[:, ::-1]
    # print(V)
    # print(D)
    E = V[:, :100]              # simply keep the first 100 components; in practice one might keep the first 80% or so
    y = E.T @ centered.T        # feature vectors after dimensionality reduction, one per column
    np.save("final", y)
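The remark about keeping "the first 80%" instead of a fixed 100 components can be read in more than one way. One common reading, assumed here, is to keep the smallest number of leading components whose eigenvalues add up to at least 80% of the total variance. The helper name `components_for_variance` is hypothetical and not part of NumPy.

```python
import numpy as np

def components_for_variance(eigenvalues, fraction=0.8):
    """Smallest number of leading components whose eigenvalues sum to at
    least `fraction` of the total (hypothetical helper, for illustration)."""
    sorted_values = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(sorted_values) / sorted_values.sum()
    # First position where the cumulative share reaches the threshold.
    count = np.searchsorted(cumulative, fraction) + 1
    # Guard against rounding pushing the count past the number of components.
    return int(min(count, len(sorted_values)))

print(components_for_variance([4.0, 3.0, 2.0, 1.0], fraction=0.8))  # 3
```

With eigenvalues 4, 3, 2, 1 the cumulative shares are 0.4, 0.7, 0.9 and 1.0, so three components are needed to pass 80%.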


It is also worth mentioning that OpenCV (the almighty OpenCV, OTL) has an implementation of PCA:

    void cvCalcPCA(const CvArr* data,     // input data
                   CvArr* avg,            // mean (output)
                   CvArr* eigenvalues,    // eigenvalues (output)
                   CvArr* eigenvectors,   // eigenvectors (output)
                   int flags);            // how the feature vectors are laid out in the input data, e.g. CV_PCA_DATA_AS_ROW


Finally, the drawback of PCA: it treats all samples (the whole set of feature vectors) as a single group and looks for the linear projection that is optimal in the mean-square-error sense, ignoring class labels. The projection directions it discards may contain exactly the information that matters for classification.
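This drawback can be seen on a small made-up example, sketched below under the assumption of two classes that are spread widely along x but differ only by a small shift along y. PCA keeps the high-variance x direction and discards the y direction that actually separates the classes. The data and the helper `class_gap` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Made-up data: both classes vary a lot along x, but they differ
# only by a small shift along y.
class_a = np.column_stack([rng.normal(0, 5, n), rng.normal(-1, 0.3, n)])
class_b = np.column_stack([rng.normal(0, 5, n), rng.normal(+1, 0.3, n)])
data = np.vstack([class_a, class_b])

# PCA on the pooled data, with class labels ignored.
centered = data - data.mean(axis=0)
cov = centered.T @ centered / len(data)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order

kept = eigenvectors[:, -1]       # direction PCA keeps when reducing to 1-D
discarded = eigenvectors[:, 0]   # direction PCA throws away

def class_gap(direction):
    """Distance between the two class means after projecting onto `direction`."""
    return abs((class_a.mean(axis=0) - class_b.mean(axis=0)) @ direction)

print("gap along kept direction:     ", class_gap(kept))
print("gap along discarded direction:", class_gap(discarded))
```

On this data the class means end up much further apart along the discarded direction than along the kept one, so the 1-D PCA projection mixes the two classes together.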


Well, finally... no, this really is the end.

Highly recommended: an article that explains PCA very thoroughly, "The physical meaning of eigenvectors": http://blog.sina.com.cn/s/blog_49a1f42e0100fvdu.html





