A little learning summary of the PCA algorithm
Source: http://blog.csdn.net/xizhibei
=============================
PCA, i.e. Principal Components Analysis, is a very good algorithm. According to the book:
Find the projection that best represents the original data in the least mean square sense.
And in my own words: it is mainly used for dimensionality reduction of features.
In addition, the algorithm has a classic application: face recognition. Briefly, each face image is turned into a single feature vector by concatenating its rows, and then PCA is used to reduce the dimension.
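For example, flattening an image into such a feature vector is a one-liner in NumPy (a minimal sketch; the file name and image size are made up for illustration):

import numpy as np

img = np.load("face.npy")   # hypothetical 112 x 92 grayscale face image
x = img.reshape(-1)         # concatenate the rows into one 10304-dimensional feature vector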
The main idea of PCA is to find the principal axes of the data, which form a new coordinate system whose dimension can be lower than the original one; the data is then projected from the original coordinate system onto the new one, and this projection is the dimensionality reduction.
I won't drag you through the derivation here; instead I recommend a tutorial that explains it very concretely: http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
Then the steps of the algorithm:
1. Compute the mean m and scatter matrix S of all samples; the so-called scatter matrix is just the covariance matrix, up to a scale factor;
2. Compute the eigenvalues of S and sort them from largest to smallest;
3. Take the eigenvectors corresponding to the first n' eigenvalues to form a transformation matrix E = [e1, e2, ..., en'];
4. Finally, each original n-dimensional feature vector x can be converted to a new n'-dimensional feature vector y (a worked sketch follows below):
y = transpose(E) * (x - m)
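To make the four steps concrete, here is a minimal NumPy sketch on a tiny made-up 2-D dataset (the numbers are only for illustration; n = 2, n' = 1):

import numpy as np

# Toy data: 6 samples, 2 features each (made-up numbers)
x = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
m = x.mean(axis=0)                    # step 1: mean
s = (x - m).T @ (x - m) / len(x)      # step 1: scatter (covariance) matrix
d, e = np.linalg.eigh(s)              # step 2: eigenvalues and eigenvectors (columns)
order = np.argsort(d)[::-1]           # step 2: sort eigenvalues from large to small
E = e[:, order[:1]]                   # step 3: transformation matrix with n' = 1
y = (x - m) @ E                       # step 4: each row is transpose(E) * (x - m) for one sample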
Finally, you have to get your hands dirty to really remember it. I did it with Python and NumPy; doing it in C would be asking for trouble, far too much hassle. Since I am not familiar with NumPy, the following may contain mistakes; corrections are welcome:
import numpy as np

mat = np.load("data.npy")        # each row: a class label followed by a feature vector
data = mat[:, 1:].astype(float)
avg = np.average(data, 0)
means = data - avg               # centered data
n = means.shape[0]               # n is the number of samples
tmp = means.T @ means / n        # scatter (covariance) matrix
d, v = np.linalg.eigh(tmp)       # eigenvalues and eigenvectors (as columns); eigh suits the symmetric matrix,
idx = np.argsort(d)[::-1]        # but it does not sort descending, so order the eigenvalues explicitly
E = v[:, idx[:100]]              # simply take the first 100 dimensions here; in practice consider keeping e.g. the top 80%
y = E.T @ means.T                # feature vectors after dimensionality reduction, one per column
np.save("final", y)
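About "take the first 80%": I read that as keeping enough of the largest eigenvalues to explain 80% of the total variance. A minimal sketch, reusing d, v and idx from the code above:

ratio = np.cumsum(d[idx]) / d.sum()        # cumulative fraction of variance explained
k = int(np.searchsorted(ratio, 0.8)) + 1   # smallest k whose top-k eigenvalues reach 80%
E = v[:, idx[:k]]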
In addition, it is worth mentioning that OpenCV (the omnipotent OpenCV, OTL) has a PCA implementation:
void cvCalcPCA(const CvArr* data,     // input data
               CvArr* avg,            // mean (output)
               CvArr* eigenvalues,    // eigenvalues (output)
               CvArr* eigenvectors,   // eigenvectors (output)
               int flags);            // how the feature vectors are laid out in the input data, e.g. CV_PCA_DATA_AS_ROW
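That is the old C interface; in newer OpenCV versions the same functionality is exposed to Python through cv2.PCACompute and cv2.PCAProject. A minimal sketch (the array shapes are made up for illustration):

import numpy as np
import cv2

data = np.random.rand(100, 64).astype(np.float32)                  # 100 samples as rows (CV_PCA_DATA_AS_ROW layout)
mean, eigvecs = cv2.PCACompute(data, mean=None, maxComponents=10)  # mean and top-10 eigenvectors
y = cv2.PCAProject(data, mean, eigvecs)                            # 100 x 10 reduced feature vectors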
Finally, the disadvantage of PCA: it treats all samples (the whole set of feature vectors) as a single whole and looks for the optimal linear projection in the minimum mean square error sense, ignoring class labels; the projection directions it discards may contain exactly the information that matters for classification.
Well, finally... no, really, this is the end.
Highly recommended: an article that can help you understand PCA very thoroughly, "The physical meaning of eigenvectors": http://blog.sina.com.cn/s/blog_49a1f42e0100fvdu.html