[Python] Using Python for principal component analysis

Last Update: 2018-08-15 Source: Internet

Author: User

Principal component analysis is performed using the PCA class in the scikit-learn (sklearn) library.

Import the libraries you need. They can be installed directly with pip.

from sklearn.decomposition import PCA
import numpy as np  # needed if a NumPy array is used as the input data structure; other types have not been tested
import pandas as pd  # optional

The main input parameters of the PCA class are as follows:

n_components: specifies the number of feature dimensions that PCA should reduce the data to.
- The most common practice is to specify the target number of dimensions directly; in this case n_components is an integer greater than or equal to 1.
- You can also specify a minimum threshold on the proportion of total variance that the retained principal components must explain, and let the PCA class determine the number of dimensions from the variance of the sample features; in this case n_components is a float strictly between 0 and 1.
- The parameter can also be set to 'mle', in which case the PCA class uses Minka's maximum-likelihood estimate (MLE), based on the variance distribution of the features, to choose the number of principal components.
- You can also leave n_components at its default (None), in which case all components are kept: n_components = min(number of samples, number of features).
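To make the four ways of setting n_components concrete, here is a minimal sketch on synthetic data (the data and variable names are made up for illustration). The fitted attribute n_components_ reports how many components each setting actually kept; the counts for the float and 'mle' settings depend on the data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data for illustration only: 200 samples, 5 features whose
# scales differ, so the variance is concentrated in a few directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# 1) Integer: keep exactly 2 components.
pca_int = PCA(n_components=2).fit(X)

# 2) Float in (0, 1): keep the fewest components whose cumulative
#    explained-variance ratio exceeds 0.95.
pca_float = PCA(n_components=0.95, svd_solver="full").fit(X)

# 3) "mle": let Minka's MLE estimate the number of components.
pca_mle = PCA(n_components="mle", svd_solver="full").fit(X)

# 4) Default (None): keep min(n_samples, n_features) components.
pca_all = PCA().fit(X)

for name, model in [("int", pca_int), ("float", pca_float),
                    ("mle", pca_mle), ("default", pca_all)]:
    print(name, model.n_components_)
```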
whiten: whether to apply whitening. Whitening rescales each feature of the reduced data so that its variance is 1. PCA dimensionality reduction itself generally does not require whitening; consider it if further processing follows the reduction. The default is False, meaning no whitening.
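A quick way to see what whiten does is to compare the variance of the reduced columns with and without it. This sketch assumes synthetic data; with whitening each column should have sample variance 1, and without it each column's variance should match explained_variance_.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4)) * np.array([4.0, 2.0, 1.0, 0.5])

pca_plain = PCA(n_components=2)
plain = pca_plain.fit_transform(X)

pca_white = PCA(n_components=2, whiten=True)
white = pca_white.fit_transform(X)

# Without whitening, each column's sample variance equals the component's
# explained variance; with whitening, every column has unit variance.
print(plain.var(axis=0, ddof=1), pca_plain.explained_variance_)
print(white.var(axis=0, ddof=1))
```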
svd_solver: specifies the singular value decomposition (SVD) method. Because eigendecomposition can be viewed as a special case of SVD, most PCA libraries are implemented on top of SVD. There are 4 options: {'auto', 'full', 'arpack', 'randomized'}.
- 'randomized' is generally suited to PCA on large datasets with many dimensions, where the number of principal components is small relative to the number of dimensions; it uses randomized algorithms to speed up the SVD.
- 'full' is the traditional, exact SVD, using the corresponding implementation in SciPy.
- 'arpack' suits similar scenarios to 'randomized'; the difference is that 'randomized' uses scikit-learn's own SVD implementation, while 'arpack' directly uses SciPy's sparse SVD implementation.
- The default is 'auto': the PCA class weighs the three algorithms above and picks a suitable one for the reduction. In general, the default is sufficient.
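When the spectrum has a clear gap, the solvers should agree on the leading components and mainly differ in speed. This sketch (synthetic data, arbitrary sizes chosen for illustration) fits the same data with each solver and also checks the link between PCA and SVD mentioned above: explained_variance_ equals the squared singular values of the centered data divided by n_samples - 1.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data for illustration only: 500 samples, 50 features, where
# 5 features carry almost all of the variance (a clear spectral gap).
rng = np.random.default_rng(0)
scales = np.array([10.0, 8.0, 6.0, 4.0, 2.0] + [0.1] * 45)
X = rng.normal(size=(500, 50)) * scales

results = {}
for solver in ["auto", "full", "arpack", "randomized"]:
    # random_state only matters for "arpack" and "randomized".
    pca = PCA(n_components=5, svd_solver=solver, random_state=0).fit(X)
    results[solver] = pca.explained_variance_

for solver, ev in results.items():
    print(solver, np.round(ev, 3))

# The same quantities from a plain NumPy SVD of the centered data.
Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)
print(S[:5] ** 2 / (X.shape[0] - 1))
```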

Besides these input parameters, the PCA class has two attributes worth attention.

The first is explained_variance_, the variance of each principal component after dimensionality reduction. The larger the variance, the more important the principal component.
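One way to read explained_variance_: it holds the eigenvalues of the sample covariance matrix (computed with n_samples - 1 in the denominator), sorted from largest to smallest. A small check on synthetic data, assumed here for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.2])

pca = PCA().fit(X)

# Eigenvalues of the sample covariance matrix (ddof=1), largest first.
# eigvalsh returns them in ascending order, hence the reversal.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

print(pca.explained_variance_)
print(eigvals)
```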

The second is explained_variance_ratio_, the ratio of each principal component's variance to the total variance. The larger the ratio, the more important the principal component.
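Put differently, explained_variance_ratio_ is explained_variance_ divided by the total variance, i.e. the sum of the per-feature sample variances. The sketch below uses synthetic data and keeps 2 of 4 components, so the ratios sum to less than 1:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4)) * np.array([2.0, 1.5, 1.0, 0.5])

pca = PCA(n_components=2).fit(X)

# Total variance = sum of the per-feature sample variances (ddof=1).
total_var = X.var(axis=0, ddof=1).sum()
ratio = pca.explained_variance_ / total_var

print(pca.explained_variance_ratio_)
print(ratio)
print(pca.explained_variance_ratio_.sum())  # share of variance kept by 2 components
```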

Since my data is stored in a pandas DataFrame, I first extracted it into a NumPy array.

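X_pca = all.loc[:, emotion]  # `all` is the author's DataFrame and `emotion` the selected column(s), defined elsewhere
X_pca = np.array(X_pca)
a = PCA(n_components=3)  # set the number of features after dimensionality reduction
a.fit(X_pca)  # pass in our data
X_new = a.transform(X_pca)  # the reduced data, still a NumPy array
print(a.explained_variance_ratio_)  # ratio of each principal component's variance to the total variance
print(a.explained_variance_)  # variance of each principal component after reduction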
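The snippet above relies on the author's own DataFrame (all) and column selection (emotion), which are not shown. As a self-contained sketch, the version below builds a made-up DataFrame df with hypothetical column names, uses fit_transform as a shortcut for fit followed by transform, and checks that the reduced data is the centered input projected onto components_.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the author's data: a DataFrame `df` whose
# feature columns are listed in `emotion_cols` (both names are made up).
rng = np.random.default_rng(0)
emotion_cols = ["joy", "anger", "sadness", "fear", "surprise"]
df = pd.DataFrame(rng.normal(size=(50, 5)), columns=emotion_cols)
df["label"] = rng.integers(0, 2, size=50)  # a non-feature column to ignore

# Select the feature columns and convert them to a NumPy array.
X = df.loc[:, emotion_cols].to_numpy()

pca = PCA(n_components=3)
X_new = pca.fit_transform(X)  # equivalent to fit(X) then transform(X)

# transform() centers the data with mean_ and projects it onto components_.
X_manual = (X - pca.mean_) @ pca.components_.T

print(X_new.shape)  # (50, 3)
print(pca.explained_variance_ratio_)
print(np.allclose(X_new, X_manual))
```

Using a name such as df instead of all also avoids shadowing Python's built-in all().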

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.