Using Python for principal component analysis

Source: Internet
Author: User

Principal component analysis is performed here with the PCA class in the scikit-learn (sklearn) library.

Import the libraries you need. They can be installed directly with pip (for example, pip install scikit-learn numpy pandas).

from sklearn.decomposition import PCA
import numpy as np  # needed if the input data is passed as a NumPy array; other types were not tested
import pandas as pd  # optional

The main input parameters of the PCA class are as follows:

  • n_components: Specifies the number of feature dimensions that PCA should reduce the data to.
    • The most common practice is to specify the target number of dimensions directly, in which case n_components is an integer greater than or equal to 1.
    • You can also specify the minimum proportion of the total variance that the retained principal components must explain, and let the PCA class determine the number of dimensions from the variance of the sample features. In this case n_components is a float in (0, 1).
    • The parameter can also be set to 'mle', in which case the PCA class uses the MLE algorithm to choose the number of principal components based on the variance distribution of the features.
    • You can also leave n_components unset and use the default, in which case n_components = min(number of samples, number of features).
  • whiten: Determines whether whitening is performed. Whitening normalizes each feature of the reduced data so that its variance is 1. PCA dimensionality reduction itself generally does not need whitening; consider it if further processing follows the reduction. The default is False, meaning no whitening.
  • svd_solver: Specifies the singular value decomposition (SVD) method. Because eigendecomposition is a special case of SVD, PCA libraries are generally implemented on top of SVD. The four values covered here are 'auto', 'full', 'arpack', and 'randomized'.
    • 'randomized' generally suits cases with a large amount of data, many dimensions, and a relatively small number of principal components; it uses randomized algorithms to speed up the SVD.
    • 'full' is the traditional SVD, using the corresponding implementation in the SciPy library.
    • 'arpack' suits scenarios similar to 'randomized'. The difference is that 'randomized' uses scikit-learn's own SVD implementation, while 'arpack' directly uses the SciPy library's sparse SVD implementation.
    • The default is 'auto': the PCA class weighs the three algorithms above and chooses a suitable one for the reduction. In general, the default is sufficient.
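The parameter options above can be tried side by side. The following is a minimal sketch, not a benchmark: the synthetic data, the 0.95 threshold, and all variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): 200 samples, 6 features,
# generated from 2 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

# 1) Integer: keep exactly this many components.
pca_int = PCA(n_components=3).fit(X)

# 2) Float in (0, 1): keep the fewest components whose cumulative
#    explained-variance ratio exceeds the threshold.
pca_ratio = PCA(n_components=0.95, svd_solver="full").fit(X)

# 3) 'mle': let the MLE procedure estimate the number of components.
pca_mle = PCA(n_components="mle", svd_solver="full").fit(X)

# 4) Default (n_components not given): keep min(n_samples, n_features).
pca_all = PCA().fit(X)

# n_components_ holds the number of components actually kept.
print(pca_int.n_components_, pca_ratio.n_components_,
      pca_mle.n_components_, pca_all.n_components_)

# whiten=True rescales every transformed component to unit variance.
X_white = PCA(n_components=2, whiten=True).fit_transform(X)
print(X_white.var(axis=0, ddof=1))

# svd_solver can be chosen explicitly; random_state makes the
# randomized solver reproducible.
X_rand = PCA(n_components=2, svd_solver="randomized",
             random_state=0).fit_transform(X)
X_full = PCA(n_components=2, svd_solver="full").fit_transform(X)
print(X_rand.shape, X_full.shape)
```

After fitting, the n_components_ attribute reports how many components were actually kept, which is the quickest way to see what the float and 'mle' settings decided.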

In addition to these input parameters, two attributes of the fitted PCA object deserve attention.

The first is explained_variance_, which gives the variance of each principal component after dimensionality reduction. The larger the variance, the more important the principal component.

The second is explained_variance_ratio_, which gives the ratio of each principal component's variance to the total variance. The larger the ratio, the more important the principal component.
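To make the relationship between the two attributes concrete, here is a small sketch on made-up data. The data, the 90% target, and the variable names are assumptions for illustration. All components are kept so that the ratios sum to 1.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 4 features with deliberately different spreads.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

pca = PCA().fit(X)  # keep every component

variances = pca.explained_variance_       # variance of each component
ratios = pca.explained_variance_ratio_    # share of the total variance

# With all components kept, each ratio is variance / sum of variances.
manual_ratios = variances / variances.sum()

# The cumulative ratio shows how many components are needed to reach
# a given share of the total variance (90% here, an arbitrary choice).
cumulative = np.cumsum(ratios)
k = int(np.searchsorted(cumulative, 0.90) + 1)

print(variances)
print(ratios)
print(cumulative)
print(k)
```

The components come out sorted by decreasing variance, so the cumulative ratio is a convenient guide when picking n_components.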

Since my data is stored in a pandas DataFrame, I first extracted it into a NumPy array.

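# `all` is the author's DataFrame; `emotion` holds the column labels to select
X_pca = all.loc[:, emotion]
X_pca = np.array(X_pca)
a = PCA(n_components=3)  # set the number of features after reduction
a.fit(X_pca)  # pass in our data
X_new = a.transform(X_pca)  # the reduced data, still a NumPy array
print(a.explained_variance_ratio_)  # ratio of each component's variance to the total variance
print(a.explained_variance_)  # variance of each component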
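The snippet above depends on the author's own data, so it cannot be run as is. Below is a self-contained sketch of the same workflow. The DataFrame df, the column names in emotion_cols, and the random values are all invented stand-ins, not the original data.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the author's DataFrame: names and values are invented.
rng = np.random.default_rng(0)
emotion_cols = ["joy", "anger", "sadness", "fear", "surprise"]
df = pd.DataFrame(rng.normal(size=(100, 5)), columns=emotion_cols)
df["id"] = np.arange(100)  # a non-feature column that PCA should not see

# Select only the feature columns and convert them to a NumPy array.
X = df.loc[:, emotion_cols].to_numpy()

# Fit PCA and project the data onto 3 components in one step.
pca = PCA(n_components=3)
X_new = pca.fit_transform(X)

# Optionally wrap the result in a DataFrame that keeps the original index.
reduced = pd.DataFrame(X_new, columns=["PC1", "PC2", "PC3"], index=df.index)

print(reduced.head())
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
```

Here fit_transform replaces the separate fit and transform calls, and wrapping the result in a DataFrame keeps the reduced features aligned with the original rows.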
