There are many explanations of the PCA algorithm; here we discuss implementing PCA in Python with the scikit-learn (sklearn) module. The cumulative variance contribution rate (cumulative explained variance ratio) should not be confused with the explained variance itself: it is an important index for PCA dimensionality reduction, and a common rule of thumb is to keep the number of dimensions whose cumulative contribution rate reaches about 90% as the reference dimension for the reduction. In a recognition algorithm, once we have obtained the reference dimension for each class of data, we can reduce the data by taking the maximum of these reference dimensions as the feature dimension for every class. The code below computes the cumulative contribution rate for the data.
import numpy
from sklearn import decomposition

# Construct clustered data for five categories
data_set = []
for k in range(5):
    arr = numpy.random.random([3000, 45])
    for i in numpy.arange(0, 3000):
        j = i % 3
        arr[i, k*9+j*3:k*9+j*3+3] = arr[i, k*9+j*3:k*9+j*3+3] + k*0.25
    print(arr.shape)
    data_set.append(arr)

dim_set = []
for cat_num in range(len(data_set)):
    pca = decomposition.PCA()
    pca.fit(data_set[cat_num])
    # The cumulative contribution rate, i.e. the cumulative variance
    # contribution rate -- do not confuse it with the explained variance itself!
    evr_list = pca.explained_variance_ratio_
    cum_evr = numpy.cumsum(evr_list)
    dim = len(cum_evr)
    for j in range(len(cum_evr)):
        if cum_evr[j] >= 0.90:
            dim = j + 1
            break
    dim_set.append(dim)
At this point we can determine, from the reference dimensions, the target dimension for the data that needs to be reduced. The scikit-learn function that performs the actual dimensionality reduction is described below.
sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)
Parameter description:
n_components:
Type: int, float, string, or None; defaults to None, in which case all components are preserved.
Passing an int, such as n_components=1, reduces the raw data to a single dimension.
Passing a float between 0 and 1, such as n_components=0.95, keeps just enough components to explain that fraction of the variance; passing the string 'mle' estimates the number of components automatically using Minka's MLE.
Meaning: the number of principal components n to retain in the PCA algorithm, that is, the number of features kept.
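A minimal sketch of these n_components options (the data here is random and purely illustrative):

```python
import numpy as np
from sklearn import decomposition

rng = np.random.RandomState(0)
X = rng.rand(200, 10)

# int: keep exactly 2 principal components
pca_int = decomposition.PCA(n_components=2)
print(pca_int.fit_transform(X).shape)  # (200, 2)

# float in (0, 1): keep enough components to explain 95% of the variance
pca_frac = decomposition.PCA(n_components=0.95)
pca_frac.fit(X)
print(pca_frac.n_components_)  # chosen automatically, at most 10

# 'mle': estimate the number of components with Minka's MLE
pca_mle = decomposition.PCA(n_components='mle')
pca_mle.fit(X)
print(pca_mle.n_components_)
```

The number of components actually kept is available after fitting via the `n_components_` attribute.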
copy:
Type: bool, defaults to True.
Meaning: whether the original training data is copied when the algorithm runs. If True, the original training data is unchanged after PCA runs, because the computation is performed on a copy; if False, the original training data may be overwritten, because the dimensionality reduction is computed on the original array in place.
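A small sketch of the copy parameter's effect (random data, illustrative only; with copy=False the outcome depends on the input's dtype and layout):

```python
import numpy as np
from sklearn import decomposition

# copy=True (default): the input array is left untouched by fit
X1 = np.random.random((300, 6))
X1_backup = X1.copy()
decomposition.PCA(n_components=2, copy=True).fit(X1)
print(np.array_equal(X1, X1_backup))  # True

# copy=False: fit may center the data in place, overwriting the input
X2 = np.random.random((300, 6))
X2_backup = X2.copy()
decomposition.PCA(n_components=2, copy=False).fit(X2)
# X2 may now differ from X2_backup, since no defensive copy was made
print(np.array_equal(X2, X2_backup))
```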
whiten:
Type: bool, defaults to False.
Meaning: whitening; rescales the principal components so that each has unit variance.
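A minimal sketch of the whitening effect on random data: after transforming with whiten=True, each component has (sample) variance 1, even when the input features are on very different scales.

```python
import numpy as np
from sklearn import decomposition

rng = np.random.RandomState(0)
# features on very different scales
X = rng.randn(500, 4) * np.array([1.0, 5.0, 10.0, 0.1])

pca = decomposition.PCA(n_components=3, whiten=True)
Xw = pca.fit_transform(X)

# each whitened component has unit sample variance
print(Xw.std(axis=0, ddof=1))  # approximately [1. 1. 1.]
```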
dim = max(dim_set)
pca = decomposition.PCA(n_components=dim, copy=True, whiten=False)
for k in range(len(data_set)):
    data_set[k] = pca.fit_transform(data_set[k])
Copyright notice: this is an original article by the blog author; do not reproduce without permission.
Implementing PCA dimensionality reduction with Python's scikit-learn