PCA dimensionality reduction with Python's sklearn

Source: Internet
Author: User

There are many explanations of the PCA algorithm; here we discuss its implementation with the sklearn module in Python. The cumulative contribution rate (also called the cumulative variance contribution rate) should not be confused with the explained variance itself: it is the running share of the total variance captured by the leading principal components, and it is an important index for PCA dimensionality reduction. In general, the number of dimensions at which the cumulative contribution rate reaches about 90% is taken as the reference dimension for the reduction. When implementing a recognition algorithm, once the reference dimension has been obtained for each category of data, the data can be reduced by taking the maximum of these reference dimensions as the feature dimension for every category. The cumulative contribution rate of the data is computed as follows.

    import numpy
    from sklearn import decomposition

    # Construct five categories of clustered data
    data_set = []
    for k in range(5):
        arr = numpy.random.random([3000, 45])
        for i in numpy.arange(0, 3000):
            j = i % 3
            arr[i, k*9+j*3:k*9+j*3+3] = arr[i, k*9+j*3:k*9+j*3+3] + k*0.25
        print(arr.shape)
        data_set.append(arr)

    dim_set = []
    for cat_num in range(len(data_set)):
        pca = decomposition.PCA()
        pca.fit(data_set[cat_num])
        # Cumulative contribution rate (cumulative variance contribution rate):
        # the running sum of the per-component variance ratios, not the
        # explained variance itself.
        evr_list = numpy.cumsum(pca.explained_variance_ratio_)
        # Reference dimension: the smallest number of components whose
        # cumulative contribution rate reaches 90%.
        for j in range(len(evr_list)):
            if evr_list[j] >= 0.90:
                dim = j + 1
                break
        dim_set.append(dim)
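The reference dimension for a 90% cumulative contribution rate can also be computed without explicit loops. The following is a minimal sketch, assuming the cumulative contribution rate is the running sum of `explained_variance_ratio_`; the helper name `reference_dim` and the small random array are illustrative and not part of the original code.

```python
import numpy as np
from sklearn import decomposition


def reference_dim(data, threshold=0.90):
    """Smallest number of components whose cumulative contribution
    rate reaches `threshold` (hypothetical helper, for illustration)."""
    pca = decomposition.PCA()
    pca.fit(data)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # Index of the first running sum that reaches the threshold.
    idx = int(np.searchsorted(cumulative, threshold))
    # Convert the index to a count; clip in case rounding keeps the
    # final running sum slightly below the threshold.
    return min(idx + 1, len(cumulative))


rng = np.random.RandomState(0)
sample = rng.random_sample((200, 10))  # illustrative data only
dim = reference_dim(sample)
print(dim)
```

Because the running sum never decreases, `np.searchsorted` finds the crossing point directly, which avoids the manual loop and the risk of leaving `dim` unset.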

At this point, the target dimension for reducing the data can be chosen from the reference dimensions. The sklearn class that performs the reduction is described below.


sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)

Parameter description:


n_components:
Type: int, float, or string; the default is None, in which case all components are kept.
Assigning an int, such as n_components=1, reduces the raw data to a single dimension.
Assigning the string 'mle', as in n_components='mle', lets PCA estimate the number of components automatically. To keep just enough components to reach a requested share of the variance, assign a float between 0 and 1 instead, such as n_components=0.90.
Meaning: the number n of principal components, that is, the number of features, retained by the PCA algorithm.
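The different forms of n_components can be compared on a small array. This is a sketch on made-up random data; note that scikit-learn also accepts a float between 0 and 1 for n_components, which keeps enough components to explain at least that share of the variance. The `svd_solver="full"` argument is not in the signature quoted above; it is passed here on the assumption of a recent scikit-learn release, where the float and 'mle' forms are documented for the full solver.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.random_sample((300, 8))  # illustrative data only

# int: keep exactly this many components
pca_int = PCA(n_components=3).fit(X)
print(pca_int.n_components_)

# float in (0, 1): keep enough components to explain that share of variance
pca_frac = PCA(n_components=0.90, svd_solver="full").fit(X)
print(pca_frac.n_components_, pca_frac.explained_variance_ratio_.sum())

# 'mle': let PCA estimate the number of components itself
pca_mle = PCA(n_components="mle", svd_solver="full").fit(X)
print(pca_mle.n_components_)
```

After fitting, the number of components actually kept is available as the `n_components_` attribute.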

copy:
Type: bool (True or False); the default is True.
Meaning: whether the original training data is copied when the algorithm runs. If True, the original training data is unchanged after PCA runs, because the computation is performed on a copy. If False, the original training data may be overwritten, because the computation is performed directly on it.

whiten:
Type: bool; the default is False.
Meaning: whitening, which rescales the projected components so that each one has unit variance.
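The effect of copy and whiten can be checked directly. The sketch below uses made-up data with deliberately unequal feature scales; the variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Illustrative data whose columns have different scales
X = rng.random_sample((500, 6)) * np.array([1, 2, 3, 4, 5, 6])
X_before = X.copy()

plain = PCA(n_components=3, copy=True, whiten=False).fit_transform(X)
white = PCA(n_components=3, copy=True, whiten=True).fit_transform(X)

# copy=True: the input array is left untouched
print(np.array_equal(X, X_before))

# Without whitening the component variances differ;
# with whitening each component has variance close to 1
print(plain.var(axis=0, ddof=1))
print(white.var(axis=0, ddof=1))
```

Whitening discards the relative scale of the components, so it is worth enabling only when the downstream algorithm expects features of equal variance.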

    # Use the largest reference dimension as the shared target dimension
    dim = max(dim_set)
    pca = decomposition.PCA(n_components=dim, copy=True, whiten=False)
    for k in range(len(data_set)):
        data_set[k] = pca.fit_transform(data_set[k])
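One way to sanity-check the choice of the maximum reference dimension is to confirm that every category still keeps at least 90% of its variance after the reduction. The following is a sketch on small made-up data, not the article's data set; the 90% threshold follows the text, and the variable names are illustrative.

```python
import numpy as np
from sklearn import decomposition

rng = np.random.RandomState(0)
# Three illustrative categories with different per-feature scales
data_set = [
    rng.random_sample((300, 12)) * rng.uniform(0.5, 3.0, size=12)
    for _ in range(3)
]

# Reference dimension of each category at a 90% cumulative contribution rate
dim_set = []
for data in data_set:
    ratios = decomposition.PCA().fit(data).explained_variance_ratio_
    cumulative = np.cumsum(ratios)
    idx = int(np.searchsorted(cumulative, 0.90))
    dim_set.append(min(idx + 1, len(cumulative)))

dim = max(dim_set)

# Share of variance each category keeps at the shared dimension
retained = []
for data in data_set:
    pca = decomposition.PCA(n_components=dim).fit(data)
    retained.append(pca.explained_variance_ratio_.sum())
print(dim, retained)
```

Since the cumulative contribution rate never decreases as components are added, using the maximum over all categories cannot push any category below its own threshold.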


Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
