There are many explanations of the PCA algorithm; here we discuss implementing PCA in Python with the scikit-learn (sklearn) module. The cumulative variance contribution rate (cumulative explained variance ratio) should not be confused with the explained variance itself: it is an important index for PCA dimensionality reduction, and a common rule of thumb is to keep the number of dimensions whose cumulative contribution rate reaches about 90% as the reference dimension for the reduction. In a recognition algorithm, once we have obtained the reference dimension for each class of data, we can reduce the data by taking the maximum of these reference dimensions as the feature dimension for every class. The code below computes the cumulative contribution rate for the data.
import numpy
from sklearn import decomposition

# Construct clustered data for five categories
data_set = []
for k in range(5):
    arr = numpy.random.random([3000, 45])
    for i in numpy.arange(0, 3000):
        j = i % 3
        arr[i, k*9+j*3:k*9+j*3+3] = arr[i, k*9+j*3:k*9+j*3+3] + k*0.25
    print(arr.shape)
    data_set.append(arr)

dim_set = []
for cat_num in range(len(data_set)):
    pca = decomposition.PCA()
    pca.fit(data_set[cat_num])
    # The cumulative contribution rate, i.e. the cumulative variance
    # contribution rate -- do not confuse it with the explained variance itself!
    evr_list = pca.explained_variance_ratio_
    cum_evr = numpy.cumsum(evr_list)
    dim = len(cum_evr)
    for j in range(len(cum_evr)):
        if cum_evr[j] >= 0.90:
            dim = j + 1
            break
    dim_set.append(dim)
At this point we can determine, from the reference dimensions, the target dimension for the data that needs to be reduced. The scikit-learn function that performs the actual dimensionality reduction is described below.
sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)
Parameter description:
n_components:
Type: int, float, string, or None; defaults to None, in which case all components are preserved.
Passing an int, such as n_components=1, reduces the raw data to a single dimension.
Passing a float between 0 and 1, such as n_components=0.95, keeps just enough components to explain that fraction of the variance; passing the string 'mle' estimates the number of components automatically using Minka's MLE.
Meaning: the number of principal components n to retain in the PCA algorithm, that is, the number of features kept.
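A minimal sketch of these n_components options (the data here is random and purely illustrative):

```python
import numpy as np
from sklearn import decomposition

rng = np.random.RandomState(0)
X = rng.rand(200, 10)

# int: keep exactly 2 principal components
pca_int = decomposition.PCA(n_components=2)
print(pca_int.fit_transform(X).shape)  # (200, 2)

# float in (0, 1): keep enough components to explain 95% of the variance
pca_frac = decomposition.PCA(n_components=0.95)
pca_frac.fit(X)
print(pca_frac.n_components_)  # chosen automatically, at most 10

# 'mle': estimate the number of components with Minka's MLE
pca_mle = decomposition.PCA(n_components='mle')
pca_mle.fit(X)
print(pca_mle.n_components_)
```

The number of components actually kept is available after fitting via the `n_components_` attribute.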
copy:
Type: bool, defaults to True.
Meaning: whether the original training data is copied when the algorithm runs. If True, the original training data is unchanged after PCA runs, because the computation is performed on a copy; if False, the original training data may be overwritten, because the dimensionality reduction is computed on the original array in place.
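A small sketch of the copy parameter's effect (random data, illustrative only; with copy=False the outcome depends on the input's dtype and layout):

```python
import numpy as np
from sklearn import decomposition

# copy=True (default): the input array is left untouched by fit
X1 = np.random.random((300, 6))
X1_backup = X1.copy()
decomposition.PCA(n_components=2, copy=True).fit(X1)
print(np.array_equal(X1, X1_backup))  # True

# copy=False: fit may center the data in place, overwriting the input
X2 = np.random.random((300, 6))
X2_backup = X2.copy()
decomposition.PCA(n_components=2, copy=False).fit(X2)
# X2 may now differ from X2_backup, since no defensive copy was made
print(np.array_equal(X2, X2_backup))
```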
whiten:
Type: bool, defaults to False.
Meaning: whitening; rescales the principal components so that each has unit variance.
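A minimal sketch of the whitening effect on random data: after transforming with whiten=True, each component has (sample) variance 1, even when the input features are on very different scales.

```python
import numpy as np
from sklearn import decomposition

rng = np.random.RandomState(0)
# features on very different scales
X = rng.randn(500, 4) * np.array([1.0, 5.0, 10.0, 0.1])

pca = decomposition.PCA(n_components=3, whiten=True)
Xw = pca.fit_transform(X)

# each whitened component has unit sample variance
print(Xw.std(axis=0, ddof=1))  # approximately [1. 1. 1.]
```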
dim = max(dim_set)
pca = decomposition.PCA(n_components=dim, copy=True, whiten=False)
for k in range(len(data_set)):
    data_set[k] = pca.fit_transform(data_set[k])
Copyright notice: this is an original article by the blog author; do not reproduce without permission.
Implementing PCA dimensionality reduction with Python's scikit-learn