[Python Machine Learning and Practice (6)] Implementing Principal Component Analysis (PCA) with sklearn


1. PCA Principle

Principal Component Analysis (PCA) is a statistical method. It uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables; the transformed variables are called the principal components.

PCA algorithm (outline; a minimal sketch follows the list):

1) Center the data by subtracting the per-feature mean.

2) Compute the covariance matrix of the centered data.

3) Perform an eigenvalue decomposition of the covariance matrix.

4) Keep the eigenvectors belonging to the k largest eigenvalues and project the data onto them; the projections are the principal components.
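As a minimal NumPy sketch of these steps (the function name pca_fit_transform and the random demo data are my own illustration, not part of the original article):

import numpy as np

def pca_fit_transform(X, k):
    # Step 1: center the data by subtracting the per-feature mean.
    X_centered = X - X.mean(axis=0)
    # Step 2: covariance matrix of the centered data (features as columns).
    cov = np.cov(X_centered, rowvar=False)
    # Step 3: eigendecomposition; eigh suits symmetric matrices and
    # returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: take the eigenvectors of the k largest eigenvalues and project.
    top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return X_centered @ top_k

# Demo on random data: 100 samples, 64 features -> 2 components.
X = np.random.RandomState(0).rand(100, 64)
print(pca_fit_transform(X, 2).shape)  # (100, 2)

np.linalg.eigh is used here because the covariance matrix is symmetric; scikit-learn's PCA, used below, performs an equivalent computation via singular value decomposition.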

2. Implementation of PCA

Data set:

The UCI Optdigits handwritten digit images: each sample is an 8x8 image flattened into 64 features, plus a class label.
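To make the 64 dimensions concrete, here is a quick sketch (my own addition) that loads the same Optdigits CSV used below and displays one sample as an 8x8 image:

import pandas as pd
from matplotlib import pyplot as plt

# Columns 0-63 hold the pixel values, column 64 the digit label.
digits_train = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra',
    header=None)
sample = digits_train.iloc[0]
pixels = sample.iloc[:64].to_numpy().reshape(8, 8)  # 64 features -> 8x8 image
plt.imshow(pixels, cmap='gray')
plt.title('Label: %d' % sample.iloc[64])
plt.show()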

Code:

# coding: utf-8
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from matplotlib import pyplot as plt
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

# 1. Initialize a linearly dependent matrix and evaluate its rank.
M = np.array([[1, 2], [2, 4]])             # a 2x2 matrix whose rows are linearly dependent
print(np.linalg.matrix_rank(M, tol=None))  # rank of the matrix: 1

# 2. Read the training and test data sets.
digits_train = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra', header=None)
digits_test = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tes', header=None)
print(digits_train.shape)  # (3823, 65): 3823 samples, each with 64 pixel features and 1 label
print(digits_test.shape)   # (1797, 65)

# 3. Reduce the data to 2 dimensions and visualize it.
# 3.1 Split the training data into feature vectors and labels.
X_digits = digits_train[np.arange(64)]  # the 64 feature columns
y_digits = digits_train[64]             # the corresponding labels

# 3.2 PCA dimensionality reduction to 2 dimensions.
estimator = PCA(n_components=2)
X_pca = estimator.fit_transform(X_digits)

# 3.3 Show the 2D spatial distribution of the 10 digit classes after PCA compression.
def plot_pca_scatter():
    colors = ['black', 'blue', 'purple', 'yellow', 'white',
              'red', 'lime', 'cyan', 'orange', 'gray']
    for i in range(len(colors)):
        px = X_pca[:, 0][y_digits.values == i]
        py = X_pca[:, 1][y_digits.values == i]
        plt.scatter(px, py, c=colors[i])
    plt.legend(np.arange(0, 10).astype(str))
    plt.xlabel('First Principal Component')
    plt.ylabel('Second Principal Component')
    plt.show()

plot_pca_scatter()

# 4. Train SVMs on the original 64-dimensional data and on data reduced to 20 dimensions, then predict.
# 4.1 Separate feature vectors from class labels for the training/test data.
X_train = digits_train[np.arange(64)]
y_train = digits_train[64]
X_test = digits_test[np.arange(64)]
y_test = digits_test[64]

# 4.2 Train an SVM on the 64-dimensional data.
svc = LinearSVC()  # a linear support vector classifier
svc.fit(X_train, y_train)
y_pred = svc.predict(X_test)

# 4.3 Train an SVM on the 20-dimensional data.
estimator = PCA(n_components=20)                # compress the 64-dimensional images to 20 dimensions
pca_X_train = estimator.fit_transform(X_train)  # fit the 20 orthogonal directions on the training features and transform them
pca_X_test = estimator.transform(X_test)
pca_svc = LinearSVC()
pca_svc.fit(pca_X_train, y_train)
pca_y_pred = pca_svc.predict(pca_X_test)

# 5. Report the results.
# Results of training on the 64-dimensional data:
print(svc.score(X_test, y_test))
print(classification_report(y_test, y_pred, target_names=np.arange(10).astype(str)))
# Results of training on the 20-dimensional data:
print(pca_svc.score(pca_X_test, y_test))
print(classification_report(y_test, pca_y_pred, target_names=np.arange(10).astype(str)))
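A natural follow-up question is how much information the 20 retained components keep. A short sketch (my own addition, loading the same training CSV) checks the explained variance ratio:

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

digits_train = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra',
    header=None)
X_train = digits_train[np.arange(64)]

estimator = PCA(n_components=20).fit(X_train)
# Per-component share of the total variance, and the cumulative share
# retained by all 20 components together.
print(estimator.explained_variance_ratio_)
print(estimator.explained_variance_ratio_.sum())

In recent scikit-learn versions, PCA also accepts a float for n_components (e.g. 0.95) to choose the number of components automatically so that the given fraction of variance is retained.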

Results:

1) The data compressed to two dimensions and visualized in the two-dimensional plane (the scatter plot drawn by plot_pca_scatter, one color per digit class).

2) SVM training results on the 64- and 20-dimensional data (accuracy score followed by the per-class classification report):

0.9220923761825265
             precision    recall  f1-score   support

          0       0.99      0.98      0.99       178
          1       0.97      0.76      0.85       182
          2       0.99      0.98      0.98       177
          3       1.00      0.87      0.93       183
          4       0.95      0.97      0.96       181
          5       0.90      0.97      0.93       182
          6       0.99      0.97      0.98       181
          7       0.99      0.90      0.94       179
          8       0.67      0.97      0.79       174
          9       0.90      0.86      0.88       180

avg / total       0.94      0.92      0.92      1797

0.9248747913188647
             precision    recall  f1-score   support

          0       0.97      0.96      0.96       178
          1       0.88      0.90      0.89       182
          2       0.96      0.99      0.97       177
          3       0.99      0.91      0.95       183
          4       0.92      0.96      0.94       181
          5       0.87      0.96      0.91       182
          6       0.98      0.97      0.98       181
          7       0.98      0.89      0.93       179
          8       0.91      0.83      0.86       174
          9       0.83      0.88      0.85       180

avg / total       0.93      0.92      0.93      1797

Conclusion: training on the PCA-reduced 20-dimensional data achieves accuracy comparable to the full 64-dimensional data (0.9249 vs. 0.9221 in this run) while using less than a third of the dimensions; in general, dimensionality reduction may trade a small loss of accuracy for a much lower-dimensional representation.
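The lower dimensionality also makes downstream training cheaper. A rough timing sketch (my own addition, reusing X_train, pca_X_train, and y_train from the listing above):

import time
from sklearn.svm import LinearSVC

# Compare SVM fit time on the original and PCA-reduced features.
for name, X in [('64-dim', X_train), ('20-dim', pca_X_train)]:
    clf = LinearSVC()
    start = time.perf_counter()
    clf.fit(X, y_train)
    print('%s fit time: %.3fs' % (name, time.perf_counter() - start))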

3. Advantages and Disadvantages of PCA

The main advantages of the PCA algorithm are:

1) It measures information content by variance alone, so it is not affected by factors outside the data set.

2) The principal components are mutually orthogonal, which eliminates interactions among the original data components.

3) The computation is simple: the main operation is an eigenvalue decomposition, which is easy to implement.

The main drawbacks of the PCA algorithm are:

1) The meaning of each principal-component dimension is somewhat ambiguous, making it less interpretable than the original features.

2) Low-variance components that are discarded may still carry important information about differences between samples, so discarding them can harm subsequent processing.
