Repost: The Python implementation of PCA

Source: Internet
Author: User

http://blog.csdn.net/jerr__y/article/details/53188573

This article mainly refers to the following two posts; the code in the text is essentially a hand-written re-implementation of the code in the second one.
- PCA explanation: http://www.cnblogs.com/jerrylead/archive/2011/04/18/2020209.html
- Python implementation: http://blog.csdn.net/u012162613/article/details/42177327

Overall code
"" "The Total code. Func: The original characteristic matrix is reduced to dimension, and Lowdatamat is returned to the new feature matrix after descending Koriyuki. Usage:lowddatamat = PCA (Datamat, K) "" "# 0 is the value ofDefZeromean(Datamat):# Find the average of each column feature meanval = Np.mean (Datamat, axis=0) NewData = datamat-meanvalReturn NewData, MeanvalDefPca (datamat,k): Newdata,meanval=zeromean (Datamat) Covmat=np.cov (NewData,rowvar=0)  #求协方差矩阵, return ndarray; if Rowvar is not 0, a column represents a sample, 0, and one row represents a sample eigvals, Eigvects=np.linalg.eig (Np.mat (Covmat))  #求特征值和特征向量, eigenvectors are placed in columns, i.e. one column represents a eigenvectors eigvalindice= Np.argsort (eigvals)  #对特征值从小到大排序 k_eigvalindice=eigvalindice[- 1:-(k+1):-< span class= "Hljs-number" >1]  #最大的k个特征值的下标 K_eigvect=eigvects[:,k_eigvalindice] # The largest k eigenvalues correspond to the eigenvector lowddatamat=newdata*k_eigvect  #低维特征空间的数据  return Lowddatamat# reconmat= (lowddatamat*k_eigvect.t) +meanval #重构数据 # return lowddatamat,reconmat          

Next, implement PCA step by step

(0) Prepare the data first.
import numpy as np
# The raw n-dimensional data; in this example, n = 2.
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
print(data)
[[ 2.5  2.4]
 [ 0.5  0.7]
 [ 2.2  2.9]
 [ 1.9  2.2]
 [ 3.1  3. ]
 [ 2.3  2.7]
 [ 2.   1.6]
 [ 1.   1.1]
 [ 1.5  1.6]
 [ 1.1  0.9]]
(1) Zero-mean the data
# (1) Zero-mean the data
def zeroMean(dataMat):
    # mean of each column (feature)
    meanVal = np.mean(dataMat, axis=0)
    newData = dataMat - meanVal
    return newData, meanVal

newData, meanVal = zeroMean(data)
print('the newData is \n', newData)
print('the meanVal is \n', meanVal)
the newData is
[[ 0.69  0.49]
 [-1.31 -1.21]
 [ 0.39  0.99]
 [ 0.09  0.29]
 [ 1.29  1.09]
 [ 0.49  0.79]
 [ 0.19 -0.31]
 [-0.81 -0.81]
 [-0.31 -0.31]
 [-0.71 -1.01]]
the meanVal is
[ 1.81  1.91]
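As a quick sanity check (an addition, not in the original post), the column means of the zero-meaned data should be numerically zero, and adding meanVal back should recover the original matrix:

# Each column of newData should now average to ~0,
# and newData + meanVal should reproduce the original data exactly.
print(np.allclose(np.mean(newData, axis=0), 0))   # expected: True
print(np.allclose(newData + meanVal, data))       # expected: True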
(2) Compute the covariance matrix of the features
# (2) Compute the covariance matrix; rowvar=0 means each column corresponds to one feature
covMat = np.cov(newData, rowvar=0)
print(covMat)
# With rowvar=1, each row would be a feature and each column a sample,
# which is clearly not how our data is laid out.
# covMat2 = np.cov(newData, rowvar=1)
# print(covMat2)
[[ 0.61655556  0.61544444]
 [ 0.61544444  0.71655556]]
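For intuition, the same matrix can also be computed by hand from the zero-meaned data (an addition, not in the original post): with rowvar=0, np.cov returns the unbiased estimate newData^T · newData / (m - 1), where m is the number of samples:

# Manual covariance computation for comparison (m = number of samples).
m = newData.shape[0]
covManual = np.dot(newData.T, newData) / (m - 1)
print(np.allclose(covManual, covMat))   # expected: True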
(3) Compute the eigenvalues and eigenvectors of the covariance matrix from (2)
# (3) Compute the eigenvalues and eigenvectors of the covariance matrix,
#     using the eig function from numpy's linear algebra module linalg
eigVals, eigVects = np.linalg.eig(np.mat(covMat))
print('The eigenvalues are:\n', eigVals)
print('The eigenvectors are:\n', eigVects)
The eigenvalues are:
[ 0.0490834   1.28402771]
The eigenvectors are:
[[-0.73517866 -0.6778734 ]
 [ 0.6778734  -0.73517866]]

In the results above:
The eigenvalues are:

[ 0.0490834   1.28402771]

The eigenvectors are:

[[-0.73517866 -0.6778734 ]
 [ 0.6778734  -0.73517866]]

The eigenvalue 0.0490834 corresponds to the first column of the eigenvector matrix, (-0.73517866, 0.6778734)^T, and the eigenvalue 1.28402771 corresponds to the second column, (-0.6778734, -0.73517866)^T.
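As a quick check (an addition to the original post), each eigenpair should satisfy covMat · v = λ · v:

# Verify the eigendecomposition: covMat @ v should equal lambda * v for every pair.
for i in range(len(eigVals)):
    v = np.asarray(eigVects[:, i]).ravel()                  # the i-th eigenvector as a flat array
    print(np.allclose(np.dot(covMat, v), eigVals[i] * v))   # expected: True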

(4) Reduce the dimensionality to k dimensions (k < n)
# (4) Keep the principal components: sort the eigenvalues from large to small, select the
#     k largest, and use the corresponding k eigenvectors (as column vectors) to form the
#     projection matrix.
#     In this example we keep the eigenvector (-0.6778734, -0.73517866)^T corresponding to 1.28402771.
k = 1  # this example takes k = 1
eigValIndice = np.argsort(eigVals)                 # indices sorting the eigenvalues from small to large
n_eigValIndice = eigValIndice[-1:-(k + 1):-1]      # indices of the k largest eigenvalues
n_eigVect = eigVects[:, n_eigValIndice]            # the corresponding k eigenvectors
print(n_eigVect)
print(n_eigVect.shape)
lowDataMat = newData * n_eigVect                   # data in the low-dimensional feature space
reconMat = (lowDataMat * n_eigVect.T) + meanVal    # reconstruct the data after dimensionality reduction
print(lowDataMat)
print('Samples after dimensionality reduction:\n', reconMat)
print('Original sample:\n', data)
[[-0.6778734 ]
 [-0.73517866]]
(2, 1)
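The slicing expression eigValIndice[-1:-(k+1):-1] in the code above walks the ascending argsort backwards, which is what selects the indices of the k largest eigenvalues. An equivalent, arguably more readable form (added here for illustration) is:

# Equivalent way to pick the indices of the k largest eigenvalues:
# reverse the ascending argsort and keep the first k entries.
n_eigValIndice_alt = np.argsort(eigVals)[::-1][:k]
print(np.array_equal(n_eigValIndice_alt, n_eigValIndice))   # expected: True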

The sample points are projected onto the selected eigenvectors, and the projected values serve as the new (low-dimensional) features:

[[-0.82797019]
 [ 1.77758033]
 [-0.99219749]
 [-0.27421042]
 [-1.67580142]
 [-0.9129491 ]
 [ 0.09910944]
 [ 1.14457216]
 [ 0.43804614]
 [ 1.22382056]]
Samples after dimensionality reduction:

[[ 2.37125896  2.51870601]
 [ 0.60502558  0.60316089]
 [ 2.48258429  2.63944242]
 [ 1.99587995  2.11159364]
 [ 2.9459812   3.14201343]
 [ 2.42886391  2.58118069]
 [ 1.74281635  1.83713686]
 [ 1.03412498  1.06853498]
 [ 1.51306018  1.58795783]
 [ 0.9804046   1.01027325]]
Original sample:
[[ 2.5  2.4]
 [ 0.5  0.7]
 [ 2.2  2.9]
 [ 1.9  2.2]
 [ 3.1  3. ]
 [ 2.3  2.7]
 [ 2.   1.6]
 [ 1.   1.1]
 [ 1.5  1.6]
 [ 1.1  0.9]]
Comparing the two, we can see that the features have been successfully reduced from two dimensions to one, and that the reconstructed data differs somewhat from the original.
We can think of this as removing part of the noise (although, of course, some of the real information is probably lost as well).
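To put a number on how much the reconstruction differs from the original data, a small sketch (an addition, not in the original post) computes the total squared reconstruction error; for PCA it equals (m - 1) times the sum of the discarded eigenvalues:

# Total squared reconstruction error over all samples and features.
m = data.shape[0]
residual = np.asarray(reconMat) - data
sse = np.sum(np.square(residual))
print(sse)
# This should match (m - 1) times the discarded variance, i.e. 9 * 0.0490834 here,
# since the dropped principal component carried exactly that much sample variance.
print((m - 1) * np.min(eigVals))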
——————————————-Split Line ———————————————————

Using Sklearn to implement PCA
    • Reference post: http://blog.csdn.net/u012162613/article/details/42192293
# The raw data (same as in the example above)
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
# print(data)
# OK, it really is that simple
from sklearn.decomposition import PCA
pca = PCA(n_components=1)
new_feature = pca.fit_transform(data)
print(new_feature)

[[-0.82797019]
 [ 1.77758033]
 [-0.99219749]
 [-0.27421042]
 [-1.67580142]
 [-0.9129491 ]
 [ 0.09910944]
 [ 1.14457216]
 [ 0.43804614]
 [ 1.22382056]]
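As a final check (an addition to the original post), the sklearn projection can be compared with the manually computed lowDataMat from step (4). The sign of a principal component is arbitrary, so the comparison should allow for a flipped sign:

# Compare sklearn's projection with the manual one; allow for an overall sign flip,
# because an eigenvector and its negation define the same principal direction.
manual = np.asarray(lowDataMat)
print(np.allclose(new_feature, manual) or np.allclose(new_feature, -manual))   # expected: True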
