Principal Component Analysis (PCA) Study Notes

Source: Internet
Author: User
Tags: python

Principal Component Analysis (PCA) is a simple machine learning algorithm. Its main idea is to reduce the dimensionality of high-dimensional data, removing redundant information and noise in the process.
Algorithm:
Input: sample set D = {x_1, x_2, ..., x_m}; the dimension d' of the low-dimensional target space.

Process:
1: Center all samples: x_i ← x_i − (1/m) Σ_{i=1}^{m} x_i;
2: Compute the covariance matrix XX^T of all samples;
3: Perform eigenvalue decomposition on the covariance matrix XX^T;
4: Take the eigenvectors w_1, w_2, ..., w_{d'} corresponding to the d' largest eigenvalues.
Output: projection matrix W = (w_1, w_2, ..., w_{d'})
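The four steps above can be sketched directly in NumPy (a minimal illustration on a toy 5×2 sample matrix; the data and variable names here are my own, not from the original notes):

```python
import numpy as np

# toy samples: m = 5 points in 2 dimensions, one sample per row
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# step 1: center all samples
Xc = X - X.mean(axis=0)

# step 2: covariance matrix of the centered samples
C = Xc.T @ Xc / (Xc.shape[0] - 1)

# step 3: eigenvalue decomposition (eigh, since C is symmetric)
eigvals, eigvecs = np.linalg.eigh(C)

# step 4: keep the eigenvectors of the d' = 1 largest eigenvalues
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:1]]   # projection matrix, shape (2, 1)

Z = Xc @ W                  # projected 1-D data, shape (5, 1)
print(W.shape, Z.shape)
```

Note that the projection of centered data always sums to zero, which is a quick sanity check on steps 1 and 4.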
The PCA algorithm is mainly used in image compression, image fusion, and face recognition.

Python's sklearn package provides an interface for PCA:

from sklearn.decomposition import PCA
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
# pca = PCA(n_components=2)
pca = PCA(n_components='mle')
pca.fit(X)
print(pca.explained_variance_ratio_)
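Beyond explained_variance_ratio_, the fitted PCA object can also project and reconstruct data with transform and inverse_transform. A small sketch on the same six points, using n_components=1 for illustration (this variant is my addition, not part of the original notes):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

pca = PCA(n_components=1)
Z = pca.fit_transform(X)          # project onto the first principal component
X_hat = pca.inverse_transform(Z)  # map back into the original 2-D space

print(Z.shape)                    # (6, 1)
print(pca.explained_variance_ratio_)
```

For this data set the first component captures almost all of the variance, so the reconstruction X_hat lies very close to the original points.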

Testing with a self-made data set:
The program extracts one principal component to reduce the two-dimensional data to one dimension.

Using the PCA algorithm to reduce the dimensionality of the testSet.txt data set:

import numpy as np
import matplotlib.pyplot as plt

def loadDataSet(fileName, delim='\t'):
    fr = open(fileName)
    stringArr = [line.strip().split(delim) for line in fr.readlines()]
    datArr = [list(map(float, line)) for line in stringArr]
    return np.mat(datArr)

def pca(dataMat, topNfeat=9999999):
    meanVals = np.mean(dataMat, axis=0)
    meanRemoved = dataMat - meanVals  # remove mean
    covMat = np.cov(meanRemoved, rowvar=0)  # find the direction of maximum variance: var(a'x) = a' cov(x) a
    eigVals, eigVects = np.linalg.eig(np.mat(covMat))
    eigValInd = np.argsort(eigVals)  # sort goes smallest to largest
    eigValInd = eigValInd[:-(topNfeat + 1):-1]  # cut off unwanted dimensions
    redEigVects = eigVects[:, eigValInd]  # reorganize eig vects largest to smallest
    lowDDataMat = meanRemoved * redEigVects  # transform data into new dimensions
    reconMat = (lowDDataMat * redEigVects.T) + meanVals
    return lowDDataMat, reconMat

dataMat = loadDataSet('testSet.txt')
print(dataMat)
lowDMat, reconMat = pca(dataMat, 1)
print('Low-dimensional data:')
print(lowDMat)
print('Reconstructed data:')
print(reconMat)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(np.array(dataMat[:, 0]), np.array(dataMat[:, 1]), marker='^', s=90)
ax.scatter(np.array(reconMat[:, 0]), np.array(reconMat[:, 1]), marker='o', s=50, c='red')
plt.show()

def replaceNanWithMean():
    datMat = loadDataSet('secom.data', ' ')
    numFeat = np.shape(datMat)[1]
    for i in range(numFeat):
        meanVal = np.mean(datMat[np.nonzero(~np.isnan(datMat[:, i].A))[0], i])  # mean of the non-NaN values
        datMat[np.nonzero(np.isnan(datMat[:, i].A))[0], i] = meanVal  # set NaN values to the mean
    return datMat

dataMat = replaceNanWithMean()
meanVals = np.mean(dataMat, axis=0)
meanRemoved = dataMat - meanVals  # remove mean
covMat = np.cov(meanRemoved, rowvar=0)
eigVals, eigVects = np.linalg.eig(np.mat(covMat))
eigValInd = np.argsort(eigVals)  # sort goes smallest to largest
eigValInd = eigValInd[::-1]  # reverse
sortedEigVals = eigVals[eigValInd]
total = sum(sortedEigVals)
varPercentage = sortedEigVals / total * 100  # percentage of variance explained by each principal component
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(range(1, 21), varPercentage[:20], marker='^')
plt.xlabel('Principal Component Number')
plt.ylabel('Percentage of Variance')
plt.show()

Results:
The blue triangles are the original data and the red circles are the data reconstructed along the principal direction; you can see that the PCA algorithm finds the main direction of the data well.
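The replaceNanWithMean idea used for the secom.data set (fill each NaN with the mean of the non-NaN values in its column before running PCA) can be illustrated on a tiny array (the array here is my own example):

```python
import numpy as np

A = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])

for i in range(A.shape[1]):
    col = A[:, i]                                  # view into column i
    col[np.isnan(col)] = np.mean(col[~np.isnan(col)])  # fill NaN with column mean

print(A)
```

Column 0 has mean 2.0 over its non-NaN entries and column 1 has mean 6.0, so those values replace the NaNs in place.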
Human Face Recognition:

The att_faces data set contains 40 subjects, with 10 grayscale photos of 92*112 pixels for each face.
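Before PCA, each 92*112 photo is flattened into a 10304-dimensional row vector, which is what the img2vector step below does. A minimal sketch of that flattening, using a synthetic zero array standing in for a real photo:

```python
import numpy as np

img = np.zeros((112, 92), dtype=np.uint8)  # stand-in for one grayscale photo
vec = img.reshape(1, 112 * 92)             # 2-D image -> 1-D row vector

print(vec.shape)
```

With 400 photos in total, the training matrix therefore has up to 400 rows of 10304 features each.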

Here is an example using the att_faces data set:

import os
import operator
from numpy import *
import matplotlib.pyplot as plt
import cv2

# define PCA
def pca(data, k):
    data = float32(data)
    rows, cols = data.shape  # get size
    data_mean = mean(data, 0)
    data_mean_all = tile(data_mean, (rows, 1))
    Z = data - data_mean_all  # center the data
    T1 = Z * Z.T  # compute the sample covariance
    D, V = linalg.eig(T1)  # eigenvalues and eigenvectors
    V1 = V[:, 0:k]  # take the first k eigenvectors
    V1 = Z.T * V1
    for i in range(k):  # normalize the eigenvectors
        L = linalg.norm(V1[:, i])
        V1[:, i] = V1[:, i] / L
    data_new = Z * V1  # data after dimensionality reduction
    return data_new, data_mean, V1  # training result

# convert image to vector
def img2vector(filename):
    img = cv2.imread(filename, 0)  # read the image as grayscale
    rows, cols = img.shape
    imgVector = zeros((1, rows * cols))  # create an empty vector to raise speed
    imgVector = reshape(img, (1, rows * cols))  # change img from 2D to 1D
    return imgVector

# load dataSet
def loadDataSet(k):  # choose k (0-10) photos of each person for training
    # step 1: getting the data set
    print("--Getting data set---")
    # note: use '/' not '\'
    dataSetDir =
