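PCA reduces the dimensionality of a data set in the following steps: subtract the column means from the data, compute the covariance matrix, compute its eigenvalues and eigenvectors, keep the eigenvectors that belong to the N largest eigenvalues, and project the centered data onto them.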
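In matrix notation the steps can be sketched as follows. The symbols are not from the original text and are introduced here only for illustration: X is the m-by-n data matrix with one sample per row, x̄ is the row vector of column means, 1 is a column of m ones, and W holds the k eigenvectors with the largest eigenvalues as its columns. The 1/(m-1) factor is assumed because it is the default normalization of NumPy's cov.

```latex
\begin{aligned}
\bar{x} &= \tfrac{1}{m}\,\mathbf{1}^{\top} X, \qquad X_c = X - \mathbf{1}\,\bar{x} \\
C &= \tfrac{1}{m-1}\, X_c^{\top} X_c \\
C\,v_i &= \lambda_i v_i, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \\
W &= [\,v_1 \;\cdots\; v_k\,] \\
Y &= X_c\, W \\
\hat{X} &= Y\, W^{\top} + \mathbf{1}\,\bar{x}
\end{aligned}
```

Here Y is the reduced data (m-by-k) and X̂ is the reconstruction in the original n-dimensional space.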
The Python source code is as follows:
import numpy as np
import matplotlib.pyplot as plt


def loadDataSet(fileName, delim='\t'):
    # Open the file; fr.readlines() returns a list of strings,
    # each one a raw line such as '10.235186\t11.321997\n'
    with open(fileName) as fr:
        stringArr = [line.strip().split(delim) for line in fr.readlines()]
    # map applies float to every field of a line; list() collects the results
    datArr = [list(map(float, line)) for line in stringArr]
    # np.asmatrix stands in for the mat alias, which NumPy 2.0 no longer provides
    dataMat = np.asmatrix(datArr)
    return dataMat


def pcaFun(dataMat, topNfeat=9999999):
    # Column means: axis=0 averages each column, axis=1 would average each row
    meanVals = np.mean(dataMat, axis=0)
    # Remove the mean: shape(dataMat) = (m, 2) for m rows, shape(meanVals) = (1, 2)
    meanRemoved = dataMat - meanVals  # broadcasting handles the differing shapes
    # Covariance matrix, shape(covMat) = (2, 2); rowvar=0 treats columns as variables
    covMat = np.cov(meanRemoved, rowvar=0)
    # Eigenvalues and eigenvectors of the covariance matrix
    eigVals, eigVects = np.linalg.eig(np.asmatrix(covMat))
    # argsort returns the indices that sort the eigenvalues in ascending order
    eigValInd = np.argsort(eigVals)
    # Step backwards from the end to keep the topNfeat largest, in descending order
    eigValInd = eigValInd[:-(topNfeat + 1):-1]
    # Eigenvectors that belong to the largest eigenvalues
    redEigVects = eigVects[:, eigValInd]
    # Project the centered data onto those eigenvectors (reduced data set)
    lowDDataMat = meanRemoved * redEigVects
    # Reconstruct the data in the original space
    reconMat = (lowDDataMat * redEigVects.T) + meanVals
    return lowDDataMat, reconMat


# Plot the original data and the data reconstructed after dimensionality reduction
def plotData(dataMat, reconMat):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # .flatten().A[0] turns an (m, 1) matrix column into a 1-D array
    ax.scatter(dataMat[:, 0].flatten().A[0], dataMat[:, 1].flatten().A[0], marker='^', s=90)
    ax.scatter(reconMat[:, 0].flatten().A[0], reconMat[:, 1].flatten().A[0], marker='o', s=50, c='red')
    plt.show()
In the code above, lowDDataMat is the dimension-reduced data set and reconMat is the data set reconstructed from it in the original space. The resulting plot shows the original data as triangles and the reconstructed data as red circles.
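To make the difference between the reduced and the reconstructed data concrete, here is a minimal sketch that repeats the same steps on plain NumPy arrays. It is not the author's code: the function name pca_sketch, the parameter top_n, and the synthetic data are assumptions made for illustration, and np.linalg.eigh is swapped in for eig because a covariance matrix is symmetric.

```python
import numpy as np


def pca_sketch(data, top_n=1):
    """Hypothetical ndarray variant of the PCA steps (illustration only)."""
    mean_vals = data.mean(axis=0)                 # per-column mean, shape (n_features,)
    centered = data - mean_vals                   # broadcasting subtracts the mean from every row
    cov_mat = np.cov(centered, rowvar=False)      # (n_features, n_features)
    eig_vals, eig_vecs = np.linalg.eigh(cov_mat)  # symmetric input, eigenvalues in ascending order
    order = np.argsort(eig_vals)[::-1][:top_n]    # indices of the top_n largest eigenvalues
    components = eig_vecs[:, order]               # (n_features, top_n)
    low_d = centered @ components                 # reduced data, (n_samples, top_n)
    recon = low_d @ components.T + mean_vals      # back in the original space, (n_samples, n_features)
    return low_d, recon


# Synthetic, nearly collinear 2-D data (assumed here, not the article's data file)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2.0 * x + rng.normal(scale=0.1, size=200)])

low_d, recon = pca_sketch(data, top_n=1)
print(low_d.shape, recon.shape)  # prints: (200, 1) (200, 2)
```

The reduced data has one column, while the reconstruction has the original two columns again but with every point moved onto the first principal axis. Keeping both components (top_n=2) would reproduce the input up to rounding.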