Principal Component Analysis Algorithm advantages and disadvantages:
- Pros: Reduce data complexity and identify the most important features
- Cons: Not always needed, and may discard useful information
- Applicable data type: numeric data
Algorithm idea: The benefits of dimensionality reduction:
- Make data sets easier to use
- Reduce the computational overhead of many algorithms
- Noise removal
- Make the results easier to understand
The idea of principal component analysis (PCA) is to transform the data into a new coordinate system whose axes are determined by the data itself. The first axis is the direction of largest variance in the original data. The second axis is the direction orthogonal to the first that has the largest variance. This process is repeated for the remaining axes.
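To make the new coordinate system concrete, here is a minimal sketch on synthetic 2-D data (the data, the random seed, and the variable names are assumptions for illustration, not part of the original notes). It checks that the first axis carries more variance than the second and that the two axes are orthogonal.

```python
import numpy as np

# Synthetic 2-D data stretched along the diagonal (illustrative assumption).
rng = np.random.default_rng(0)
t = rng.normal(size=500)
data = np.column_stack([t, t + 0.1 * rng.normal(size=500)])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eig_vals, eig_vecs = np.linalg.eigh(cov)
first_axis = eig_vecs[:, -1]   # direction of largest variance
second_axis = eig_vecs[:, -2]  # orthogonal direction with the next largest variance

# Variance of the data once projected onto each new axis.
var_first = np.var(centered @ first_axis, ddof=1)
var_second = np.var(centered @ second_axis, ddof=1)
dot = float(first_axis @ second_axis)

print(var_first > var_second, abs(dot) < 1e-10)
```

The variance along each new axis equals the corresponding eigenvalue of the covariance matrix, which is why sorting eigenvalues is enough to rank the axes.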
Principal component analysis relies on the covariance matrix of the data set and the eigenvalue analysis of that matrix.
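As a rough illustration of those two ingredients, the sketch below builds the sample covariance matrix by hand, compares it with `np.cov`, and verifies that each eigenvector v satisfies C v = lambda v. The random data and variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))   # 200 samples, 3 features (synthetic)

mean_removed = X - X.mean(axis=0)
n = X.shape[0]

# Sample covariance by hand: C = Xc^T Xc / (n - 1)
cov_manual = mean_removed.T @ mean_removed / (n - 1)
cov_numpy = np.cov(X, rowvar=False)

eig_vals, eig_vecs = np.linalg.eig(cov_numpy)

# Column j of eig_vecs pairs with eig_vals[j]; the residual of C v - lambda v
# should be numerically zero for every column.
residual = cov_numpy @ eig_vecs - eig_vecs * eig_vals

print(np.allclose(cov_manual, cov_numpy), np.allclose(residual, 0))
```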
Function:
pca(dataMat, topNfeat=999999)
Because principal component analysis is essentially a matrix problem, and NumPy handles the matrix operations for us, the function is very simple: subtract the mean, compute the covariance matrix and its eigenvalues, select the topNfeat largest eigenvalues, and finally use the corresponding eigenvectors to transform the source data into the new space. There are two ways to use it: one is to limit the number of components kept, and the other is to compress the data.
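The selection of the largest topNfeat eigenvalues comes down to one slice on the result of `argsort`. A small sketch with made-up eigenvalues (the numbers and the names `top_n`, `order` are hypothetical) shows how it behaves:

```python
import numpy as np

eig_vals = np.array([0.3, 2.5, 0.1, 1.2])   # made-up eigenvalues
top_n = 2

order = np.argsort(eig_vals)         # ascending order of indices: [2, 0, 3, 1]
top_idx = order[:-(top_n + 1):-1]    # walk backwards from the end, keep top_n: [1, 3]

# With the default topNfeat=999999 the stop index falls outside the array,
# so the slice returns every index, largest eigenvalue first.
all_idx = order[:-(999999 + 1):-1]   # [1, 3, 0, 2]

print(top_idx.tolist(), all_idx.tolist())
```

This is why the very large default value means "keep everything": no dimension is dropped unless a smaller topNfeat is passed.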
# coding=utf-8
import numpy as np


def loadDataSet(fileName, delim='\t'):
    # Read a delimited text file into a matrix of floats, one row per line.
    with open(fileName) as fr:
        stringArr = [line.strip().split(delim) for line in fr]
    datArr = [list(map(float, line)) for line in stringArr]
    return np.asmatrix(datArr)


def pca(dataMat, topNfeat=999999):
    # Center the data by removing the mean of each column (feature).
    meanVals = np.mean(dataMat, axis=0)
    meanRemoved = dataMat - meanVals
    # Covariance matrix of the features and its eigen-decomposition.
    covMat = np.cov(meanRemoved, rowvar=0)
    eigVals, eigVects = np.linalg.eig(np.asmatrix(covMat))
    # Indices of the topNfeat largest eigenvalues, largest first.
    eigValInd = np.argsort(eigVals)
    eigValInd = eigValInd[:-(topNfeat + 1):-1]
    redEigVects = eigVects[:, eigValInd]
    # Project onto the new space, then map back to get the reconstruction.
    lowDDataMat = meanRemoved * redEigVects
    reconMat = (lowDDataMat * redEigVects.T) + meanVals
    return lowDDataMat, reconMat


def main():
    dataMat = loadDataSet('testSet.txt')
    lowDMat, reconMat = pca(dataMat, 1)
    print(np.shape(lowDMat))


if __name__ == '__main__':
    main()
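One practical question the code leaves open is how to pick topNfeat. A common heuristic, sketched here under assumptions (synthetic data, a hypothetical 95% threshold, illustrative variable names), is to look at how much of the total variance each component carries and keep the smallest number of components that reaches the threshold. This is not part of the original function, only a possible companion to it.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data: 3 features with very different spreads (illustrative assumption).
X = rng.normal(size=(300, 3)) * np.array([10.0, 1.0, 0.1])

cov = np.cov(X - X.mean(axis=0), rowvar=False)
eig_vals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues in descending order

# Fraction of total variance carried by each component, and the running total.
ratio = eig_vals / eig_vals.sum()
cumulative = np.cumsum(ratio)

# Smallest number of components keeping at least 95% of the variance
# (the 95% threshold is a hypothetical choice, not taken from the text).
top_n = int(np.searchsorted(cumulative, 0.95) + 1)

print(ratio, top_n)
```

The resulting `top_n` could then be passed as the topNfeat argument. The variance left out gives a rough measure of the information that the reduction discards.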