[Mathematical model] Python implementation of principal component analysis
from numpy import mean, cov, linalg, mat, argsort

# dataMat: data matrix (a numpy matrix, one sample per row, so that * is matrix multiplication)
# topNfeat: number of top features to output
def pca(dataMat, topNfeat=9999999):
    meanVals = mean(dataMat, axis=0)  # calculate the mean of each column
    meanRemoved = dataMat - meanVals  # remove the mean
    covMat = cov(meanRemoved, rowvar=0)  # calculate the covariance matrix
    eigVals, eigVects = linalg.eig(mat(covMat))  # eigenvalues and eigenvectors
    # Sort (ascending) to find the largest eigenvalues,
    # i.e. the directions along which the data varies most.
    eigValInd = argsort(eigVals)
    eigValInd = eigValInd[:-(topNfeat + 1):-1]  # reverse, keeping the topNfeat largest
    redEigVects = eigVects[:, eigValInd]  # eigenvectors of the kept eigenvalues
    lowDDataMat = meanRemoved * redEigVects  # mapping into the new feature space
    reconMat = (lowDDataMat * redEigVects.T) + meanVals  # map back to the original space
    return lowDDataMat, reconMat
The mathematical principle of principal component analysis can be understood simply: find the directions in which the data changes the most (the directions of greatest variance) and use them as the new features.
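To make "the direction of greatest change" concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the random data, the 30 degree rotation, the fixed seed, and the variable names. It also uses np.linalg.eigh instead of the linalg.eig call in the function above, because the covariance matrix is symmetric and eigh returns the eigenvalues in ascending order.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for the illustration

# Synthetic 2-D data: large spread along x, small spread along y,
# then rotated by 30 degrees so the main direction is not a coordinate axis.
n = 500
raw = rng.normal(size=(n, 2)) * np.array([5.0, 0.5])
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
data = raw @ rot.T

# Center the data and eigen-decompose the covariance matrix.
centered = data - data.mean(axis=0)
cov_mat = np.cov(centered, rowvar=False)
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)  # symmetric input, ascending eigenvalues

top_vec = eig_vecs[:, -1]        # direction belonging to the largest eigenvalue
projected = centered @ top_vec   # the new one-dimensional feature

# The variance of the new feature equals the largest eigenvalue.
print(np.var(projected, ddof=1), eig_vals[-1])
# The top direction matches (up to sign) the 30 degree direction built into the data.
print(abs(top_vec @ rot[:, 0]))
```

The first print shows two matching numbers, and the second shows a value close to 1. In other words, the eigenvector with the largest eigenvalue points along the direction in which the data was stretched.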
If you want to work backward from the program's output to what the result means, redEigVects is critical: it provides the mapping between the original features and the new features.
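The mapping role of redEigVects can be seen in a small sketch. The data set, the feature names f0, f1, f2, and the variable names below are all hypothetical, and np.linalg.eigh stands in for the linalg.eig call used in pca. The point is only that each row of the reduced eigenvector matrix holds the weight of one original feature in the new feature, and that the same matrix is used to go in both directions.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily

# Hypothetical data set with three original features:
# f0 and f1 move together, f2 is small independent noise.
n = 400
base = rng.normal(size=n)
data = np.column_stack([
    base * 3.0 + rng.normal(scale=0.1, size=n),
    base * 3.0 + rng.normal(scale=0.1, size=n),
    rng.normal(scale=0.2, size=n),
])

mean_vals = data.mean(axis=0)
centered = data - mean_vals
eig_vals, eig_vecs = np.linalg.eigh(np.cov(centered, rowvar=False))

# Keep only the largest component; this plays the role of redEigVects.
order = np.argsort(eig_vals)[::-1][:1]
red_eig_vecs = eig_vecs[:, order]            # shape (3, 1): one weight per original feature

low_d = centered @ red_eig_vecs              # original features -> new feature
recon = low_d @ red_eig_vecs.T + mean_vals   # new feature -> approximate original features

# Each row of red_eig_vecs says how much one original feature contributes.
for name, weight in zip(["f0", "f1", "f2"], red_eig_vecs[:, 0]):
    print(name, round(float(weight), 3))
```

In this sketch f0 and f1 get weights of nearly equal size while f2 gets a weight near zero, so the new feature can be read as "the shared movement of f0 and f1". Reading the columns of redEigVects in this way is how the reduced result can be traced back to the original features.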