PCA Data Dimension Reduction

Source: Internet
Author: User

Advantages and disadvantages of the principal component analysis algorithm:
    • Pros: Reduces data complexity and identifies the most important features
    • Cons: Not always necessary, and may discard useful information
    • Applicable data type: numeric data
Algorithm idea. The benefits of dimensionality reduction:
    • Make data sets easier to use
    • Reduce the computational overhead of many algorithms
    • Remove noise
    • Make the results easier to understand

The idea of principal component analysis (PCA) is to transform the data into a new coordinate system whose axes are determined by the data itself. The first axis is the direction of largest variance in the original data. The second axis is the direction of largest variance among those orthogonal to the first, and the process repeats for the remaining axes.
PCA relies on the covariance matrix of the data set and on an eigenvalue analysis of that matrix.
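As a rough illustration of the "largest variance first" idea, the sketch below builds synthetic 2-D points stretched along a diagonal and checks that the leading eigenvector of the covariance matrix points along that diagonal. The data and all variable names are assumptions made for this example; it uses `np.linalg.eigh`, which is suited to symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example data: points spread along the line y = x, plus small noise.
t = rng.normal(size=500)
points = np.column_stack([t, t + 0.1 * rng.normal(size=500)])

centered = points - points.mean(axis=0)
cov = np.cov(centered, rowvar=False)          # 2 x 2 covariance matrix
eig_vals, eig_vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order

first_axis = eig_vecs[:, -1]                  # direction of largest variance
second_axis = eig_vecs[:, -2]                 # orthogonal direction, next largest variance

# Express the data in the new coordinate system (largest-variance axis first).
projected = centered @ eig_vecs[:, ::-1]
variances = projected.var(axis=0, ddof=1)

print("first axis:", first_axis)
print("variance along each new axis:", variances)
```

The variance measured along each new axis equals the corresponding eigenvalue, which is why sorting eigenvalues is enough to rank the axes.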
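In symbols, the computation can be summarized as follows. The notation is an assumption of this note, not the original author's: $X$ is the $n \times d$ data matrix with one sample $x_i$ per row, $\mu$ is the vector of column means, and $k$ is the number of components kept. The $1/(n-1)$ factor matches the default normalization of NumPy's `cov`.

```latex
\begin{aligned}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad X_c = X - \mathbf{1}\mu^{\top} \\
C &= \frac{1}{n-1}\, X_c^{\top} X_c \\
C v_j &= \lambda_j v_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \\
Y &= X_c V_k, \qquad V_k = [\,v_1, \dots, v_k\,] \\
\hat{X} &= Y V_k^{\top} + \mathbf{1}\mu^{\top}
\end{aligned}
```

Here $Y$ is the reduced data and $\hat{X}$ is its reconstruction in the original space.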

Function:

pca(dataMat, topNfeat=999999)
PCA is essentially a matrix problem, and NumPy handles the matrix work for us, so the function is very simple: subtract the mean, compute the covariance matrix and its eigenvalues and eigenvectors, select the eigenvectors that belong to the topNfeat largest eigenvalues, and finally use these eigenvectors to transform the source data into the new space. There are two ways to use it: one is to limit the number of components directly, and the other is to choose it according to how much the data should be compressed.
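The second way of using the function is not spelled out in the text. One plausible reading, sketched below, is to pick the smallest number of components whose eigenvalues account for a chosen fraction of the total variance, and then pass that number as `topNfeat`. The helper name `components_for_variance`, the `fraction` parameter, and the sample data are all hypothetical.

```python
import numpy as np

def components_for_variance(data, fraction=0.9):
    """Hypothetical helper: smallest number of components whose eigenvalues
    sum to at least `fraction` of the total variance."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eig_vals = np.linalg.eigvalsh(cov)[::-1]            # eigenvalues, largest first
    cumulative = np.cumsum(eig_vals) / eig_vals.sum()   # cumulative variance ratio
    # First index where the cumulative ratio reaches the target, as a count.
    count = int(np.searchsorted(cumulative, fraction)) + 1
    return min(count, len(eig_vals))

# Assumed example data: three independent features with very different spreads.
rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 3)) * np.array([10.0, 1.0, 0.1])

k_90 = components_for_variance(data, 0.9)
k_999 = components_for_variance(data, 0.999)
print(k_90, k_999)
```

The returned count could then be used as the `topNfeat` argument, so the amount of compression is set by a variance target rather than a fixed number.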

    # coding=utf-8
    import numpy as np

    def loadDataSet(fileName, delim='\t'):
        # Read a delimited text file into a 2-D float array, one sample per row.
        with open(fileName) as fr:
            stringArr = [line.strip().split(delim) for line in fr]
        datArr = [[float(value) for value in line] for line in stringArr]
        return np.array(datArr)

    def pca(dataMat, topNfeat=999999):
        meanVals = np.mean(dataMat, axis=0)             # per-feature mean
        meanRemoved = dataMat - meanVals                # center the data
        covMat = np.cov(meanRemoved, rowvar=False)      # columns are features
        eigVals, eigVects = np.linalg.eig(covMat)
        eigValInd = np.argsort(eigVals)                 # indices in ascending order
        eigValInd = eigValInd[:-(topNfeat + 1):-1]      # largest topNfeat, descending
        redEigVects = eigVects[:, eigValInd]
        lowDDataMat = meanRemoved @ redEigVects         # project into the new space
        reconMat = (lowDDataMat @ redEigVects.T) + meanVals  # map back to the original space
        return lowDDataMat, reconMat

    def main():
        dataMat = loadDataSet('testSet.txt')
        lowDMat, reconMat = pca(dataMat, 1)
        print(np.shape(lowDMat))

    if __name__ == '__main__':
        main()
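Since the input file is not included here, the following sketch runs the same procedure on synthetic data. It repeats a compact array-based version of the function so that it runs on its own. The generated data and the error measure are assumptions for illustration only.

```python
import numpy as np

def pca(dataMat, topNfeat=999999):
    # Compact array-based version of the function above.
    meanVals = np.mean(dataMat, axis=0)
    meanRemoved = dataMat - meanVals
    covMat = np.cov(meanRemoved, rowvar=False)
    eigVals, eigVects = np.linalg.eig(covMat)
    eigValInd = np.argsort(eigVals)[:-(topNfeat + 1):-1]   # largest eigenvalues first
    redEigVects = eigVects[:, eigValInd]
    lowDDataMat = meanRemoved @ redEigVects
    reconMat = lowDDataMat @ redEigVects.T + meanVals
    return lowDDataMat, reconMat

# Assumed example data: 200 points lying close to the line y = 2x.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
dataMat = np.column_stack([x, 2.0 * x + 0.05 * rng.normal(size=200)])

lowD1, recon1 = pca(dataMat, 1)    # keep one component
lowD2, recon2 = pca(dataMat, 2)    # keep both components

err1 = np.mean((dataMat - recon1) ** 2)   # mean squared reconstruction error
err2 = np.mean((dataMat - recon2) ** 2)

print(lowD1.shape, recon1.shape)
print("error with 1 component:", err1)
print("error with 2 components:", err2)
```

Keeping every component reconstructs the data up to floating-point error, while keeping one component discards only the small variation perpendicular to the main direction.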
