Principle and practice of PCA


In data preprocessing we often encounter data whose dimensionality is very high. Without appropriate feature processing, the resource cost of running an algorithm on such data becomes very large, which is unacceptable in many scenarios. However, there is often strong correlation between some dimensions of the data. If we can transform the data so that it keeps as much information as possible while reducing the correlation between dimensions, we achieve the effect of dimensionality reduction. PCA (principal component analysis) uses exactly this idea: it maps the data into a new space and selects the most important components as the basis vectors of that space, so that in the new coordinate system the data keeps most of its information while its dimensionality is reduced. Descriptions of PCA in machine learning books are often too brief, and just reading code may not give a detailed understanding of the reasoning behind the algorithm. Among the blogs I studied, http://blog.codinglabs.org/articles/pca-tutorial.html is easy to understand and I learned a lot from it; the notes below summarize what I learned.

General idea:

The number of dimensions of the data's initial space is determined by the number of features of the data. We want to map the data into a new space whose dimensions are determined by the most important characteristics of the data, so that after the mapping the data keeps as much information as possible and is separated as clearly as possible. The mathematical concepts of variance and covariance are used here; the main theoretical basis of the PCA algorithm is to solve this optimization step by step using these concepts.

Algorithm principle:

The principle is explained very well in http://blog.codinglabs.org/articles/pca-tutorial.html, so the explanation below follows that blog. Thanks to its author!

The geometric meanings of the vector inner product and matrix multiplication:

As the formula shows, the inner product of two vectors, A · B = a1*b1 + a2*b2 + ... + an*bn, yields a real number; that is, the inner product operation maps two vectors to a real number.

So what is the geometric meaning of the inner product? It is the (signed) length of the projection of vector A onto vector B. Suppose A and B are two n-dimensional vectors; an n-dimensional vector can be represented equivalently as a directed segment starting from the origin of n-dimensional space. For simplicity, assume A and B are two-dimensional vectors, A = (x1, y1), B = (x2, y2). In the two-dimensional plane, A and B can then be drawn as two directed segments starting from the origin.

Now draw a perpendicular from point A to the line through B. The foot of the perpendicular is the projection of A onto B. If the angle between A and B is a, the (signed) length of the projection is |A| cos(a), where |A| = sqrt(x1^2 + y1^2) is the modulus of vector A, i.e. the scalar length of the segment. Note that we deliberately distinguish between vector length and scalar length: the scalar length is always greater than or equal to 0 and equals the length of the segment, while the vector (signed) length may be negative; its absolute value is the segment length, and its sign depends on whether its direction is the same as or opposite to the reference direction. There is also a well-known formula for the vector inner product:

A · B = |A| |B| cos(a)

If we assume that the modulus of B is 1, i.e. |B| = 1, this becomes A · B = |A| cos(a). In other words, if the modulus of vector B is 1, then the inner product of A and B equals the (signed) length of the projection of A onto the line through B.
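Here is a quick numerical check of this, as a minimal sketch in NumPy (the vectors A and B below are made up for illustration):

import numpy as np

A = np.array([3.0, 2.0])
B = np.array([1.0, 1.0])
B_unit = B / np.linalg.norm(B)      # scale B so that |B| = 1

# inner product of A with the unit vector in the direction of B
inner = np.dot(A, B_unit)

# the same projection length computed as |A| cos(a)
cos_a = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
proj_len = np.linalg.norm(A) * cos_a

print(inner, proj_len)              # both equal 5/sqrt(2), about 3.5355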

Geometric meanings of matrix operations:

If we fix a set of basis vectors in n-dimensional space, we can use matrix multiplication to perform a linear transformation of any vector in that space. Given a vector, its location in the space spanned by the n basis vectors is determined by its projections onto each basis vector. Usually the basis vectors are chosen as an orthonormal basis, so we can take each basis vector in turn and compute its inner product with the vector; by the geometric meaning of the inner product from the previous section, this gives the projection of the vector onto that basis vector. The change of basis can therefore be written in matrix form. An example illustrates this:

In two-dimensional space we usually use (1, 0) and (0, 1) as a basis, but (1, 1) and (-1, 1) can also form a basis. In general we want the basis vectors to have modulus 1, because, as the meaning of the inner product shows, if a basis vector has modulus 1 we can simply take the dot product of the vector with the basis vector and directly obtain its coordinate on that basis. After normalization the basis becomes (1/√2, 1/√2) and (-1/√2, 1/√2). Now we want the coordinates of (3, 2) on the new basis, i.e. its projected values in the two directions. By the geometric meaning of the inner product, we only need to compute the inner product of (3, 2) with each of the two basis vectors, and it is not hard to obtain the new coordinates (5/√2, -1/√2).

Matrix form:

    |  1/√2   1/√2 |   | 3 |   |  5/√2 |
    | -1/√2   1/√2 | * | 2 | = | -1/√2 |

That is, each column vector of the matrix on the right is transformed into the space whose basis is given by the row vectors of the matrix on the left.
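As a minimal sketch of this change of basis in NumPy, using the normalized basis and the point (3, 2) from the example above:

import numpy as np

# each row of P is one basis vector of the new space
P = np.array([[ 1/np.sqrt(2), 1/np.sqrt(2)],
              [-1/np.sqrt(2), 1/np.sqrt(2)]])

v = np.array([[3.0],
              [2.0]])                 # the column vector (3, 2)

new_coords = P @ v                    # coordinates of (3, 2) in the new basis
print(new_coords)                     # [[5/sqrt(2)], [-1/sqrt(2)]], about [[3.5355], [-0.7071]]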

Principle of PCA algorithm:

When we do PCA, we want to find the directions in which the data is most spread out while keeping as much information as possible. In mathematics, variance characterizes the degree of dispersion of data very well: the larger the variance, the more spread out the data and the easier it is to distinguish. Covariance characterizes the correlation of the data across different dimensions. So when we process the data, we first subtract the mean so that each dimension has mean 0, and then multiply the centered data matrix by its transpose (divided by the number of samples) to obtain the covariance matrix we need. In the covariance matrix, the elements on the main diagonal are the variances and the remaining elements are the covariances.
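A small sketch of this step follows. The 5 x 2 data matrix below is made up for illustration; its rows are samples, so the covariance matrix here is Xc^T * Xc divided by m-1, which matches np.cov:

import numpy as np

# made-up data: 5 samples (rows) x 2 features (columns)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

Xc = X - X.mean(axis=0)               # subtract each feature's mean (mean becomes 0)

# covariance matrix; np.cov with rowvar=0 gives the same result
C = Xc.T @ Xc / (X.shape[0] - 1)
print(np.allclose(C, np.cov(X, rowvar=0)))   # True
print(np.diag(C))                            # the variances sit on the main diagonal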

Now all we have to do is select the direction with the largest variance as the first basis vector, then, among the directions orthogonal to the first basis vector, select the one with the second largest variance as the basis vector of the second dimension, and so on. Normally the first R basis directions already approximate the original data well, with R far smaller than N, thus achieving dimensionality reduction. The whole optimization therefore amounts to diagonalizing the covariance matrix. Because the covariance matrix is a real symmetric matrix, it can always be diagonalized, and its eigenvectors are orthogonal to each other. From linear algebra we can easily obtain the eigenvalues and eigenvectors of the covariance matrix: each eigenvalue represents the importance of its eigenvector, and the eigenvector indicates a direction of variance of the data. Sorting the eigenvalues in descending order, the corresponding eigenvectors form the basis vectors of the new coordinate space. The original data can then be mapped into the new coordinate space by matrix multiplication, following the meaning of matrix multiplication given earlier.
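A minimal sketch of this eigen-decomposition step; the matrix C below is the covariance matrix of the made-up 5 x 2 data from the previous sketch:

import numpy as np

# covariance matrix of the made-up data above
C = np.array([[0.938 , 0.8405],
              [0.8405, 0.853 ]])

eigvals, eigvecs = np.linalg.eig(C)    # columns of eigvecs are the eigenvectors

order = np.argsort(eigvals)[::-1]      # eigenvalue indices, largest first
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

top1 = eigvecs[:, :1]                  # the direction with the largest variance
print(eigvals)                         # importance of each direction
print(top1)                            # the first principal component (basis of the 1-D space)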

Here is a proof of why the basis we need consists of the eigenvectors of the covariance matrix. Let the original data matrix X (with the m samples as columns) have covariance matrix C, and let P be a matrix whose rows are a set of basis vectors. Set Y = PX, so Y is the data X expressed in the basis P. Writing the covariance matrix of Y as D, we derive the relationship between D and C:

    D = (1/m) Y Y^T = (1/m) (P X)(P X)^T = P ((1/m) X X^T) P^T = P C P^T

The P we are looking for is nothing other than a matrix that diagonalizes the original covariance matrix, i.e. a matrix whose rows are the eigenvectors of the original data's covariance matrix. In other words, the optimization target becomes: find a matrix P such that P C P^T is a diagonal matrix with its diagonal elements arranged from large to small. Then the first R rows of P form the basis we are looking for, and multiplying the matrix made of the first R rows of P by X reduces X from N dimensions to R dimensions while satisfying the optimization condition above.
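As a quick numerical check of this conclusion (a sketch, reusing the made-up covariance matrix from above): if the rows of P are the unit eigenvectors of C, then P C P^T comes out diagonal with the eigenvalues on the diagonal.

import numpy as np

# a symmetric covariance matrix (same made-up example as above)
C = np.array([[0.938 , 0.8405],
              [0.8405, 0.853 ]])

eigvals, eigvecs = np.linalg.eig(C)
P = eigvecs.T                     # rows of P are the (unit) eigenvectors of C

D = P @ C @ P.T                   # covariance matrix after the change of basis
print(np.round(D, 6))             # diagonal matrix; the eigenvalues sit on the diagonal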

Application examples from Machine Learning in Action:

  1. Project the data onto the first n principal components

import numpy as np

# load a whitespace-delimited data file into a NumPy matrix
def loaddata(path):
    dataset = []
    with open(path) as fr:
        for line in fr.readlines():
            currentdata = line.strip().split()
            dataset.append([float(value) for value in currentdata])
    return np.mat(dataset)

#PCA
def PCA(datamat, featurenum):
    datamean = np.mean(datamat, axis=0)              # mean of each feature
    datamat = datamat - datamean                     # center the data (zero mean)
    covdatamat = np.cov(datamat, rowvar=0)           # covariance matrix
    featval, featvec = np.linalg.eig(covdatamat)     # eigenvalues and eigenvectors
    featindex = np.argsort(featval)                  # sort the eigenvalues, then take the top k eigenvectors
    featindex = list(reversed(list(featindex)))
    featindex = featindex[:featurenum]
    featvec = np.mat(featvec[:, featindex])
    newdatamat = datamat * featvec                   # project onto the new basis (dimensionality reduction)
    originmat = (newdatamat * featvec.T) + datamean  # reconstruct the original data
    return newdatamat, originmat
# Plot the original data and the reconstructed data
import matplotlib.pyplot as plt

def PlotData(datamat, originmat):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # original data points (triangles)
    ax.scatter(datamat[:, 0].flatten().A[0], datamat[:, 1].flatten().A[0], marker='^', s=90)
    # reconstructed data points (red)
    ax.scatter(originmat[:, 0].flatten().A[0], originmat[:, 1].flatten().A[0], s=50, c='red')
    plt.show()
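A possible way to call these functions together (assuming the imports and definitions above); the file name testSet.txt and the choice of one principal component are assumptions for illustration, following the usual Machine Learning in Action example:

datamat = loaddata('testSet.txt')            # hypothetical whitespace-delimited data file
newdatamat, originmat = PCA(datamat, 1)      # keep only the first principal component
print(np.shape(newdatamat))                  # (m, 1): the reduced data
PlotData(datamat, originmat)                 # original points vs. reconstructed points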

2. The semiconductor dataset provided with Machine Learning in Action contains 590 features. What we want to know is how many of these features are the principal ones, and we want to extract them. Since many missing values in the data are marked as NaN, the first step is to replace them with the mean of the corresponding feature.

# Replace the NaN entries in each column with the mean of that column's non-NaN values
def Replacenan(datamat):
    m = np.shape(datamat)[1]
    for i in range(m):
        dataofmean = np.mean(datamat[np.nonzero(~np.isnan(datamat[:, i]))[0], i])
        datamat[np.nonzero(np.isnan(datamat[:, i]))[0], i] = dataofmean
    return datamat

With a slight change to the PCA function, we add a check: once the cumulative variance of the first k principal components exceeds 98% of the total variance we stop and select those k eigenvectors:

def PCA(datamat, featurenum):
    datamean = np.mean(datamat, axis=0)
    datamat = datamat - datamean
    covdatamat = np.cov(datamat, rowvar=0)
    featval, featvec = np.linalg.eig(covdatamat)
    featindex = np.argsort(featval)
    featindex = list(reversed(list(featindex)))      # eigenvalue indices, largest first
    featindex = featindex[:featurenum]
    sumoffeatval = np.sum(featval)
    addtotal = 0.0
    for k in range(featurenum):
        addtotal += featval[featindex[k]]
        percent = addtotal / sumoffeatval
        if percent > 0.98:
            print('The number of %d has occupied %f' % (k, percent))
            print('The best number of feature is', k)
            break
        print('The number of %d has occupied %f' % (k, percent))
    # plot the variance percentage of the leading 20 principal components
    fig = plt.figure()
    ax = fig.add_subplot(111)
    percentfeat = featval[featindex][0:20] / sumoffeatval
    x = np.arange(len(percentfeat))
    ax.plot(x, percentfeat, 'o', c='r')
    plt.xlabel('The number of principal features')
    plt.ylabel('var percent (%)')
    plt.grid()
    plt.show()
    featvec = np.mat(featvec[:, featindex])
    newdatamat = datamat * featvec
    originmat = (newdatamat * featvec.T) + datamean
    return newdatamat, originmat, featval
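A possible way to run this on the semiconductor data (a sketch; the file name secom.data is an assumption based on the dataset shipped with Machine Learning in Action):

datamat = loaddata('secom.data')                       # hypothetical path to the semiconductor data
datamat = Replacenan(datamat)                          # replace NaN entries with column means
newdatamat, originmat, featval = PCA(datamat, 590)     # examine all 590 features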

The final statistical results:

The number of 0 has occupied 0.592541
The number of 1 has occupied 0.833779
The number of 2 has occupied 0.925279
The number of 3 has occupied 0.948285
The number of 4 has occupied 0.962877
The number of 5 has occupied 0.968065
The number of 6 has occupied 0.971291
The number of 7 has occupied 0.974438
The number of 8 has occupied 0.977069
The number of 9 has occupied 0.979382
The number of 10 has occupied 0.981557
The best number of feature is 10

Reference:

1. Blog http://blog.codinglabs.org/articles/pca-tutorial.html

2. Machine Learning in Action (Peter Harrington)

