PCA algorithm understanding and code implementation


GitHub: PCA code implementation, PCA application
This algorithm is implemented in Python 3.

1. Data Dimension Reduction

In real-world production settings, the datasets we obtain often have very high feature dimensionality. Processing high-dimensional data is computationally expensive, and too many feature variables also hinder the discovery of patterns and the building of models. We therefore need to reduce the data dimensionality while preserving as much of the dataset's information as possible.
Dimensionality reduction has the following advantages:
(1) It makes datasets easier to use.
(2) It reduces the computational overhead of many algorithms.
(3) It removes noise.
(4) It makes results easier to understand.
As a part of data preprocessing, dimensionality reduction can be used in supervised learning as well as unsupervised learning. The main dimensionality reduction techniques are: Principal Component Analysis (PCA), Factor Analysis, and Independent Component Analysis (ICA). PCA is the most widely used of these, and this article introduces PCA in detail.

2. Principal Component Analysis (PCA)

We use an example to understand how PCA performs dimensionality reduction.
For example, suppose the sample data is $D = \lbrace x^{(1)}, x^{(2)}, \dots, x^{(m)} \rbrace$, where $x^{(i)} = [x^{(i)}_1, x^{(i)}_2]^T$.



We need to reduce the sample data from two dimensions to one dimension, that is, $x^{(i)} \to z^{(i)}, \ i = 1, 2, \dots, m$, as shown below:



In general, the sample points could be projected onto any vector in the plane, so how do we choose the best projection vector? In Section 1 we mentioned that we want to reduce the data dimensionality while preserving as much of the dataset's information as possible, so we need an optimization objective for selecting the projection vector.
The optimization objective of PCA is:
(1) For 2 dimensions to 1 dimension: find a projection direction that minimizes the sum of projection errors.
(2) For $n$ dimensions to $k$ dimensions: find $k$ vectors that define a $k$-dimensional projection subspace, minimizing the sum of projection errors.
So what is the projection error? The projection error of a sample point is its distance to the projection vector or projection plane, and the sum of projection errors is the sum of these distances over all sample points.
Why should "minimizing the sum of projection errors" be the optimization objective?
We explain this with the following example. The two figures below show two different projection vectors:

We can clearly see that the projection error for the first vector is smaller than that for the second.
The projection results corresponding to the two figures above are:

Suppose that in the original sample, the three points in the first quadrant belong to category "A" and the two points in the third quadrant belong to category "B". After projection, we can see that for the first projection the three samples to the right of the origin still belong to category "A" and the two samples to the left of the origin still belong to category "B". For the second projection, the categories of the samples can no longer be distinguished. In other words, the second projection loses some of the information.
Therefore, "minimizing the sum of projection errors" becomes the objective we need to optimize.
So how do we find the direction with the smallest projection error?
Find the direction of maximum variance. Maximizing the variance and minimizing the sum of projection errors are essentially the same optimization objective; Zhou Zhihua's "Machine Learning" shows that the maximum separability (maximum variance) formulation and the minimum reconstruction error (minimum sum of projection errors) formulation of PCA are equivalent. A quick numerical check is sketched below.
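As a quick numerical illustration (a sketch of my own with made-up 2D data, not taken from the original post or its GitHub code), we can compare the variance of the projections onto two candidate unit vectors; the direction with the larger projection variance is the better one in the PCA sense:

```python
import numpy as np

# Toy 2D samples roughly spread along the y = x direction (made-up data)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [1.0, 1.1], [1.5, 1.6]])
Xc = X - X.mean(axis=0)                 # center the data first

def projection_variance(Xc, direction):
    """Variance of the centered samples projected onto a unit vector."""
    u = direction / np.linalg.norm(direction)
    z = Xc @ u                          # 1-D coordinates after projection
    return z.var(ddof=1)

v1 = np.array([1.0, 1.0])               # roughly along the data's spread
v2 = np.array([1.0, -1.0])              # orthogonal to the spread
print(projection_variance(Xc, v1))      # larger variance -> smaller projection error
print(projection_variance(Xc, v2))      # smaller variance -> larger projection error
```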
With this, we have a preliminary understanding of PCA as a dimensionality reduction method.

3. PCA Algorithm Idea and Key Points

3.1 PCA Algorithm Idea

The main idea of PCA is to transform the data from the original coordinate system into a new coordinate system, where the new coordinate system is determined by the data itself. When choosing the new coordinate system, each axis points in a direction of maximum variance, because the direction of maximum variance carries the most important information about the data. The first new axis is chosen as the direction of largest variance in the original data; the second new axis is chosen as the direction orthogonal to the first with the largest remaining variance. This process is repeated as many times as the feature dimension of the original data.
With the new coordinate system obtained in this way, we find that most of the variance is contained in the first few axes, while the later axes contain variance close to zero. Thus, we can ignore the remaining axes and keep only the first few axes that carry most of the variance. In effect, this keeps only the feature dimensions that contain most of the variance, while ignoring the dimensions whose variance is nearly zero, and thereby achieves dimensionality reduction of the data features. A sketch of how to quantify this is shown below.
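The following is a minimal sketch of this observation (the data, the function name `explained_variance_ratio`, and the use of NumPy are my own choices, not the author's code): the eigenvalues of the covariance matrix measure the variance along each new axis, so their normalized values tell us how many axes are worth keeping.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of the total variance carried by each principal axis, largest first."""
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)             # n x n covariance matrix
    eigvals = np.linalg.eigvalsh(C)[::-1]    # eigenvalues, sorted from largest to smallest
    return eigvals / eigvals.sum()

# Made-up correlated data: decide how many axes are needed for, say, 95% of the variance
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
ratios = explained_variance_ratio(X)
d_prime = int(np.searchsorted(np.cumsum(ratios), 0.95)) + 1
print(ratios)
print("axes needed for 95% of the variance:", d_prime)
```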

3.2 Key Points of the PCA Algorithm

From the description above, we can see that the key point of the PCA algorithm is how to find the direction of maximum variance.
Some concepts from linear algebra are needed here:
(1) Covariance matrix
(1.1) The covariance between feature $x_i$ and feature $x_j$ over $m$ samples is: \[ Cov(x_i, x_j) = \frac{\sum_{k=1}^{m} (x_i^{(k)} - \overline{x}_i)(x_j^{(k)} - \overline{x}_j)}{m-1} \]
where $x_i^{(k)}, x_j^{(k)}$ denote the values of features $x_i, x_j$ in the $k$-th sample, and $\overline{x}_i, \overline{x}_j$ are the sample means of the two features.
It can be seen that when $x_i = x_j$, the covariance is simply the variance.
(1.2) For samples with only two features, the covariance matrix is: \[ C = \begin{bmatrix} Cov(x_1, x_1) & Cov(x_1, x_2) \\ Cov(x_2, x_1) & Cov(x_2, x_2) \end{bmatrix} \]
When the number of features is $n$, the covariance matrix is an $n \times n$ matrix, and the diagonal entries are the variances of each feature.
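As a small sanity check (an illustrative sketch I am adding, with made-up values), the formula above matches NumPy's `np.cov`, which also divides by $m-1$ by default:

```python
import numpy as np

# Five samples of two features x1, x2 (made-up values)
X = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1], [5.0, 9.8]])
m = X.shape[0]                            # number of samples

# Covariance by the formula above: centered deviations, divided by m - 1
Xc = X - X.mean(axis=0)
C_manual = Xc.T @ Xc / (m - 1)

# NumPy's covariance matrix (rowvar=False means each row is one sample)
C_numpy = np.cov(X, rowvar=False)

print(np.allclose(C_manual, C_numpy))     # True; the diagonal holds each feature's variance
```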
(2) Eigenvectors and eigenvalues
For a matrix $A$, if $A \zeta = \lambda \zeta$, then $\zeta$ is an eigenvector of the matrix $A$ and $\lambda$ is the corresponding eigenvalue. Sort the eigenvalues from largest to smallest and select the eigenvectors corresponding to the first $k$ eigenvalues as the projection vectors.
Eigenvalues and eigenvectors are found mainly by eigendecomposition (when $A$ is a square matrix) or by singular value decomposition (SVD, when $A$ is not square), as sketched below.
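A minimal sketch of the two routes (my own illustration, not the post's GitHub code): eigendecomposition of the covariance matrix via `np.linalg.eigh`, and SVD of the centered data matrix, yield the same projection directions up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))              # made-up 50 x 3 sample matrix
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix (symmetric, so eigh is appropriate)
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues come back in ascending order
order = np.argsort(eigvals)[::-1]         # re-sort from largest to smallest
W_eig = eigvecs[:, order]

# Route 2: SVD of the centered data matrix; the right singular vectors span the same directions
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W_svd = Vt.T

# The two sets of projection vectors agree up to a possible sign flip of each column
print(np.allclose(np.abs(W_eig), np.abs(W_svd)))
```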

4. PCA Algorithm Process

Input: training sample set $D = \lbrace x^{(1)}, x^{(2)}, \dots, x^{(m)} \rbrace$ and target low dimension $d'$;
Process:
1: Center all samples (subtract the mean of each feature): $x_j^{(i)} \leftarrow x_j^{(i)} - \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$;
2: Compute the covariance matrix of the centered samples, $X^T X$ (where the rows of $X$ are the samples);
3: Perform eigenvalue decomposition on the covariance matrix $X^T X$;
4: Take the eigenvectors $w_1, w_2, \dots, w_{d'}$ corresponding to the $d'$ largest eigenvalues;
5: Multiply the sample matrix by the projection matrix: $X \cdot W$ is the reduced-dimension dataset $X'$, where $X$ is $m \times n$ and $W = [w_1, w_2, \dots, w_{d'}]$ is $n \times d'$;
Output: the reduced-dimension dataset $X'$. A minimal code sketch of these steps follows.
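The GitHub code referenced at the top of this post is not reproduced here, so below is a minimal sketch of steps 1-5 (the function name `pca` and the made-up data are my own):

```python
import numpy as np

def pca(X, d_prime):
    """Reduce an m x n sample matrix X to m x d_prime following steps 1-5 above.

    Returns the reduced dataset X' and the n x d_prime projection matrix W.
    """
    # Step 1: center every feature (subtract the column means)
    Xc = X - X.mean(axis=0)

    # Step 2: covariance matrix of the centered samples (n x n)
    C = np.cov(Xc, rowvar=False)

    # Step 3: eigenvalue decomposition (eigh, since C is symmetric)
    eigvals, eigvecs = np.linalg.eigh(C)

    # Step 4: eigenvectors of the d' largest eigenvalues form W (n x d')
    order = np.argsort(eigvals)[::-1][:d_prime]
    W = eigvecs[:, order]

    # Step 5: project the centered samples onto the d'-dimensional subspace
    return Xc @ W, W

# Usage on made-up data: reduce 100 samples with 4 features down to 2 dimensions
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
X_prime, W = pca(X, d_prime=2)
print(X_prime.shape, W.shape)             # (100, 2) (4, 2)
```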

5. PCA Algorithm Analysis

Pros: PCA makes the data easier to use and removes noise, which can make other machine learning tasks more accurate. It is often used as a preprocessing step to clean the data before other algorithms are applied.
Cons: reducing the data dimensionality does not mean the information carried by the features is greatly reduced, because the retained dimensions still contain most of the information, so it does not help with overfitting; dimensionality reduction should not be used as a cure for overfitting. Also, if the original feature dimension is not very large, there is no need for dimensionality reduction.
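As an application-level cross-check (my own addition, assuming scikit-learn is available; the post's GitHub code may differ), scikit-learn's `PCA` performs the same preprocessing-style reduction in one call:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 4))             # made-up data standing in for a real dataset

# Keep 2 components; fit_transform centers the data and projects it in one call
X_prime = PCA(n_components=2).fit_transform(X)
print(X_prime.shape)                      # (100, 2)
```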

References:
[1] Zhou Zhihua, "Machine Learning".
[2] Peter Harrington, "Machine Learning in Action".
[3] https://www.cnblogs.com/zy230530/p/7074215.html
[4] http://www.cnblogs.com/zhangchaoyang/articles/2222048.html
[5] https://www.cnblogs.com/terencezhou/p/6235974.html

A final note: this article integrates and summarizes the materials listed above and is otherwise original; there may be places where my understanding is mistaken, and comments or objections are welcome below. Thank you!
If you reprint this article, please cite: https://www.cnblogs.com/lliuye/p/9156763.html
