An Analysis of the Principles of Principal Component Analysis (PCA)

The PCA algorithm is currently widely used in image processing. When the feature dimension of the extracted image descriptors is high, the data needs to be reduced to a lower dimension in order to save computation and storage, while distorting the data as little as possible.

 

Let's give an example to make it easy to understand:

1) Suppose a training set has 100 samples (i = 1, 2, 3, ..., 100), and each feature vector xi is 20-dimensional: [xi1, xi2, xi3, ..., xij, ..., xi20] (j = 1, 2, ..., 20). Stacking the samples as rows gives a 100*20 sample matrix M.

2) Next we compute the covariance matrix of these samples, which is a 20*20 matrix. The calculation goes as follows:

• First compute the average xav = (Σ xi)/100 over the 100 samples;

• For each xi, compute xi - xav, i.e. row i of M becomes Mi - xav; call the centered matrix Mn;

• The covariance matrix Z is then Mn' * Mn (' denotes transpose); the constant factor 1/(100-1) is omitted here, since it does not change the eigenvectors.

3) Then compute the eigenvalues and eigenvectors of the 20*20 covariance matrix Z; in general there are 20 eigenvalues and 20 corresponding eigenvectors. Based on the eigenvalues, keep the largest ones and their eigenvectors (say the five largest); these five eigenvectors form a 20*5 matrix V, which is the feature matrix we want.

4) Multiply Mn (100*20) by V (20*5) to obtain the reduced sample matrix (*) of size 100*5; each row is the 5-dimensional representation of one training sample.

5) To reduce a sample, take its (centered) 1*20 feature vector and multiply it by the 20*5 feature matrix V to get a new 1*5 sample. The dimension of each sample obviously decreases, and we then compare similarity using these 1*5 vectors. A MATLAB sketch of the five steps is given right after this list.
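
To make the five steps concrete, here is a minimal MATLAB sketch of the same procedure. The 100*20 matrix M is filled with random data purely for illustration, and the variable names (M, Mn, V, and so on) simply follow the steps above; they are not taken from the original post.

% Step 1: a 100 x 20 sample matrix M (random data, for illustration only)
M = randn(100, 20);

% Step 2: center the samples and form the 20 x 20 covariance matrix Z
xav = mean(M);                  % 1 x 20 average of the 100 samples
Mn  = M - repmat(xav, 100, 1);  % each row becomes Mi - xav
Z   = Mn' * Mn;                 % 20 x 20 (the 1/(100-1) factor does not change the eigenvectors)

% Step 3: eigen-decomposition; keep the eigenvectors of the 5 largest eigenvalues
[E, D]   = eig(Z);
[~, idx] = sort(diag(D), 'descend');
V        = E(:, idx(1:5));      % 20 x 5 feature matrix

% Step 4: reduced representation of all training samples, 100 x 5
reduced = Mn * V;

% Step 5: reduce a new sample to 5 dimensions and compare by distance
x_new = randn(1, 20);
x_red = (x_new - xav) * V;      % 1 x 5
[~, nearest] = min(sum((reduced - repmat(x_red, 100, 1)).^2, 2));  % index of the most similar training sample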

 

Note:

› In step 3), when the number of dimensions to keep is not known in advance, the eigenvalues can be chosen by taking the smallest n such that the sum of the n largest eigenvalues exceeds 90% of the total (a short sketch of this rule follows these notes).

› The reduced matrix at (*) above is not unique, since each eigenvector is only determined up to sign (and scale); you can verify this yourself.
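
A minimal sketch of that 90% rule, assuming a variable dsort that already holds the eigenvalues sorted in descending order (as in the MATLAB code near the end of this post):

p = find(cumsum(dsort) / sum(dsort) >= 0.90, 1);  % smallest n whose leading eigenvalues carry at least 90% of the total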

 

Having roughly described the PCA dimensionality-reduction process, one may ask why keeping only the few eigenvectors with the largest eigenvalues is enough to stand in for the original sample matrix.

Without further ado, the following describes the mathematical meaning of a matrix's eigenvalues and eigenvectors:

 

For simplicity, take the two-dimensional matrix A = [1 0; 0 -1] (a matrix of rank 2) as an example. A point (x, y) in the plane becomes (x', y') after the transformation A. If the point and its image lie on the same straight line through the origin, the action of A on the vector [x y]' is just a rescaling along its original direction, i.e. Ax = λx (x a column vector). For this A it is easy to find the two eigenvalues and their corresponding eigenvectors: λ1 = 1, e1 = [1 0]' and λ2 = -1, e2 = [0 -1]'. Any point in the plane can be written as [x y]' = b1*e1 + b2*e2 (b1, b2 real constants); then A[x y]' = A(b1*e1 + b2*e2) = b1*λ1*e1 + b2*λ2*e2 = Σ bi*λi*ei.

 

This formula extends to high-dimensional spaces: feature directions whose λ values are small contribute little to the transformed point (x', y', ...) and can be ignored in the calculation. For example:

Take B = [1 0; 0 0.01]; the two eigenvalues of B and their corresponding eigenvectors are λ1 = 1, e1 = [1 0]' and λ2 = 0.01, e2 = [0 1]'.

Then x = [2 3]' is mapped by B to Bx = [2 0.03]'.

Since λ2 is much smaller than λ1, it can be ignored: taking Bn = [1 0; 0 0] gives Bn*x = [2 0]' ≈ [2 0.03]' = Bx.
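
The two toy matrices above can be checked directly in MATLAB; this snippet merely reproduces the numbers quoted in the text.

A = [1 0; 0 -1];
[eigvecA, eigvalA] = eig(A)   % eigenvalues -1 and 1, eigenvectors along the coordinate axes

B   = [1 0; 0 0.01];
x   = [2 3]';
Bx  = B * x                   % [2; 0.03]
Bn  = [1 0; 0 0];             % drop the dimension with the small eigenvalue
Bnx = Bn * x                  % [2; 0], close to Bx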

 

In layman's terms, the PCA algorithm looks for the dimensions with relatively large variance and ignores the dimensions that barely vary. If, say, the first element of every sample's feature vector x were 1, that column could be dropped, because it cannot be used to distinguish samples; conversely, we want to find and keep the dimensions along which the data is spread widely.

For example, consider points in a plane region scattered in an ellipse tilted at 75 degrees whose long axis is much larger than its short axis: the spread of the points along the short axis is clearly weaker than along the long axis. When the short axis is far smaller than the long axis (in the limit, when the points lie on a straight line), the short-axis dimension can be dropped with little loss.
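
A quick way to see this is to generate such a point cloud and let PCA recover the long axis. The 75-degree tilt and the axis lengths below are arbitrary values chosen only for this illustration.

theta = 75 * pi / 180;                                    % tilt of the ellipse
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];      % rotation matrix
pts = (R * [5 * randn(1, 1000); 0.2 * randn(1, 1000)])';  % 1000 points, long axis >> short axis

C = cov(pts);                                             % 2 x 2 covariance of the point cloud
[E, D] = eig(C);
[~, k] = max(diag(D));
main_dir = E(:, k);                                       % principal direction, roughly along 75 degrees
angle_deg = atan2(main_dir(2), main_dir(1)) * 180 / pi    % close to 75 (or -105; the sign of an eigenvector is arbitrary)

proj = pts * main_dir;                                    % 1-D coordinates: the short-axis dimension is dropped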

The following is MATLAB code for a PCA-based face recognition algorithm:

% A modified PCA for face recognition (MATLAB code)
clear;
% Calculate xmean, sigma and its eigen-decomposition
allsamples = [];  % all training images
for i = 1:40
    for j = 1:5
        a = imread(strcat('C:\Documents and Settings\foreigners\Desktop\orl\s', num2str(i), '\', num2str(j), '.bmp'));
        % imshow(a);
        b = a(1:112*92);  % b is a 1 x n row vector, n = 10304; pixels are taken column-first, i.e. top to bottom, then left to right
        b = double(b);
        allsamples = [allsamples; b];  % allsamples is an M x n matrix; each row is one image, M = 200
    end
end
samplemean = mean(allsamples);  % average image, 1 x n
for i = 1:200
    xmean(i, :) = allsamples(i, :) - samplemean;  % xmean is an M x n matrix; each row is "one image minus the average image"
end
sigma = xmean * xmean';  % M x M matrix
[v, d] = eig(sigma);
d1 = diag(d);
[d2, index] = sort(d1);  % ascending order
cols = size(v, 2);  % number of columns in the eigenvector matrix
for i = 1:cols
    vsort(:, i) = v(:, index(cols - i + 1));  % vsort is an M x cols matrix (cols is normally equal to M); its columns are the eigenvectors in descending order of eigenvalue
    dsort(i) = d1(index(cols - i + 1));       % dsort is a row vector of the eigenvalues in descending order
end  % descending sort complete
% Select 90% of the energy
dsum = sum(dsort);
dsum_extract = 0;
p = 0;
while (dsum_extract / dsum < 0.90)
    p = p + 1;
    dsum_extract = sum(dsort(1:p));
end
i = 1;
% (Training phase) compute the coordinate system spanned by the eigenfaces
while (i <= p && dsort(i) > 0)
    base(:, i) = dsort(i)^(-1/2) * xmean' * vsort(:, i);  % base is an n x p matrix; dividing by dsort(i)^(1/2) normalizes the eigenfaces (see "PCA-Based Face Recognition Algorithm Research", p. 31)
    i = i + 1;
end
% Added by wolfsky: project the training samples onto the coordinate system, giving an M x p matrix allcoor
allcoor = allsamples * base;
accu = 0;
% Test phase
for i = 1:40
    for j = 6:10  % read the 40 x 5 test images
        a = imread(strcat('C:\Documents and Settings\foreigners\Desktop\orl\s', num2str(i), '\', num2str(j), '.bmp'));
        b = a(1:112*92);
        b = double(b);
        tcoor = b * base;  % coordinates of the test image, a 1 x p matrix
        for k = 1:200
            mdist(k) = norm(tcoor - allcoor(k, :));
        end
        % vote among the three nearest neighbours
        [dist, index2] = sort(mdist);
        class1 = floor((index2(1) - 1) / 5) + 1;
        class2 = floor((index2(2) - 1) / 5) + 1;
        class3 = floor((index2(3) - 1) / 5) + 1;
        class = class1;  % blue_lg
        if class1 ~= class2 && class2 ~= class3
            class = class1;
        elseif class1 == class2
            class = class1;
        elseif class2 == class3
            class = class2;
        end
        if class == i
            accu = accu + 1;
        end
    end
end
accuracy = accu / 200  % output the recognition rate
% zuobiao = [1:100];
% plot(zuobiao, accuracy);

  

Please cite the source when reprinting: http://www.cnblogs.com/blue-lg/archive/2012/05/14/2499581.html
