I. K-L transformation
To talk about PCA, one must first introduce the K-L transform.
The K-L transform (short for Karhunen-Loève transform) is a special orthogonal transformation based on the statistical characteristics of the data. Some of the literature calls it the Hotelling transform, because Hotelling was the first, in 1933, to propose transforming discrete signals into a series of uncorrelated coefficients.
The outstanding advantage of the K-L transform is that it decorrelates the data, and it is the optimal transformation in the mean square error (MSE) sense.
Here is a simple introduction to the K-L transformation.
Let X ∈ R^n be an n-dimensional random vector with mean vector M_X. Its covariance matrix can be expressed as
C_X = E{(X − M_X)(X − M_X)^T}   (2.1)
C_X is an n×n real symmetric matrix.
The K-L transform defines an orthogonal transformation A ∈ R^{n×n} that maps the vector X ∈ R^n to a vector Y ∈ R^n whose components are mutually uncorrelated:
Y = A(X − M_X)   (2.2)
Because the components of Y are uncorrelated, the covariance matrix C_Y is diagonal, i.e.
C_Y = diag(λ1, λ2, ..., λn)
Such a matrix A can always be found: for any real symmetric matrix C_X there exists an orthogonal matrix A such that A C_X A^T is diagonal. In the K-L transform, each row of A is an eigenvector of C_X, and these eigenvectors are sorted in descending order of their corresponding eigenvalues, so the eigenvector with the largest eigenvalue forms the first row of A and the one with the smallest eigenvalue forms the last row. Since C_Y is the diagonalization of C_X, the two matrices share the same eigenvalues (λ1, λ2, ..., λn).
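As a concrete illustration, here is a minimal sketch in Python/numpy (my own example on randomly generated correlated data, not from the original post): the rows of A are taken as the eigenvectors of C_X, and the covariance of the transformed data comes out diagonal with the eigenvalues on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Randomly generated correlated samples (assumed data for illustration); rows are observations.
samples = rng.standard_normal((1000, 3)) @ rng.standard_normal((3, 3))

M_X = samples.mean(axis=0)                # mean vector M_X
C_X = np.cov(samples, rowvar=False)       # covariance matrix C_X (3x3, real symmetric)

# Eigendecomposition: columns of V are eigenvectors of C_X.
eigvals, V = np.linalg.eigh(C_X)
order = np.argsort(eigvals)[::-1]         # sort eigenvalues in descending order
eigvals, V = eigvals[order], V[:, order]

A = V.T                                   # rows of A are the eigenvectors of C_X
Y = (samples - M_X) @ A.T                 # eq. (2.2): Y = A (X - M_X), applied row-wise
C_Y = np.cov(Y, rowvar=False)             # should equal diag(lambda_1, ..., lambda_n)
print(np.round(C_Y, 6))
print(np.round(eigvals, 6))
```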
In this way, the K-L transform from the random vector X to the random vector Y is realized by the matrix A, and
X = A^T Y + M_X   (2.3)
implements the inverse transform from Y back to X.
If only the eigenvectors corresponding to the k largest eigenvalues are selected, they form a k×n transformation matrix A_k, and the transformed vector Y is reduced to k dimensions. X is then recovered from Y by the following formula:
X' = A_k^T Y + M_X   (2.4)
At this point C_Y = diag(λ1, λ2, ..., λk), and the mean square error between X and X' can be expressed as
MSE = λ_{k+1} + λ_{k+2} + ... + λ_n   (2.5)
As mentioned above, the eigenvalues λ are sorted from largest to smallest, so equation (2.5) shows that choosing the eigenvectors with the k largest eigenvalues minimizes the error. In this sense the K-L transform is the optimal transform: it minimizes the mean square error between the vector X and its approximation X'.
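Below is a minimal sketch of the truncated K-L transform (again my own numpy illustration on synthetic data, not from the original): project onto the top k eigenvectors, reconstruct with (2.4), and check that the average reconstruction error is close to the sum of the discarded eigenvalues in (2.5).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
# Synthetic correlated data (an assumption for illustration); rows are observations.
samples = rng.standard_normal((20000, n)) @ rng.standard_normal((n, n))

M_X = samples.mean(axis=0)
C_X = np.cov(samples, rowvar=False)

eigvals, V = np.linalg.eigh(C_X)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

A_k = V[:, :k].T                          # k x n transformation matrix (rows = top-k eigenvectors)
Y = (samples - M_X) @ A_k.T               # project each sample to k dimensions, eq. (2.2)
X_rec = Y @ A_k + M_X                     # reconstruct with eq. (2.4): X' = A_k^T Y + M_X

mse = np.mean(np.sum((samples - X_rec) ** 2, axis=1))
print(mse, eigvals[k:].sum())             # the two numbers should nearly match, eq. (2.5)
```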
II. PCA (Principal Component Analysis)
In the early 1990s, Kirby and Sirovich began to study the optimal representation of face images using PCA. M. Turk and A. Pentland then applied this technique to face recognition, calling it the eigenface method. Turk and Pentland take face images of size m×n and rearrange them into column vectors of length m*n. All training images are transformed in this way to obtain a set of column vectors {x_i}, x_i ∈ R^{m*n}, where N denotes the number of images in the training set. Each image is treated as a random column vector, and its mean vector and covariance matrix are estimated from the training samples.
The mean vector μ is estimated by the following formula:
μ = (1/N)(x_1 + x_2 + ... + x_N)   (3.1)
The covariance matrix is estimated as
S_T = Σ_{i=1}^{N} (x_i − μ)(x_i − μ)^T = X' X'^T   (3.2)
where X' = [x_1 − μ, x_2 − μ, ..., x_N − μ].
The projection matrix A_k is formed from the eigenvectors corresponding to the k largest eigenvalues of S_T. The original image is decorrelated and reduced in dimension by the K-L transform:
Y = A_k (x − μ)   (3.3)
Since S_T = X' X'^T and X' is an (m*n)×N matrix whose centered columns sum to zero, the rank of S_T is at most N−1; once the eigenvectors of S_T are available, the K-L transformation matrix can be computed.
However, S_T is an (m*n)×(m*n) matrix, so computing its eigenvectors directly is expensive. The following trick is therefore used:
X'^T X' v_i = δ_i v_i   (3.4)
(X' X'^T)(X' v_i) = δ_i (X' v_i)   (3.5)
Equations (3.4) and (3.5) show that once the eigenvalues δ_i and eigenvectors v_i of X'^T X' are computed, the eigenvalues and eigenvectors of X' X'^T are simply δ_i and X' v_i. Since X'^T X' is only an N×N matrix, this computation is much cheaper. Alternatively, the SVD of X' can be used, which is not covered here.
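To make the trick concrete, here is a minimal sketch (my own numpy example with random placeholder "images", not the original authors' code): the small N×N eigenproblem of (3.4) is solved, its eigenvectors are mapped through (3.5) to eigenvectors of the large matrix, and the result is normalized and used as the projection matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 1024, 20                           # d = m*n pixels per image, N training images (assumed sizes)
images = rng.standard_normal((d, N))      # random placeholder "images", one per column

mu = images.mean(axis=1, keepdims=True)
Xp = images - mu                          # X' = [x_1 - mu, ..., x_N - mu], shape d x N

# Solve the small N x N eigenproblem of eq. (3.4) instead of the d x d one.
delta, V = np.linalg.eigh(Xp.T @ Xp)      # eigenvalues/eigenvectors of X'^T X'
order = np.argsort(delta)[::-1]
delta, V = delta[order], V[:, order]

k = 5
U_k = Xp @ V[:, :k]                       # columns X' v_i are eigenvectors of X' X'^T, eq. (3.5)
U_k /= np.linalg.norm(U_k, axis=0)        # normalize each eigenvector to unit length
A_k = U_k.T                               # k x d projection matrix ("eigenfaces")

Y = A_k @ Xp                              # k-dimensional representation of each training image
print(Y.shape)                            # (5, 20)
```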
III. Summary of the PCA Process
The whole PCA transformation process is summarized below; a code sketch of these steps follows the list:
1. Rearrange each m×n training image into a column vector of length m*n. Compute the mean vector and center all samples by subtracting it.
2. Using the centered sample vectors, compute the covariance matrix according to formula (3.2), perform its eigendecomposition, and sort the eigenvectors in descending order of their corresponding eigenvalues.
3. Select the eigenvectors corresponding to the k ≤ N−1 largest eigenvalues from step 2 to form the projection matrix A_k. Project each centered training image (x_1 − μ, x_2 − μ, ..., x_N − μ) onto A_k; the reduced-dimension representation of each training image is then y_1, y_2, ..., y_N.
4. Center the test image and project it onto A_k to obtain its reduced-dimension representation.
5. Use an appropriate classifier to classify the test image in the reduced space.
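Putting the five steps together, here is a minimal end-to-end sketch in Python/numpy (my own illustration: the data is random placeholder data and the 1-nearest-neighbor classifier is an assumed choice, not something prescribed above):

```python
import numpy as np

def pca_train(train_images, k):
    """train_images: array of shape (N, m, n). Returns (mu, A_k, Y_train)."""
    N = train_images.shape[0]
    X = train_images.reshape(N, -1).T                 # step 1: columns are m*n vectors
    mu = X.mean(axis=1, keepdims=True)
    Xp = X - mu                                       # centered samples

    # Step 2: eigendecomposition via the small N x N matrix, eqs. (3.4)-(3.5).
    delta, V = np.linalg.eigh(Xp.T @ Xp)
    order = np.argsort(delta)[::-1]
    V = V[:, order[:k]]                               # step 3: keep k <= N-1 components
    U = Xp @ V
    U /= np.linalg.norm(U, axis=0)
    A_k = U.T                                         # k x (m*n) projection matrix

    Y_train = A_k @ Xp                                # low-dimensional training codes
    return mu, A_k, Y_train

def pca_classify(test_image, mu, A_k, Y_train, labels):
    """Steps 4-5: project the test image and use a 1-nearest-neighbor classifier (assumed)."""
    y = A_k @ (test_image.reshape(-1, 1) - mu)
    dists = np.linalg.norm(Y_train - y, axis=0)
    return labels[int(np.argmin(dists))]

# Usage with random placeholder data (10 "images" of size 16x16, two classes).
rng = np.random.default_rng(3)
train_images = rng.standard_normal((10, 16, 16))
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
mu, A_k, Y_train = pca_train(train_images, k=4)
print(pca_classify(train_images[7], mu, A_k, Y_train, labels))   # prints 1
```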