The main content of this article comes from Andrew Ng's UFLDL tutorial: http://ufldl.stanford.edu/tutorial/unsupervised/PCAWhitening/
PCA
PCA (principal component analysis) is a dimensionality reduction technique that can significantly speed up learning algorithms.
When working with images, the input is usually redundant because adjacent pixels are highly correlated; PCA can be used to approximate the original input with a lower-dimensional one while keeping the approximation error small.
A visual intuition for the dimensionality reduction is shown in the following figure:
The data in the original figure is two-dimensional, but it clearly lies close to a one-dimensional linear structure; PCA can project the data onto that one-dimensional subspace.
Define the matrix:
\begin{align} \Sigma = \frac{1}{m} \sum_{i=1}^m \left(x^{(i)}\right) \left(x^{(i)}\right)^T \end{align}
If x has zero mean, then \Sigma is the covariance matrix of x.
Denote the eigenvectors of \Sigma by u_1 and u_2, where u_1 corresponds to the larger eigenvalue.
A detailed derivation can be found at http://cs229.stanford.edu/notes/cs229-notes10.pdf.
For an input x with zero mean in each dimension, we first compute \Sigma and then store its eigenvectors as the columns of a matrix U; note that U is an orthogonal matrix.
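As a rough sketch of this step (not code from the tutorial), assuming the examples x^{(i)} are stored as the columns of a zero-mean matrix `x`, the covariance matrix and the eigenvector matrix U could be computed with NumPy as follows; the function name `pca_basis` is illustrative:

```python
import numpy as np

def pca_basis(x):
    """x: (n, m) matrix whose columns are zero-mean examples x^(i)."""
    m = x.shape[1]
    sigma = (x @ x.T) / m              # covariance matrix (assumes zero-mean data)
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so we re-sort them so that u_1 has the largest eigenvalue.
    eigvals, U = np.linalg.eigh(sigma)
    order = np.argsort(eigvals)[::-1]
    return U[:, order], eigvals[order]
```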
We can rotate the original vector x to get
\begin{align} x_{\rm rot} = U^T x = \begin{bmatrix} u_1^T x \\ u_2^T x \end{bmatrix} \end{align}
If we want to reduce x to k dimensions, we keep only the first k components of x_{\rm rot}.
We can also use the matrix U to approximately restore the original data from the reduced representation.
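A minimal sketch of the reduction and restoration, assuming `U` and `x` as in the earlier sketch (the helper name `reduce_and_restore` is made up for illustration):

```python
import numpy as np

def reduce_and_restore(x, U, k):
    x_rot = U.T @ x            # rotated data, x_rot = U^T x
    x_tilde = x_rot[:k, :]     # keep only the first k components
    x_hat = U[:, :k] @ x_tilde # approximate reconstruction of x
    return x_tilde, x_hat
```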
So how do we decide how many components to keep? If k is too large, the amount of dimensionality reduction is small; if k is too small, we may lose a lot of detail. A common criterion is the ratio
\begin{align} \frac{\sum_{j=1}^k \lambda_j}{\sum_{j=1}^n \lambda_j} \end{align}
which measures the fraction of variance that is retained.
For images, k is usually chosen so that 99% of the variance is retained; sometimes 90-98% is used instead.
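One possible way to pick k from the eigenvalues, sketched under the same assumptions as before (the name `choose_k` and the default target are illustrative):

```python
import numpy as np

def choose_k(eigvals, target=0.99):
    """eigvals: eigenvalues lambda_j sorted in descending order."""
    retained = np.cumsum(eigvals) / np.sum(eigvals)
    # smallest k whose retained-variance ratio reaches the target
    return int(np.searchsorted(retained, target) + 1)
```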
PCA on Images
If you use PCA in other applications, you usually need to preprocess each feature separately so that it has zero mean and unit variance, but this is generally not done for images.
For natural images, estimating a separate mean and variance for each pixel is not very meaningful, because the statistics of one part of an image are usually similar to those of other parts (stationarity).
For PCA to work well, we informally need two things: (1) the features have approximately zero mean, and (2) the different features have variances similar to one another.
For natural images, (2) is approximately satisfied even without variance normalization, so we usually skip it (the same holds for audio spectrograms and bag-of-words text features).
In fact, the eigenvectors returned by PCA do not change when the data is rescaled.
What we do need is mean normalization: for each image, we compute its mean intensity and subtract it from every pixel.
Note that these two steps (computing the mean and subtracting it) are performed separately for each image.
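A sketch of this per-image mean normalization, assuming the image patches are stored as the columns of a matrix `images` (names are illustrative):

```python
import numpy as np

def mean_normalize_per_image(images):
    """images: (n, m) matrix whose columns are individual image patches."""
    mu = images.mean(axis=0, keepdims=True)  # one mean intensity per image (column)
    return images - mu                       # subtract each image's own mean
```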
If you are working with other kinds of images (non-natural images such as handwritten characters, or a single object centered on a white background), other forms of normalization may be worth considering.
Whitening
The goal of whitening is to make the features (1) only weakly correlated with each other and (2) all have the same variance.
The rotated data x_{\rm rot} computed earlier already satisfies the first property: its features are uncorrelated.
Its covariance matrix is diagonal, and the diagonal entries are exactly the eigenvalues:
\begin{align} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \end{align}
To make each feature have unit variance, we rescale each feature by 1/\sqrt{\lambda_i}, which gives the PCA-whitened data:
\begin{align} x_{\rm PCAwhite,i} = \frac{x_{\rm rot,i}}{\sqrt{\lambda_i}} \end{align}
That is, the covariance matrix of x_{\rm PCAwhite} is the identity matrix I.
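A sketch of PCA whitening under the same assumptions as the earlier snippets (the regularization with \epsilon is discussed below, so it is omitted here):

```python
import numpy as np

def pca_whiten(x, U, eigvals):
    x_rot = U.T @ x
    # scale each rotated feature by 1/sqrt(lambda_i) to get unit variance
    return x_rot / np.sqrt(eigvals)[:, None]
```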
If you want whitened data with a lower dimension than the original, you can also keep only the first k components of x_{\rm PCAwhite}. When PCA whitening is combined with regularization (discussed below), the last few components are close to 0 anyway and can be dropped.
ZCA Whitening
In fact, this way of making the data's covariance matrix equal to I is not unique: if R is any orthogonal matrix, then R x_{\rm PCAwhite} also has covariance matrix I.
In ZCA whitening, we choose R = U, which defines
\begin{align} x_{\rm ZCAwhite} = U\, x_{\rm PCAwhite} \end{align}
Among all possible choices of R, this choice makes x_{\rm ZCAwhite} as close as possible to the original input data x.
When using ZCA whitening, we usually keep all n dimensions of the data rather than reducing the dimensionality.
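A sketch of ZCA whitening, reusing the illustrative `U` and `eigvals` from the earlier snippets (in practice the regularization described in the next paragraph would also be applied):

```python
import numpy as np

def zca_whiten(x, U, eigvals):
    x_rot = U.T @ x
    x_pca_white = x_rot / np.sqrt(eigvals)[:, None]
    return U @ x_pca_white   # x_ZCAwhite = U * x_PCAwhite
```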
When using PCA or ZCA whitening in practice, some eigenvalues \lambda_i are numerically close to 0, and dividing by \sqrt{\lambda_i} in the normalization step would blow the data up or make it numerically unstable. We therefore apply a small amount of regularization: we add a very small constant \epsilon to the eigenvalues before taking the square root, and use
\begin{align} x_{\rm PCAwhite,i} = \frac{x_{\rm rot,i}}{\sqrt{\lambda_i + \epsilon}} \end{align}
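A sketch of the regularized whitening step; the default \epsilon = 1e-5 here is an assumed value, and the right choice depends on the scale of the data:

```python
import numpy as np

def pca_whiten_regularized(x, U, eigvals, epsilon=1e-5):
    x_rot = U.T @ x
    # epsilon keeps the division stable when some eigenvalues are near zero
    return x_rot / np.sqrt(eigvals + epsilon)[:, None]
```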