**Data Whitening Preprocessing**

"Zero-mean centering" and "spatial decorrelation" are the two most commonly used preprocessing steps for random vectors. Zero-mean centering is straightforward; spatial decorrelation involves some matrix theory. Given a random signal vector $x$ with zero mean, its autocorrelation matrix is

$$R_x = E[x x^T]$$
Clearly $R_x$ is symmetric and non-negative definite (all eigenvalues are greater than or equal to 0). We now look for a linear transformation $B$ to apply to $x$, i.e. $y = Bx$, such that

$$E[y y^T] = B R_x B^T = I$$
The formula above means that the components of $y$ are uncorrelated, i.e. $E[y_i y_j] = 0$ for $i \neq j$. This process is often referred to as "spatial decorrelation", "spatial whitening", or "sphering". $B$ is called the spatial decorrelation matrix (whitening matrix, sphering matrix). By the properties just noted, $R_x$ has an eigenvalue decomposition:

$$R_x = Q \Sigma Q^T$$
where $Q$ is an orthogonal matrix and $\Sigma$ is a diagonal matrix whose diagonal elements are the eigenvalues. Letting \begin{equation}\label{eq:b}B=\Sigma^{-1/2} Q^T\end{equation} we have

$$E[y y^T] = B R_x B^T = \Sigma^{-1/2} Q^T Q \Sigma Q^T Q \Sigma^{-1/2} = I$$
Therefore, after the linear transformation $B$, the individual components become uncorrelated. For a symmetric non-negative definite matrix such as $R_x$, the eigenvalue decomposition and the singular value decomposition are equivalent, and **the numerical algorithm for singular value decomposition is more stable than that for eigenvalue decomposition**, so the SVD is usually used to construct the spatial decorrelation matrix. Note that spatial decorrelation cannot guarantee "independence" between the component signals, but it can simplify a blind separation algorithm or improve its performance. (Note: the above is taken from my teacher's "Blind Signal Processing" course slides.) The best-known example of a whitened signal is white noise: each element can be the value of a time series at successive points in time, and a white noise sequence has no temporal correlation. The term "white" comes from the fact that the energy spectrum of white noise is constant over all frequencies, just as white light contains all colors. The essence of whitening is a rescaling (stretching or shrinking along the eigen-directions). The decorrelation matrix of \eqref{eq:b} is certainly not the only whitening matrix: it is easy to see that any matrix $UB$ (for an orthogonal matrix $U$) is also a whitening matrix. This follows by substituting $UB$ into the expression above:

$$E[(UBx)(UBx)^T] = U B R_x B^T U^T = U I U^T = I$$
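As a quick numerical check (a NumPy sketch, not from the original courseware), the matrix $B = \Sigma^{-1/2} Q^T$ does whiten the sample covariance, and so does $UB$ for any orthogonal $U$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean correlated data: M = 3 dimensions, N = 100000 samples.
A = rng.standard_normal((3, 3))
x = A @ rng.standard_normal((3, 100000))

# Empirical autocorrelation matrix R = E[x x^T].
R = (x @ x.T) / x.shape[1]

# Eigendecomposition R = Q diag(sigma) Q^T (R is symmetric, so eigh applies).
sigma, Q = np.linalg.eigh(R)

# Whitening matrix B = Sigma^{-1/2} Q^T from \eqref{eq:b}.
B = np.diag(sigma ** -0.5) @ Q.T
y = B @ x
Ry = (y @ y.T) / y.shape[1]
print(np.allclose(Ry, np.eye(3)))   # covariance of y is the identity

# Any U B with U orthogonal is also a whitening matrix.
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal U
z = (U @ B) @ x
Rz = (z @ z.T) / z.shape[1]
print(np.allclose(Rz, np.eye(3)))
```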
An important example is the matrix $R_x^{-1/2} = Q \Sigma^{-1/2} Q^T$. This is also a whitening matrix, since it is obtained by left-multiplying \eqref{eq:b} by the orthogonal matrix $Q$. It is called the inverse square root of $R_x$, and the notation reflects the standard generalization of the square-root concept to matrices. (Note: the above is from "Independent Component Analysis".) Implementing whitening in code is actually very simple; here is a function:

```matlab
function [z_w, varargout] = mywhiten(z)
%--------------------------------------------------------------------------
% Syntax:   z_w = mywhiten(z);
%           [z_w, T] = mywhiten(z);
% Input:    z is an M x N matrix containing N samples of an
%           M-dimensional random variable.
% Output:   z_w is the whitened version of z.
%           T is the M x M whitening transformation matrix.
%--------------------------------------------------------------------------

%% Compute the sample covariance
R = cov(z', 1);                % the 1 means divide by N when computing
                               % the covariance

%% Whiten z
[U, D, ~] = svd(R, 'econ');    % or with EIG: [U, D] = eig(R);
T = U * inv(sqrt(D)) * U';     % the inverse square root of the covariance
                               % matrix; the inv is cheap here since D is
                               % diagonal. inv(sqrt(D)) * U' is also a
                               % valid whitening matrix.
z_w = T * z;

if (nargout == 2)
    varargout{1} = T;
end
```

One can also directly use FastICA's whitening function whitenv, which, together with the provided PCA function pcamat.m, completes the whitening. It comes with the following example:

```matlab
% EXAMPLE
% [E, D] = pcamat(vectors);
% [nv, wm, dwm] = whitenv(vectors, E, D);
```

I previously wrote about using the FastICA toolkit; here is an example of using its whitening functions:
```matlab
% Test the whitenv function
clc; clear; close all;

% Load MATLAB's built-in data
load cities
stdr = std(ratings);
sr = ratings ./ repmat(stdr, 329, 1);
sr = sr';
figure
boxplot(sr', 'orientation', 'horizontal', 'labels', categories)

% Test
firsteig = 1;
lasteig = 9;
s_interactive = 'off';
sr = remmean(sr);   % pcamat and whitenv below do not remove the mean,
                    % so do the mean removal here first
[E, D] = pcamat(sr, firsteig, lasteig, s_interactive);
[nv, wm, dwm] = whitenv(sr, E, D);
figure
boxplot(nv', 'orientation', 'horizontal', 'labels', categories)
```

The results are as follows
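For readers working outside MATLAB, the same inverse-square-root whitening can be sketched in NumPy (an illustrative port of the mywhiten function above, not part of the original post):

```python
import numpy as np

def mywhiten(z):
    """Whiten z (M x N: N samples of an M-dimensional variable), as in
    the MATLAB function above. Returns the whitened data and the M x M
    whitening transform T = U D^{-1/2} U^T, the inverse square root of
    the sample covariance."""
    R = np.cov(z)                      # sample covariance, like cov(z') in MATLAB
    U, d, _ = np.linalg.svd(R)         # SVD of a symmetric PSD matrix
    T = U @ np.diag(d ** -0.5) @ U.T   # inverse square root of R
    return T @ z, T

# Toy data with a known correlated covariance.
rng = np.random.default_rng(1)
z = np.linalg.cholesky(np.array([[4.0, 1.0], [1.0, 2.0]])) \
    @ rng.standard_normal((2, 50000))
z_w, T = mywhiten(z)
print(np.round(np.cov(z_w), 2))        # covariance of z_w is ~ identity
```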

PCA: PCA serves two purposes: one is dimensionality reduction (which can speed up training, reduce memory consumption, etc.), and the other is data visualization. PCA is not linear regression: linear regression minimizes the error of the fitted function in the y direction, while PCA minimizes the error between each sample and its projection onto the reduced-dimensional subspace. Moreover, linear regression predicts a y value from x values, while PCA treats all dimensions of x equally. Before using PCA we need to preprocess the data: first remove the mean, i.e. for each feature dimension subtract that dimension's average; then normalize the ranges of the different dimensions to a common range, typically by dividing by the maximum value. It may seem strange that for natural images one does not subtract each dimension's mean but rather the mean of the image itself. This is because the preprocessing for PCA is chosen according to the application. Natural images are the images the human eye commonly sees, which obey certain statistical regularities. In practice, any ordinary photograph without heavy artificial manipulation can be called a natural image, since many algorithms are fairly robust to this kind of input. When learning on natural images, there is in fact no need to pay much attention to variance normalization, because the statistics of the different parts of a natural image are similar; zero-mean centering is enough. When training on other kinds of images, however, such as handwritten character recognition, variance normalization is needed.
The PCA computation mainly requires two things: the direction vectors after dimensionality reduction, and the values of the original samples projected onto those new directions. First compute the covariance matrix of the training samples (assuming the input data has already been mean-centered):

$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} (x^{(i)})^T$$

Apply the SVD to this covariance matrix; each column of the resulting U is a new direction vector for the data samples, with the leading columns representing the principal directions, and so on. Then U'*x gives the reduced-dimension sample value z, i.e. z = U'*x (geometrically, each component of z is the distance from the original point to the corresponding direction, with a sign), and with that PCA's two main computational tasks are done. The original data sample x can be approximately restored as U*z. When using supervised learning, if you want to apply PCA for dimensionality reduction, simply take the x values of the training samples, compute the principal component matrix U and the reduced values z, and then combine z with the original samples' y values to form a new training set for the classifier. At test time, use the same U to reduce the dimensionality of new test samples before feeding them into the trained classifier.
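The projection and reconstruction steps above can be sketched in NumPy (illustrative, using synthetic data where the dropped directions carry little variance):

```python
import numpy as np

rng = np.random.default_rng(3)
# x: M x N, mean-centered, as the text assumes.
x = rng.standard_normal((5, 2000))
x[3:] *= 0.01                      # the last two dimensions carry little variance
x = x - x.mean(axis=1, keepdims=True)

Sigma = (x @ x.T) / x.shape[1]     # covariance (1/N) sum of x x^T
U, S, _ = np.linalg.svd(Sigma)     # columns of U: principal directions

k = 3
z = U[:, :k].T @ x                 # reduced representation: z = U' * x
x_hat = U[:, :k] @ z               # approximate reconstruction: x ~ U * z

err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(err < 0.05)                  # small, since the dropped directions
                                   # carried almost no variance
```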

One point to note is that PCA does not prevent overfitting. It might seem that, since PCA reduces dimensionality, with the same number of training samples and fewer features overfitting should be less likely. In practice, however, this method does very little to prevent overfitting; overfitting is mainly prevented through regularization terms. Not every ML algorithm needs PCA for dimensionality reduction: we use it only when the original training samples do not meet our needs, e.g. for training speed, memory size, or visualization. If none of those considerations apply, the PCA step is not necessarily needed. The goal of whitening: whitening removes the correlation between the data, and is a preprocessing step for many algorithms. For example, when training on image data, adjacent pixel values are correlated, so much of the information is redundant; a whitening operation can then be used to decorrelate the data. Whitened data must satisfy two conditions: first, the correlations between different features are minimal, close to 0; second, the variances of all features are equal (not necessarily 1). Common whitening operations include PCA whitening and ZCA whitening. PCA whitening means that after the data x has been rotated by PCA to z, each dimension of z is decorrelated, satisfying the first whitening condition; it then suffices to divide each dimension of z by its standard deviation so that every dimension has variance 1, i.e. equal variances. The formula is:

$$x_{\mathrm{PCAwhite},i} = \frac{x_{\mathrm{rot},i}}{\sqrt{\lambda_i}}$$

ZCA whitening means that the data x is first transformed by PCA to z, but without reducing dimension, because all the components are kept. This also satisfies the first whitening condition, with the features decorrelated. After the same scaling to unit variance, the resulting matrix is left-multiplied by the eigenvector matrix U. The ZCA whitening formula is

$$x_{\mathrm{ZCAwhite}} = U \, x_{\mathrm{PCAwhite}}$$
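Both whitening variants can be sketched in NumPy (illustrative; the eigenvalues here are strictly positive, so no regularizing epsilon is added):

```python
import numpy as np

rng = np.random.default_rng(4)
# Correlated 2-D data with a known covariance, then mean-centered.
x = np.linalg.cholesky(np.array([[2.0, 0.8], [0.8, 1.0]])) \
    @ rng.standard_normal((2, 100000))
x = x - x.mean(axis=1, keepdims=True)

Sigma = (x @ x.T) / x.shape[1]
U, lam, _ = np.linalg.svd(Sigma)                 # lam: eigenvalues lambda_i

x_rot = U.T @ x                                  # rotate into principal axes
x_pca_white = x_rot / np.sqrt(lam)[:, None]      # divide by sqrt(lambda_i)
x_zca_white = U @ x_pca_white                    # rotate back: ZCA whitening

# Both variants have identity covariance; ZCA stays closest to the
# original coordinate frame.
print(np.allclose((x_pca_white @ x_pca_white.T) / x.shape[1], np.eye(2)))
print(np.allclose((x_zca_white @ x_zca_white.T) / x.shape[1], np.eye(2)))
```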