Why the ICA in UFLDL Must Use PCA Whitening
Andrew Ng's UFLDL tutorial is a popular course for deep learning beginners. Two years ago, when I read the ICA section of the tutorial, I noticed that the input data had to be PCA-whitened before being fed to the ICA model described there, and a TODO note on the page asked why. My understanding of machine learning at the time was not enough to answer the question, so I just followed the tutorial, wrote the code, and never looked at it again.
Today, while discussing several loss functions used in unsupervised learning, the loss function of PCA came up:
$$\max \|Wx\|_2 \quad \text{s.t.} \quad WW^\top = I,$$
By optimizing this objective we find the directions that retain the most variance; here $W$ is a flat (wide) matrix, so it performs dimensionality reduction. As is well known, this optimization can be solved with the SVD of the data, $x = USV^\top$: taking the first few columns of $U$ gives the rotation matrix for dimensionality reduction, while the last few singular values, which correspond to the last columns of $U$, are generally close to zero, and those directions are discarded.
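As a concrete illustration, here is a minimal sketch (my own, not code from the tutorial) of solving this objective via the SVD with NumPy; the data matrix `X` and the target dimension `k` are made up for the example.

```python
import numpy as np

def pca_rotation(X, k):
    """Return the k x d rotation matrix W formed from the top-k left singular vectors."""
    Xc = X - X.mean(axis=1, keepdims=True)            # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # The first k columns of U span the largest-variance directions;
    # the trailing singular values are usually close to 0, and those columns are dropped.
    return U[:, :k].T

# Example: 5-dimensional points that actually live in a 2-dimensional subspace.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 1000))
W = pca_rotation(X, 2)      # W is flat (2 x 5), so W @ X reduces the dimension
print(W @ W.T)              # approximately the 2 x 2 identity, i.e. W W^T = I
```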
At this point I suddenly recalled the old puzzle of why the ICA loss function uses a min instead:
$$\min \|Wx\|_1 \quad \text{s.t.} \quad WW^\top = I.$$
Granted, the $\ell_1$ norm and the $\ell_2$ norm are different, but the difference is not so large as to make this obvious: if we minimize, won't we simply end up finding a subspace that contains almost no data?
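To make the worry concrete, here is a small numerical check (my own construction, not from the tutorial): without whitening, the orthonormal $W$ that minimizes $\|Wx\|$ consists of the low-variance directions, where hardly any data lives.

```python
import numpy as np

rng = np.random.default_rng(1)
# 5-dimensional data confined, up to tiny noise, to a 2-dimensional subspace.
X = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 1000))
X += 1e-3 * rng.normal(size=X.shape)

Xc = X - X.mean(axis=1, keepdims=True)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)

W_min = U[:, -2:].T                    # bottom two directions minimize ||W x||_2,
                                       # and give an almost-zero l1 norm as well
print(np.abs(W_min @ Xc).mean())       # ~1e-3: almost no data in this subspace
print(np.abs(U[:, :2].T @ Xc).mean())  # much larger: where the data actually is
```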
So I went back to the tutorial. It turns out that $x$ must first be PCA-whitened: we first keep only the few dimensions of $x$ with large variance, and then carry out the minimization of the $\ell_1$ norm (in place of the $\ell_2$ norm) inside this retained subspace. Since every direction that survives whitening has unit variance, the minimization can no longer collapse onto a near-zero-variance subspace with no data in it; the solution it finds corresponds, roughly, to the middle columns of $U$ rather than the discarded near-zero ones.
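Below is a minimal sketch of that preprocessing under my own assumptions (the function names and the smoothing constant `eps` are mine, not the tutorial's): PCA-whiten $x$ onto its top-$k$ directions, then evaluate the orthonormal-ICA $\ell_1$ objective on the whitened data.

```python
import numpy as np

def pca_whiten(X, k, eps=1e-8):
    """Project X (d x n) onto its top-k principal directions and rescale each to unit variance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    # Only the k high-variance directions survive; the near-zero ones are gone,
    # so the subsequent minimization cannot hide in them.
    scale = np.sqrt(X.shape[1]) / (S[:k] + eps)      # make each retained direction unit-variance
    return (scale[:, None] * U[:, :k].T) @ Xc

def ica_l1_objective(W, X_white):
    """The l1 objective: sum of absolute projections, minimized subject to W W^T = I."""
    return np.abs(W @ X_white).sum()
```

The orthonormality constraint $WW^\top = I$ still has to be enforced during the optimization itself; as I recall, the tutorial does this by projecting $W$ back onto the constraint set, e.g. $W \leftarrow (WW^\top)^{-1/2} W$, after each gradient step.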