Independent Component Analysis

Overview
Recall that in the introduction to sparse coding we wanted to learn an over-complete basis for the sample data. In particular, this means that the basis vectors learned by sparse coding are not necessarily linearly independent. While this is acceptable in some situations, sometimes we want to obtain a linearly independent basis. The Independent Component Analysis (ICA) algorithm achieves exactly this. Moreover, in ICA the basis we learn is not only linearly independent but also orthonormal. (An orthonormal basis must satisfy w_i^T w_j = 0 if i ≠ j, and w_i^T w_j = 1 if i = j.)
As with sparse coding, independent component analysis has a simple mathematical formulation. Given data x, we want to learn a set of basis vectors, represented as the rows of a matrix W, that satisfies two properties: first, as in sparse coding, the features should be sparse; second, the basis should be orthonormal. (Note that while in sparse coding the matrix A maps the features s to the raw data, in independent component analysis the matrix W works in the opposite direction, mapping the raw data x to the features.) This gives the following objective function:

J(W) = \|Wx\|_1
Since Wx is precisely the features describing the sample data, this objective is the analogue of the sparsity penalty on the features s in sparse coding. Adding the orthonormality constraint, independent component analysis amounts to solving the following optimization problem:

\min_W \|Wx\|_1 \quad \text{s.t.} \quad WW^T = I
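To make the objective concrete, here is a minimal NumPy sketch of the quantity being minimized and of the constraint. The function names and the layout of X (one example per column) are assumptions for illustration, not part of the original tutorial:

import numpy as np

def ica_objective(W, X):
    """L1 sparsity of the features Wx, summed over all examples in X.

    W : (k, n) matrix whose rows are the basis vectors being learned
    X : (n, m) data matrix, one example per column
    """
    return np.abs(W @ X).sum()

def constraint_violation(W):
    """Frobenius distance of W W^T from the identity (0 when the rows of W are orthonormal)."""
    return np.linalg.norm(W @ W.T - np.eye(W.shape[0]))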
As is usual in deep learning, this problem has no simple analytic solution. Worse still, the orthonormality constraint makes it harder to solve with gradient descent: after every gradient descent iteration, the new W must be mapped back into the space of orthonormal matrices (to restore the constraint).
In practice, optimizing the objective function while enforcing the orthonormality constraint (as described in Standard orthogonal ICA below) is feasible but slow. Hence, the use of standard orthogonal ICA is limited to situations where an orthonormal basis is indispensable. (See: TODO)
Standard orthogonal ICA
The objective function of standard orthogonal ICA is:

\min_W \|Wx\|_1 \quad \text{s.t.} \quad WW^T = I
Observe that the constraint WW^T = I implies two other constraints:
First, since we are learning an orthonormal basis, the number of basis vectors cannot exceed the dimension of the input: a k × n matrix W can satisfy WW^T = I only if k ≤ n, because the rank of WW^T is at most the rank of W. In particular, this means that we cannot learn an over-complete basis as we ordinarily do in sparse coding.
Second, the data must be ZCA whitened with the regularization term ε set to 0. (Why must this be done? See TODO.)
Therefore, before optimizing the standard orthogonal ICA objective function, we must make sure that the data has been whitened and that we are learning an under-complete basis.
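As a sketch of this preprocessing step, the following NumPy code performs ZCA whitening with ε = 0, assuming zero-mean data stored one example per column. The function name and shapes are illustrative, and with ε = 0 the data covariance must be nonsingular:

import numpy as np

def zca_whiten(X):
    """ZCA whitening with the regularizer epsilon set to 0.

    X : (n, m) data matrix, one zero-mean example per column.
    Returns whitened data whose covariance is the identity.
    """
    sigma = X @ X.T / X.shape[1]            # data covariance
    d, U = np.linalg.eigh(sigma)            # sigma = U diag(d) U^T
    # epsilon = 0: rescale by the raw eigenvalues, no regularization,
    # so sigma must have no zero eigenvalues
    return U @ np.diag(1.0 / np.sqrt(d)) @ U.T @ X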
Then, to optimize the objective function, we can use gradient descent, adding a projection step to each gradient descent iteration so that the orthonormality constraint remains satisfied. The procedure is as follows:
Repeat until done:

W \leftarrow W - \alpha \nabla_W \|Wx\|_1
W \leftarrow \operatorname{proj}_U W

where U is the space of matrices satisfying WW^T = I.
In practice, the learning rate α is varied, and a line search algorithm is used to speed up gradient descent. The projection step is carried out by setting W \leftarrow (WW^T)^{-1/2} W, which can in fact be viewed as ZCA whitening (TODO: explain why this is like ZCA whitening).
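Putting the pieces together, here is a minimal sketch of the projected gradient descent loop, using a fixed learning rate in place of line search. The initialization, the averaging of the gradient over examples, and the function name are assumptions for this example:

import numpy as np

def orthonormal_ica(X, k, alpha=0.5, iters=1000):
    """Projected gradient descent for standard orthogonal ICA.

    X     : (n, m) ZCA-whitened data, one example per column
    k     : number of basis vectors, must satisfy k <= n
    alpha : fixed learning rate (the tutorial suggests line search instead)
    """
    n, m = X.shape
    W = np.random.randn(k, n)
    for _ in range(iters):
        grad = np.sign(W @ X) @ X.T / m     # (sub)gradient of ||WX||_1, averaged over examples
        W = W - alpha * grad                # gradient descent step
        # projection step: W <- (W W^T)^{-1/2} W restores W W^T = I
        d, U = np.linalg.eigh(W @ W.T)
        W = U @ np.diag(1.0 / np.sqrt(d)) @ U.T @ W
    return W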
Topographic ICA
Just as with sparse coding, independent component analysis can be modified into a topographic variant by adding a topographic cost term.
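The original page does not spell out the modified objective here. As a hedged sketch, mirroring the grouped-sparsity term used for topographic sparse coding elsewhere in the UFLDL tutorial, the topographic variant replaces the plain L1 penalty with a grouped penalty; the grouping matrix V and the smoothing constant ε below are assumptions:

% Sketch of a topographic ICA objective, by analogy with topographic
% sparse coding; V and epsilon are assumptions, not from the original page.
\min_W \; \sum_r \sqrt{\sum_j V_{rj} \, (Wx)_j^2 + \epsilon}
\quad \text{s.t.} \quad WW^T = I

Here each row of V selects a neighborhood of adjacent features, so that nearby features are encouraged to activate together.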
Glossary:
Independent Component Analysis
sparse coding
over-complete basis
orthonormal basis
sparsity penalty
gradient descent
whitening
under-complete basis
line search
topographic cost term

From: http://ufldl.stanford.edu/wiki/index.php/%e7%8b%ac%e7%ab%8b%e6%88%90%e5%88%86%e5%88%86%e6%9e%90