7. ICA Algorithm Extension Description
The content above is essentially based on a lecture handout. I also read another paper, "Independent Component Analysis: Algorithms and Applications" (Aapo Hyvärinen and Erkki Oja), which differs from the handout in a few places. Below is a summary of some points made in that paper (some of which I do not fully understand).
First, the paper mentions the concept "uncorrelated", which is similar to "independent". Uncorrelatedness is a part of independence, not full independence. How are the two characterized?
Random variables $y_1$ and $y_2$ are independent if and only if $p(y_1, y_2) = p(y_1)\,p(y_2)$.
Random variables $y_1$ and $y_2$ are uncorrelated if and only if $E[y_1 y_2] = E[y_1]\,E[y_2]$.
The second condition (uncorrelatedness) is looser than the first (independence): independence implies uncorrelatedness, but uncorrelatedness does not imply independence.
The proof of the first direction is as follows: if $y_1$ and $y_2$ are independent, then
$E[y_1 y_2] = \iint y_1 y_2\, p(y_1, y_2)\, dy_1 dy_2 = \iint y_1 y_2\, p(y_1)\, p(y_2)\, dy_1 dy_2 = E[y_1]\, E[y_2],$
so they are uncorrelated. The converse, however, does not hold.
For example, suppose the joint distribution of $y_1$ and $y_2$ puts probability $1/4$ on each of the four points $(0,1), (0,-1), (1,0), (-1,0)$.
Then $E[y_1 y_2] = 0 = E[y_1]\,E[y_2]$, so $y_1$ and $y_2$ are uncorrelated, but $E[y_1^2 y_2^2] = 0 \neq \tfrac{1}{4} = E[y_1^2]\,E[y_2^2]$.
Therefore $y_1$ and $y_2$ do not satisfy the factorization that independence would require, so they are not independent.
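For a concrete check, here is a minimal NumPy sketch (my own illustration, not from the original post) that samples the four points above and verifies uncorrelatedness while showing that the independence condition fails:

```python
import numpy as np

# Sample (y1, y2) uniformly from {(0,1), (0,-1), (1,0), (-1,0)}.
rng = np.random.default_rng(0)
points = np.array([(0, 1), (0, -1), (1, 0), (-1, 0)], dtype=float)
samples = points[rng.integers(0, 4, size=100_000)]
y1, y2 = samples[:, 0], samples[:, 1]

# Uncorrelated: E[y1*y2] equals E[y1]*E[y2] (both are ~0).
print(np.mean(y1 * y2), np.mean(y1) * np.mean(y2))               # 0.0 vs ~0.0

# Not independent: E[y1^2 * y2^2] = 0, but E[y1^2] * E[y2^2] ~ 1/4.
print(np.mean(y1**2 * y2**2), np.mean(y1**2) * np.mean(y2**2))   # 0.0 vs ~0.25
```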
As mentioned earlier, if the sources $s_i$ are Gaussian and $A$ is orthogonal, then the $x_i$ are also Gaussian and mutually independent. In that case $A$ cannot be determined, because any orthogonal transformation produces the same distribution. However, if at most one component of $s$ is Gaussian, ICA can still be used.
So the problem ICA solves becomes: how do we recover $s$ from $x$ so that the recovered components are as unlikely as possible to be Gaussian?
The central limit theorem tells us that the sum of a large number of independent, identically distributed random variables tends toward a Gaussian distribution.
We have been assuming that $x$ is generated from the independent source components $s$ by the mixing matrix $A$, i.e. $x = As$. To recover $s$, we need to estimate each component as $y = w^T x$. Define $z = A^T w$; then $y = w^T x = w^T A s = z^T s$. The reason for going to the trouble of defining $z$ is to make one relationship explicit: we want to obtain $y$ as a linear combination of the whole observation $x$, and although we do not know whether $y$ is a true component of $s$, we do know that $y = z^T s$ is a linear combination of the true components of $s$. By the central limit theorem, such a combination is more Gaussian than the individual components unless it coincides with one of them. Since at most one component of $s$ may be Gaussian, our goal is to find the $w$ for which $y$ (i.e. $w^T x$) is least likely to be Gaussian.
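As a quick sanity check of this relationship, here is a minimal NumPy sketch (my own, with arbitrary stand-ins for $A$, $w$, and $s$) confirming that $w^T x = z^T s$ when $z = A^T w$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))          # mixing matrix (assumed invertible)
s = rng.uniform(-1, 1, size=n)       # one sample of the independent sources
w = rng.normal(size=n)               # candidate unmixing direction

x = A @ s                            # observed mixture x = A s
z = A.T @ w                          # z = A^T w
print(np.allclose(w @ x, z @ s))     # True: w^T x == z^T s
```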
The question then reduces to: how do we measure whether $y$ is Gaussian?
One measure is kurtosis, defined as
$\text{kurt}(y) = E[y^4] - 3\,(E[y^2])^2.$
If $y$ is Gaussian, this quantity is 0; otherwise it is nonzero in most cases.
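To make this concrete, here is a small NumPy sketch (my own, using unit-variance example distributions) that evaluates this kurtosis for Gaussian, uniform, and Laplacian samples, and for a sum of two uniform sources:

```python
import numpy as np

def kurt(y):
    """Kurtosis as defined above: E[y^4] - 3 * (E[y^2])^2 (after centering y)."""
    y = y - y.mean()
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2

rng = np.random.default_rng(2)
n = 200_000
gauss = rng.normal(size=n)                        # Gaussian: kurtosis ~ 0
unif = rng.uniform(-np.sqrt(3), np.sqrt(3), n)    # sub-Gaussian: ~ -1.2
lap = rng.laplace(scale=1 / np.sqrt(2), size=n)   # super-Gaussian: ~ +3
mix = (unif + rng.uniform(-np.sqrt(3), np.sqrt(3), n)) / np.sqrt(2)  # sum of two sources

print(kurt(gauss), kurt(unif), kurt(lap), kurt(mix))
# The mixture's kurtosis is roughly half that of a single uniform source,
# i.e. the sum is "more Gaussian", matching the central limit theorem argument.
```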
However, this measure has drawbacks; in particular it is very sensitive to outliers and therefore not robust. Consider the next method:
The second is the negentropy (negative entropy) measure.
From information theory we know that for a discrete random variable $Y$, the entropy is
$H(Y) = -\sum_i P(Y = a_i)\,\log P(Y = a_i).$
For a continuous random variable $y$ with density $f(y)$, the (differential) entropy is
$H(y) = -\int f(y)\,\log f(y)\, dy.$
Information theory provides a strong result: among all random variables with a given variance, the Gaussian has the largest entropy. In other words, for a fixed variance, a Gaussian random variable is the "most random" one.
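As a quick illustration of this fact (my own example, comparing a unit-variance Gaussian with a unit-variance uniform distribution via their closed-form differential entropies):

```python
import numpy as np

# Gaussian: 0.5 * log(2*pi*e*sigma^2); uniform on [a, b]: log(b - a).
sigma2 = 1.0
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)   # ~1.419 nats
h_unif = np.log(2 * np.sqrt(3 * sigma2))            # uniform with variance 1: ~1.242 nats
print(h_gauss, h_unif, h_gauss > h_unif)            # Gaussian entropy is larger
```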
Negentropy is defined by the formula
$J(y) = H(y_{\text{gauss}}) - H(y),$
where $y_{\text{gauss}}$ is a Gaussian random variable with the same covariance as $y$. That is, negentropy is the entropy gap between $y$ and a Gaussian of the same variance. The problem with this formula is that computing it directly is complicated (it requires the density of $y$), so an approximation strategy is generally adopted, for example the classical approximation
$J(y) \approx \tfrac{1}{12}\, E[y^3]^2 + \tfrac{1}{48}\,\text{kurt}(y)^2.$
This approximation is still not good enough (it ultimately relies on kurtosis), so the authors propose a better family of approximations based on the maximum-entropy principle:
$J(y) \approx \sum_{i=1}^{p} k_i \left( E[G_i(y)] - E[G_i(\nu)] \right)^2,$
where $\nu$ is a standard Gaussian variable, the $k_i$ are positive constants, and the $G_i$ are non-quadratic functions such as $G_1(u) = \tfrac{1}{a_1}\log\cosh(a_1 u)$ or $G_2(u) = -\exp(-u^2/2)$.
The FastICA algorithm described later is based on this formula.
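For reference, here is a rough NumPy sketch (my own, a one-term version with $G(u) = \log\cosh(u)$ and a Monte Carlo estimate of $E[G(\nu)]$) of this negentropy approximation:

```python
import numpy as np

def negentropy_approx(y, n_gauss=1_000_000, seed=0):
    """One-term approximation: J(y) ~ (E[G(y)] - E[G(v)])^2, G(u) = log(cosh(u)).

    y is assumed to be centered and scaled to unit variance (whitened)."""
    G = lambda u: np.log(np.cosh(u))
    v = np.random.default_rng(seed).normal(size=n_gauss)   # standard Gaussian reference
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

rng = np.random.default_rng(3)
n = 200_000
gauss = rng.normal(size=n)
unif = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)   # non-Gaussian, unit variance
print(negentropy_approx(gauss))   # close to 0
print(negentropy_approx(unif))    # clearly larger than for the Gaussian
```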
Another measure is minimization of mutual information:
$I(y_1, \ldots, y_n) = \sum_i H(y_i) - H(y).$
This formula can be interpreted in coding terms: the first sum, $\sum_i H(y_i)$, is the total code length when each component is encoded separately, while the second term, $H(y)$, is the code length when $y$ is encoded as a whole random vector. The subsequent material, including FastICA, is not covered here; I did not read it.
8. Projection Pursuit Interpretation of ICA
In statistics, projection pursuit means looking for "interesting" projections of multidimensional data. Such projections can be used for data visualization, density estimation, and regression. For example, in one-dimensional projection pursuit we look for a line such that projecting all data points onto it best reveals the structure of the data. The least interesting projection, however, turns out to be the Gaussian one: directions along which the data look Gaussian are the least interesting. This is consistent with the ICA idea of looking for independent components that are as unlikely as possible to be Gaussian.
In the figure, the principal component is the vertical axis, which has the largest variance, but the interesting direction is the horizontal axis, because it separates the two clusters (signal separation).
9. Pre-processing steps of the ICA algorithm
1. Centering: compute the mean of $x$ and subtract it from every sample so that $x$ has zero mean. This step is the same as in PCA.
2. Whitening: the goal is to multiply $x$ by a matrix so that the transformed vector $\tilde{x}$ has an identity covariance matrix. To spell it out: the original vector is $x$, and after the transformation it becomes $\tilde{x}$, whose covariance matrix satisfies
$E[\tilde{x}\tilde{x}^T] = I.$
We only need the following transformation to obtain the desired $\tilde{x}$ from $x$:
$\tilde{x} = E D^{-1/2} E^T x,$
where $E$ (the matrix of eigenvectors) and $D$ (the diagonal matrix of eigenvalues) come from the eigenvalue decomposition of the covariance matrix of $x$:
$E[x x^T] = E D E^T.$
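Here is a minimal NumPy sketch of these two preprocessing steps (my own illustration; the source signals, mixing matrix, and shapes are assumptions):

```python
import numpy as np

# X has shape (n_features, n_samples), as in x = A s.
rng = np.random.default_rng(4)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10_000))  # independent sources
A = np.array([[2.0, 1.0],
              [1.0, 1.5]])                                   # mixing matrix
X = A @ S                                                    # observed mixtures

# 1. Centering: subtract the mean of each component.
X = X - X.mean(axis=1, keepdims=True)

# 2. Whitening: eigen-decompose the covariance E[x x^T] = E D E^T,
#    then transform x_tilde = E D^{-1/2} E^T x.
cov = (X @ X.T) / X.shape[1]
d, E = np.linalg.eigh(cov)                   # d: eigenvalues, E: eigenvectors
X_white = E @ np.diag(d ** -0.5) @ E.T @ X

print(np.round((X_white @ X_white.T) / X_white.shape[1], 3))  # ~ identity matrix
```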
Here is an intuitive illustration using the diagrams below:
Assume the signal sources $s_1$ and $s_2$ are independent; for example, with $s_1$ on the horizontal axis and $s_2$ on the vertical axis, knowing $s_1$ tells us nothing about $s_2$.
We only observe the mixed signal $x$ that they produce, shown below:
At this point $x_1$ and $x_2$ are not independent (for example, looking at the topmost corner, knowing $x_1$ determines $x_2$). Plugging this directly into the maximum likelihood estimate derived earlier would therefore be problematic, because that derivation assumed independent components.
So the purpose of the whitening step is to make the components of $x$ independent (strictly speaking, uncorrelated). The whitened result is as follows:
You can see that the data now forms a square, and the dimensions have become independent of one another.
This transformation works when the distribution of $x$ is well behaved, but what if there is noise? The PCA method mentioned earlier can be applied first to reduce the dimensionality of the data and filter out the noise, yielding $k$ orthogonal components, and then ICA can be run on the result.
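As a sketch of this PCA-then-ICA pipeline (my own example using scikit-learn's PCA and FastICA; the signals, noise level, and number of components are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(5)
n_samples = 5_000
t = np.linspace(0, 8, n_samples)
S = np.c_[np.sign(np.sin(3 * t)),             # square wave
          np.sin(5 * t)]                       # sine wave  -> shape (n_samples, 2)
A = rng.normal(size=(2, 6))                    # mix 2 sources into 6 noisy channels
X = S @ A + 0.05 * rng.normal(size=(n_samples, 6))

# Reduce to k = 2 dimensions with PCA (filters part of the noise), then run ICA.
X_reduced = PCA(n_components=2).fit_transform(X)
S_est = FastICA(n_components=2, random_state=0).fit_transform(X_reduced)
print(S_est.shape)                             # (5000, 2): estimated source signals
```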
10. Summary
ICA is a powerful method in the field of blind signal separation, and it is also a method for finding the hidden factors behind non-Gaussian data. From the familiar sample-feature point of view, the precondition for using ICA is the belief that the sample data are generated from hidden factors that are independent and non-Gaussian, with the number of hidden factors equal to the number of features; what we solve for are those hidden factors.
PCA, by contrast, considers the features to be generated from $k$ orthogonal features (which can also be regarded as hidden factors), and what we solve for is the projection of the data onto these new features. Both are forms of factor analysis: one is better suited to recovering signals (since signals are structured and usually not Gaussian), the other to dimensionality reduction (replacing many features with $k$ orthogonal ones). Sometimes the two need to be used together. This paragraph is my personal understanding, for reference only.
"Reprint" ICA extension description