7. ICA Algorithm: Extended Description
The content above basically follows the handout. I have also read another article, "Independent Component Analysis: Algorithms and Applications" (Aapo Hyvärinen and Erkki Oja), which takes a somewhat different approach. The following is a summary of some of the content covered in that article.
First, it introduces the notion of uncorrelatedness, which is related to independence but weaker: uncorrelated variables are only "partially independent", not completely independent. How do we characterize the two?
The random variables $y_1$ and $y_2$ are independent if and only if $p(y_1, y_2) = p(y_1)\,p(y_2)$.
The random variables $y_1$ and $y_2$ are uncorrelated if and only if $E[y_1 y_2] = E[y_1]E[y_2]$.
The second condition (uncorrelatedness) is looser than the first (independence): independence implies uncorrelatedness, but uncorrelatedness does not imply independence.
The proof that independence implies uncorrelatedness is as follows. If $y_1$ and $y_2$ are independent, then for any functions $g_1$ and $g_2$,
$$E[g_1(y_1)g_2(y_2)] = \iint g_1(y_1)g_2(y_2)\,p(y_1,y_2)\,dy_1\,dy_2 = \int g_1(y_1)p(y_1)\,dy_1 \int g_2(y_2)p(y_2)\,dy_2 = E[g_1(y_1)]\,E[g_2(y_2)].$$
Taking $g_1$ and $g_2$ to be the identity gives $E[y_1 y_2] = E[y_1]E[y_2]$, i.e. uncorrelatedness.
The converse, however, does not hold.
For example, suppose the joint distribution of $(y_1, y_2)$ is uniform over the four points $(0,1)$, $(0,-1)$, $(1,0)$, $(-1,0)$, each with probability $1/4$.
Then $E[y_1 y_2] = 0 = E[y_1]E[y_2]$, so $y_1$ and $y_2$ are uncorrelated.
However, taking $g_1(y) = g_2(y) = y^2$ gives $E[y_1^2 y_2^2] = 0$ while $E[y_1^2]E[y_2^2] = \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4}$, so the factorization above fails and $y_1$, $y_2$ are not independent.
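As a quick numerical check of this example, here is a minimal sketch in Python (NumPy assumed; the variable names are mine). Since each outcome has probability 1/4, averaging over the four points gives the exact expectations.

```python
import numpy as np

# The four equally likely outcomes (y1, y2) from the example above.
points = np.array([(0, 1), (0, -1), (1, 0), (-1, 0)], dtype=float)
y1, y2 = points[:, 0], points[:, 1]

# Uncorrelated: E[y1*y2] equals E[y1]*E[y2] (both are 0 here).
print(np.mean(y1 * y2), np.mean(y1) * np.mean(y2))               # 0.0 0.0

# Not independent: E[y1^2 * y2^2] differs from E[y1^2]*E[y2^2] (0 vs 0.25).
print(np.mean(y1**2 * y2**2), np.mean(y1**2) * np.mean(y2**2))   # 0.0 0.25
```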
As mentioned above, if $s$ is Gaussian and $A$ is orthogonal, then $x$ is also Gaussian with mutually independent components. In that case $A$ cannot be determined, because any orthogonal transformation yields the same distribution. However, as long as at most one component of $s$ is Gaussian, ICA can still be applied.
So the problem ICA must solve becomes: how do we recover $s$ from $x$ so that $s$ is as non-Gaussian as possible?
The central limit theorem tells us that the sum of a large number of independent, identically distributed random variables approaches a Gaussian distribution.
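To see this concretely, here is a small illustrative sketch (Python with NumPy and SciPy, my own toy setup, not from the article): the standardized sum of n i.i.d. uniform variables gets closer to a standard normal as n grows, measured by the largest gap between its empirical CDF and the normal CDF.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def max_cdf_gap(samples):
    """Largest gap between the empirical CDF and the standard normal CDF."""
    x = np.sort(samples)
    ecdf = np.arange(1, len(x) + 1) / len(x)
    return np.max(np.abs(ecdf - norm.cdf(x)))

for n in (1, 2, 10, 50):
    # Sum of n i.i.d. Uniform(0,1) variables, standardized to zero mean, unit variance.
    s = rng.uniform(size=(100_000, n)).sum(axis=1)
    s = (s - n / 2) / np.sqrt(n / 12.0)
    print(n, round(max_cdf_gap(s), 4))   # the gap shrinks as n grows
```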
We have been assuming that $x$ is generated from the independent, identically distributed source components $s$ through the mixing matrix $A$, i.e. $x = As$. To recover $s$, we estimate its components one at a time. Take a weight vector $w$ and form the linear combination $y = w^T x$; defining $z = A^T w$, we have
$$y = w^T x = w^T A s = z^T s.$$
We do not know whether $y$ equals a true component of $s$, but we do know that $y$ is a linear combination of the true components of $s$. By the central limit theorem, such a linear combination is generally more Gaussian than any single component, and it is least Gaussian exactly when it equals one of them (when $z$ has a single nonzero entry). Since the components of $s$ are assumed non-Gaussian, our goal is to find the $w$ that makes $y = w^T x$ as non-Gaussian as possible.
The next problem is how to measure how Gaussian (or non-Gaussian) $y$ is.
One measure is kurtosis, defined as
$$\mathrm{kurt}(y) = E\{y^4\} - 3\left(E\{y^2\}\right)^2.$$
If $y$ is Gaussian, the kurtosis is 0; for most non-Gaussian distributions it is nonzero.
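As a rough illustration (my own sketch, assuming NumPy), the sample version of this kurtosis is near 0 for Gaussian data and clearly nonzero for uniform (sub-Gaussian) or Laplace (super-Gaussian) data:

```python
import numpy as np

def kurt(y):
    """Sample kurtosis E[y^4] - 3 (E[y^2])^2, computed after removing the mean."""
    y = y - y.mean()
    return np.mean(y**4) - 3 * np.mean(y**2) ** 2

rng = np.random.default_rng(0)
n = 200_000
print(kurt(rng.normal(size=n)))          # ~ 0     (Gaussian)
print(kurt(rng.uniform(-1, 1, size=n)))  # ~ -0.13 (sub-Gaussian)
print(kurt(rng.laplace(size=n)))         # ~ +12   (super-Gaussian)
```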
However, this measure is not robust in practice and has several problems, so consider the following alternative:
the negentropy (negative entropy) measure.
In information theory, the entropy of a discrete random variable $Y$ is
$$H(Y) = -\sum_i P(Y = a_i)\,\log P(Y = a_i).$$
For a continuous random variable $y$ with density $f(y)$, the (differential) entropy is
$$H(y) = -\int f(y)\,\log f(y)\,dy.$$
A strong result in information theory says that, among all random variables with the same variance, the Gaussian has the largest entropy. In other words, a random variable is at its "most random" when it is Gaussian.
Negentropy is then defined as
$$J(y) = H(y_{\mathrm{gauss}}) - H(y),$$
where $y_{\mathrm{gauss}}$ is a Gaussian random variable with the same variance (covariance) as $y$.
That is, negentropy is the entropy gap between $y$ and a Gaussian of the same variance. The problem with this definition is that it is hard to compute directly, so in practice an approximation is used.
The classical approximations are not good enough, and the authors propose a better one based on the maximum-entropy principle:
$$J(y) \approx \sum_{i=1}^{p} k_i \left[ E\{G_i(y)\} - E\{G_i(\nu)\} \right]^2,$$
where $\nu$ is a standard Gaussian variable, the $k_i$ are positive constants, and the $G_i$ are suitably chosen non-quadratic functions (for example $G(u) = \log\cosh(u)$).
FastICA is based on this approximation.
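Here is a minimal sketch of the one-term version of this approximation (Python/NumPy, my own illustration, not the authors' code), using the common choice G(u) = log cosh(u); the constant k is dropped since only relative comparisons matter here:

```python
import numpy as np

def negentropy_approx(y, n_gauss=1_000_000, seed=0):
    """Approximate J(y) up to a constant as (E[G(y)] - E[G(nu)])^2 with G(u) = log cosh(u).

    y is standardized to zero mean and unit variance first; nu is a standard Gaussian sample.
    """
    rng = np.random.default_rng(seed)
    y = (y - y.mean()) / y.std()
    nu = rng.normal(size=n_gauss)
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(nu).mean()) ** 2

rng = np.random.default_rng(1)
n = 200_000
print(negentropy_approx(rng.normal(size=n)))          # ~ 0 (Gaussian, up to sampling noise)
print(negentropy_approx(rng.uniform(-1, 1, size=n)))  # clearly larger (non-Gaussian)
print(negentropy_approx(rng.laplace(size=n)))         # clearly larger (non-Gaussian)
```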
Another measure is the (minimum) mutual information:
$$I(y_1, y_2, \dots, y_n) = \sum_{i=1}^{n} H(y_i) - H(y).$$
This formula has a coding interpretation: the first sum is the total code length when each $y_i$ is encoded separately, and the second term is the code length when $y$ is encoded as a whole random vector; their difference measures how much redundancy (dependence) remains among the components. The material that follows in the article, including the details of FastICA, I will not cover here, as I have not fully understood it.
8. Projection Pursuit
In statistics, projection pursuit is used to find "interesting" projections of multi-dimensional data. Such projections can be used for data visualization, density estimation, and regression. For example, in one-dimensional projection pursuit we look for a direction such that, when all data points are projected onto it, the projection reveals as much of the data's structure as possible. It turns out that the least interesting projections are the Gaussian ones: the directions along which the projected data look least Gaussian are the most interesting. This is consistent with the idea behind ICA: find independent components $s$ that are as non-Gaussian as possible.
In the figure, the principal component direction is the vertical axis (largest variance), but the most interesting direction is the horizontal axis, because it separates the two classes (signal separation).
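As a toy illustration of this point (my own sketch, not from the article): two clusters separated along one axis, with far larger variance along the other. The largest-variance direction misses the structure, while scanning directions for the most non-Gaussian projection (here, largest absolute kurtosis) finds the axis that separates the clusters.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n = 5_000

# Two clusters at x = -2 and x = +2, tight in x, very spread out in y.
x = np.concatenate([rng.normal(-2, 0.3, n), rng.normal(2, 0.3, n)])
y = rng.normal(0, 5.0, 2 * n)
data = np.column_stack([x, y])

angles = np.linspace(0, np.pi, 180, endpoint=False)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])       # unit directions
proj = data @ dirs.T                                           # projection onto each direction

best_var = angles[np.argmax(proj.var(axis=0))]                 # PCA-style criterion
best_pp = angles[np.argmax(np.abs(kurtosis(proj, axis=0)))]    # projection-pursuit criterion

print(np.degrees(best_var))  # ~ 90 degrees: the high-variance but uninteresting axis
print(np.degrees(best_pp))   # ~ 0 degrees: the axis that separates the two clusters
```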
9. ICA Algorithm Pre-processing Steps
1. Centering: compute the mean of $x$ and subtract it from every sample, so that $x$ has zero mean. This step is the same as in PCA.
2. Whitening: the goal is to multiply $x$ by a matrix so that the transformed vector $\tilde{x}$ has identity covariance. Let's spell this out. The original vector is $x$; after the transformation,
the covariance matrix of $\tilde{x}$ is the identity, that is
$$E[\tilde{x}\tilde{x}^T] = I.$$
The following transformation of $x$ achieves this:
$$\tilde{x} = E D^{-1/2} E^T x,$$
where $E$ (the matrix of eigenvectors) and $D$ (the diagonal matrix of eigenvalues) come from the eigenvalue decomposition of the covariance matrix of $x$:
$$E[xx^T] = E D E^T.$$
Indeed, $E[\tilde{x}\tilde{x}^T] = E D^{-1/2} E^T \,(E D E^T)\, E D^{-1/2} E^T = I$.
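A minimal sketch of these two pre-processing steps in Python/NumPy (my own illustration; the samples-by-features data layout and the mixing matrix in the toy check are assumptions):

```python
import numpy as np

def center_and_whiten(X):
    """Center each feature, then whiten via eigendecomposition of the covariance.

    Returns X_tilde whose sample covariance is (approximately) the identity.
    """
    X = X - X.mean(axis=0)                    # step 1: centering
    cov = np.cov(X, rowvar=False)             # sample covariance of x
    d, E = np.linalg.eigh(cov)                # cov = E diag(d) E^T
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T   # whitening matrix E D^{-1/2} E^T
    return X @ V.T

# Toy check: mix two independent uniform sources and whiten the mixture.
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(10_000, 2))
A = np.array([[2.0, 1.0], [1.0, 1.0]])        # hypothetical mixing matrix
X_tilde = center_and_whiten(S @ A.T)
print(np.round(np.cov(X_tilde, rowvar=False), 3))   # ~ identity matrix
```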
The figures below give a brief illustration:
Assume the source signals $s_1$ and $s_2$ are independent; in the figure, the horizontal axis is $s_1$ and the vertical axis is $s_2$.
We only observe the mixed signal $x$, shown below:
At this point $x_1$ and $x_2$ are not independent (for example, looking at the topmost corner of the distribution, knowing $x_1$ determines $x_2$).
We therefore whiten $x$ so that its components become uncorrelated; the result is shown below:
We can see that the data now forms a square shape again: the components are uncorrelated with unit variance, and only a rotation remains to recover the independent sources.
This works when the distribution of $x$ is well behaved, but what about noise? We can first apply the PCA method described earlier to reduce the dimensionality of the data and filter out noise, obtaining $k$ orthogonal components, and then run ICA on the result.
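In practice this pipeline is easy to sketch with scikit-learn (assuming it is available; the sources and mixing matrix below are made up for illustration, not a prescription): PCA with whitening reduces the data to k components, and FastICA then searches for independent ones.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
n = 5_000

# Two independent non-Gaussian sources, mixed into five noisy feature dimensions.
S = np.column_stack([np.sign(rng.normal(size=n)),      # binary source
                     rng.uniform(-1, 1, size=n)])       # uniform source
A = rng.normal(size=(5, 2))                             # hypothetical mixing matrix
X = S @ A.T + 0.05 * rng.normal(size=(n, 5))            # observed data with noise

# Step 1: PCA down to k = 2 components (with whitening), filtering out noise directions.
X_reduced = PCA(n_components=2, whiten=True).fit_transform(X)

# Step 2: FastICA recovers the independent components (up to order, sign, and scale).
S_est = FastICA(n_components=2, random_state=0).fit_transform(X_reduced)
print(np.round(np.corrcoef(S.T, S_est.T), 2))  # each true source correlates strongly with one estimate
```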
10. Summary
ICA is a powerful method for blind signal separation, and also a way of finding hidden factors in non-Gaussian data. From the familiar samples-and-features perspective, the prerequisite for using ICA is that the sample data is generated from independent, non-Gaussian hidden factors, with the number of hidden factors equal to the number of features; what we want to recover are those hidden factors.
PCA, by contrast, assumes the features are generated from $k$ orthogonal directions (which can also be viewed as hidden factors); what we want there is the projection of the data onto those new directions. Both are forms of factor analysis: one is better suited to signal separation (because signals are relatively structured and usually not Gaussian), and the other to dimensionality reduction (collapsing many features down to $k$ orthogonal directions). Sometimes the two need to be combined. This section is my personal understanding and is offered for reference only.