[Repost] Independent Component Analysis (ICA)

Last Update:2016-12-06 Source: Internet

Author: User


Original address: http://www.cnblogs.com/jerrylead/archive/2011/04/19/2021071.html

1. Problem

1. The PCA discussed in the previous section is a dimensionality reduction method, but it works well only when the sample points follow a Gaussian distribution. For samples with other distributions, is there a comparable way to decompose the data into components?

2. The classic cocktail party problem. Suppose there are n people at a party who may all talk at the same time, and we place n microphones in various corners of the room to record the sound. After the party, we have a set of data $\{x^{(i)};\ i = 1, \dots, m\}$ from the n microphones, where i indexes the sampling time. That is, there are m samples in total, and each sample is n-dimensional. Our goal is to recover each person's speech signal from these m samples alone.

To make the second problem precise: there are n signal sources $s = (s_1, s_2, \dots, s_n)^T$, $s \in \mathbb{R}^n$, where each dimension is one person's voice signal and the signals from different people are independent. A is an unknown mixing matrix that combines the source signals s, so

$$x = As$$

x has the meaning given above. Here x is not a single vector but a matrix, in which each column vector is $x^{(i)}$, with

$$x^{(i)} = A s^{(i)}$$

In graphical form:

[Figure: graphical representation of the mixing model]

This picture comes from

http://amouraux.webnode.com/research-interests/research-interests-erp-analysis/blind-source-separation-bss-of-erps-using-independent-component-analysis-ica/

Each component of $x^{(i)}$ is a linear combination of the components of $s^{(i)}$. Both A and s are unknown and only x is known, so we have to find a way to deduce s from x. This process is also known as blind source separation.

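Let $W = A^{-1}$; then

$$s^{(i)} = A^{-1} x^{(i)} = W x^{(i)}$$

W can be written as

$$W = \begin{bmatrix} w_1^T \\ \vdots \\ w_n^T \end{bmatrix}$$

where $w_i \in \mathbb{R}^n$; that is, W is written in terms of its row vectors. We then get:

$$s_j^{(i)} = w_j^T x^{(i)}$$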
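The mixing model x = As and the unmixing step s = Wx with W = A^{-1} can be sketched in NumPy as follows. The mixing matrix `A`, the sample sizes, and the choice of Laplace-distributed sources are assumptions made for illustration only; in a real problem A is unknown.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 2, 1000                    # n sources, m time samples (illustrative sizes)
S = rng.laplace(size=(n, m))      # column i is s^(i); Laplace is an assumed non-Gaussian stand-in
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])        # hypothetical mixing matrix

X = A @ S                         # column i is x^(i) = A s^(i)

W = np.linalg.inv(A)              # W = A^{-1}
S_rec = W @ X                     # s^(i) = W x^(i) for every column at once

# Row j of W is w_j^T, so source j at time i is the inner product w_j^T x^(i).
j, i = 1, 10
s_ji = W[j] @ X[:, i]
```

Here the sources are recovered exactly only because A is known; the rest of the article is about estimating W when it is not.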

2. ICA uncertainty (ICA ambiguities)

Since both W and s are unknown, the two cannot both be determined without prior knowledge. Take the formula s = Wx above. If W is scaled by a factor of two, s only has to be scaled by a factor of two as well and the equation still holds, so a unique s cannot be obtained. Likewise, if the people are renumbered in a different order, for example if the blue nodes in the figure are renumbered 3, 2, 1, then only the order of the column vectors of A needs to be swapped, so the order of the components of s cannot be determined either. These two cases are known as the ambiguities of the original signal.
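Both ambiguities can be checked numerically. The sketch below uses a randomly generated mixing matrix and Laplace sources (both assumptions for illustration) and shows that rescaled or reordered sources, paired with a correspondingly adjusted mixing matrix, produce exactly the same observations.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.laplace(size=(3, 500))    # three hypothetical sources
A = rng.normal(size=(3, 3))       # hypothetical mixing matrix
X = A @ S

# Scaling ambiguity: doubling the sources and halving A leaves X unchanged.
X_scaled = (A / 2.0) @ (2.0 * S)

# Permutation ambiguity: renumber the sources as 3, 2, 1 and swap
# the columns of A in the same way.
P = np.eye(3)[[2, 1, 0]]          # permutation matrix, P @ S reorders the rows of S
X_perm = (A @ P.T) @ (P @ S)
```

Because `X`, `X_scaled`, and `X_perm` coincide, no algorithm that sees only X can tell these source configurations apart.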

There is also a case in which ICA does not apply: the sources must not be Gaussian. Suppose there are only two people and their signals follow a multivariate normal distribution, $s \sim \mathcal{N}(0, I)$, where I is the 2×2 identity matrix. The probability density of s is then a bell-shaped surface centered at mean 0 whose contours are circles (see the multivariate Gaussian distribution). Because $x = As$, x is also Gaussian, with mean 0 and covariance $E[xx^T] = E[A s s^T A^T] = A A^T$.

Let R be an orthogonal matrix ($R R^T = R^T R = I$) and let $A' = AR$. If A is replaced by A', then $x' = A's$. The distribution of s is unchanged, so x' still has mean 0 and covariance $E[x'x'^T] = E[A' s s^T A'^T] = A R R^T A^T = A A^T$.

Therefore, whether the mixing matrix is A or A', the distribution of x is the same, so the mixing matrix cannot be determined, and neither can the original signal.
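The Gaussian case can be illustrated with a small check, assuming s ~ N(0, I) so that the covariance of x = As is A A^T. The matrix `A` and the rotation angle are arbitrary illustrative values.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])                       # hypothetical mixing matrix
theta = 0.7                                      # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal: R @ R.T = I
A_rot = A @ R                                    # the alternative mixing matrix A' = AR

cov_x = A @ A.T                                  # covariance of x  = A s
cov_x_rot = A_rot @ A_rot.T                      # covariance of x' = A' s
```

The two covariance matrices agree even though `A` and `A_rot` differ, and a zero-mean Gaussian is fully described by its covariance, so the data cannot distinguish the two mixing matrices.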

3. Density functions and linear transformations

Before discussing the ICA algorithm itself, let's review some facts from probability and linear algebra.

Suppose the random variable s has probability density function $p_s(s)$ (a density for continuous values, a probability for discrete values). For simplicity, assume s is a real number, and that there is a random variable x = As, where A and x are also real numbers. Let $p_x$ be the probability density of x. How do we find it?

Let $W = A^{-1}$. It is tempting to rewrite the equation as $s = Wx$, conclude that $p_x(x) = p_s(Wx)$, and be done. Unfortunately, this is wrong. For example, if s is uniformly distributed, $s \sim \mathrm{Uniform}[0, 1]$, then the density of s is $p_s(s) = 1\{0 \le s \le 1\}$. Now let A = 2, so that x = 2s. Then x is uniformly distributed on [0, 2], and $p_x(x) = 0.5 \cdot 1\{0 \le x \le 2\}$. The deduction above, however, gives $p_x(x) = p_s(0.5x) = 1$. The correct formula is

$$p_x(x) = p_s(Wx)\,|W|$$

Derivation (for A > 0), using the cumulative distribution functions $F_x$ and $F_s$:

$$F_x(t) = P(x \le t) = P(As \le t) = P(s \le Wt) = F_s(Wt)$$

$$p_x(t) = F_x'(t) = p_s(Wt)\,|W|$$

More generally, if s is a vector and A is an invertible square matrix, the formula above still holds, with |W| denoting the absolute value of the determinant of W.
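The uniform example can be checked empirically. The sketch below compares the naive density p_s(Wx) with the corrected density p_s(Wx)|W| for x = 2s, and estimates the true density of x from random samples. The sample count, seed, and the interval used for the estimate are arbitrary choices.

```python
import random

def p_s(s):
    """Density of s ~ Uniform[0, 1]."""
    return 1.0 if 0.0 <= s <= 1.0 else 0.0

A = 2.0
W = 1.0 / A

def p_x_naive(x):
    """The incorrect deduction: substitute s = Wx and stop."""
    return p_s(W * x)

def p_x(x):
    """The corrected formula: p_x(x) = p_s(Wx) |W|."""
    return p_s(W * x) * abs(W)

# Empirical density of x = 2s on the interval [0.5, 1.0).
random.seed(0)
samples = [A * random.random() for _ in range(100_000)]
lo, hi = 0.5, 1.0
fraction = sum(lo <= x < hi for x in samples) / len(samples)
empirical_density = fraction / (hi - lo)
```

The empirical estimate should land close to `p_x(0.75)`, which is 0.5, rather than the naive value of 1.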

4. ICA algorithm

The ICA algorithm presented here is due to Bell and Sejnowski. We interpret it using maximum likelihood estimation; the original paper used a more complicated approach, the infomax principle.

We assume each $s_i$ has probability density $p_s$, so the joint distribution of the original signals at a given moment is

$$p(s) = \prod_{i=1}^{n} p_s(s_i)$$

This formula encodes an assumption: each person's sound signal is independent of the others. With p(s), we can obtain p(x):

$$p(x) = |W| \prod_{i=1}^{n} p_s(w_i^T x)$$

The left side is the probability density of a sampled signal x (an n-dimensional vector), and the right side is the product of the densities of the individual original signals, multiplied by |W|.

As mentioned earlier, without prior knowledge we cannot determine W and s. So we need to choose a probability density function for s, and it cannot be the Gaussian density. From probability theory, the density function p(x) is the derivative of the cumulative distribution function (CDF) F(x). F(x) must satisfy two properties: it is monotonically increasing, and it takes values in [0, 1]. The sigmoid function fits well: its domain runs from negative infinity to positive infinity, its range is 0 to 1, and it increases smoothly. We therefore assume that the CDF of s is the sigmoid function

$$g(s) = \frac{1}{1 + e^{-s}}$$

Differentiating gives

$$p_s(s) = g'(s) = \frac{e^{s}}{(1 + e^{s})^{2}}$$

This is the density function of s. Here s is a real number.
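As a quick sanity check on the sigmoid assumption, the sketch below verifies numerically that the derivative of the sigmoid g(s) = 1 / (1 + e^{-s}) equals g(s)(1 - g(s)), which is the same function as e^s / (1 + e^s)^2, and that this density integrates to 1 and is symmetric about 0. The evaluation point, step sizes, and integration range are arbitrary choices.

```python
import math

def g(s):
    """Sigmoid, used here as the assumed CDF of a source."""
    return 1.0 / (1.0 + math.exp(-s))

def p_s(s):
    """Density g'(s) = g(s) (1 - g(s)), equal to e^s / (1 + e^s)^2."""
    return g(s) * (1.0 - g(s))

# Central finite difference of the CDF at an arbitrary point.
s0, h = 0.3, 1e-6
numeric_derivative = (g(s0 + h) - g(s0 - h)) / (2.0 * h)

# Closed form e^s / (1 + e^s)^2 at the same point.
closed_form = math.exp(s0) / (1.0 + math.exp(s0)) ** 2

# Riemann sum of the density over [-40, 40]; the tails beyond are negligible.
step = 0.01
area = sum(p_s(-40.0 + k * step) for k in range(8001)) * step
```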

If we know the distribution of s in advance, we do not have to make this assumption; when we do not, the sigmoid function gives good results on most problems. Since the density $p_s(s)$ above is a symmetric function, E[s] = 0, and so E[x] = E[As] = 0; that is, x has mean 0.

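Now only W remains to be found. Given the training samples $\{x^{(i)};\ i = 1, \dots, m\}$, the log-likelihood of the samples is obtained from the probability density of x derived earlier:

$$\ell(W) = \sum_{i=1}^{m} \left\{ \sum_{j=1}^{n} \log g'(w_j^T x^{(i)}) + \log |W| \right\}$$

The expression inside the curly braces is $\log p(x^{(i)})$.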
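A minimal sketch of this log-likelihood in NumPy, assuming the sigmoid CDF g so that each source density is g'(z), and assuming the samples are stored as the columns of an (n, m) array `X`. The function names are illustrative, and the log-density is written in a numerically stable form that follows from log g'(z) = -|z| - 2 log(1 + e^{-|z|}).

```python
import numpy as np

def log_sigmoid_density(z):
    """log g'(z) for the sigmoid g, computed stably for large |z|."""
    a = np.abs(z)                      # g' is symmetric, so only |z| matters
    return -a - 2.0 * np.log1p(np.exp(-a))

def log_likelihood(W, X):
    """Sum over samples of sum_j log g'(w_j^T x^(i)) + log|det W|.

    X has shape (n, m): one column per sample x^(i).
    """
    Z = W @ X                          # Z[j, i] = w_j^T x^(i)
    _, log_abs_det = np.linalg.slogdet(W)
    return np.sum(log_sigmoid_density(Z)) + X.shape[1] * log_abs_det

# With W = I and all-zero samples, every term is log g'(0) = log(1/4).
value = log_likelihood(np.eye(2), np.zeros((2, 3)))
```

Using `slogdet` rather than `log(det(W))` avoids overflow or underflow in the determinant for larger matrices.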

Next we differentiate with respect to W. This raises the question of how to differentiate the determinant |W|, which is a matter of matrix calculus. We state the result first; its derivation is given at the end of the article.

$$\nabla_W |W| = |W| (W^{-1})^T$$

The resulting update rule is as follows (the derivative of $\log g'(s)$ is $1 - 2g(s)$, which you can verify yourself):

$$W := W + \alpha \left( \begin{bmatrix} 1 - 2g(w_1^T x^{(i)}) \\ 1 - 2g(w_2^T x^{(i)}) \\ \vdots \\ 1 - 2g(w_n^T x^{(i)}) \end{bmatrix} x^{(i)T} + (W^T)^{-1} \right)$$

where $\alpha$ is the learning rate of the gradient ascent, chosen by hand.

Once W has been found by iteration, the original signals can be recovered with $s^{(i)} = W x^{(i)}$.

Note: when computing the maximum likelihood estimate, we assumed that $x^{(i)}$ and $x^{(j)}$ are independent of each other, but this assumption does not hold for speech signals or for other time-dependent quantities such as temperature. However, when there is enough data, assuming independence has little effect on the result, and convergence can be sped up by shuffling the samples beforehand and running stochastic gradient ascent.
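Putting the pieces together, here is a minimal sketch of stochastic gradient ascent for ICA with shuffled samples, using the per-sample update W := W + alpha ((1 - 2 g(W x)) x^T + (W^T)^{-1}). It is an illustration under stated assumptions, not a tuned implementation: the function name `ica_sga`, the learning rate, the epoch count, the identity initialization, the Laplace test sources, and the mixing matrix are all choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ica_sga(X, alpha=0.005, epochs=10, seed=0):
    """Estimate the unmixing matrix W from X of shape (n, m), one sample per column."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = np.eye(n)                               # assumed starting point
    for _ in range(epochs):
        for i in rng.permutation(m):            # visit samples in shuffled order
            x = X[:, i:i + 1]                   # column vector x^(i), shape (n, 1)
            grad = (1.0 - 2.0 * sigmoid(W @ x)) @ x.T + np.linalg.inv(W.T)
            W += alpha * grad                   # ascent step on the log-likelihood
    return W

# Demo on synthetic data: two non-Gaussian sources, hypothetical mixing matrix.
rng = np.random.default_rng(42)
S = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = A @ S

W = ica_sga(X)
S_rec = W @ X

# Absolute correlation between each recovered row and each true source.
corr = np.abs(np.corrcoef(np.vstack([S_rec, S]))[:2, 2:])
```

Because of the scaling and permutation ambiguities, the recovered rows are compared with the true sources through absolute correlation rather than elementwise; each row of `corr` should have one entry near 1. A fixed learning rate is used for simplicity; decreasing it over the epochs would reduce the residual noise in W.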

Returning to the cocktail party problem: s is the signal each person emits. It is continuous-valued and differs from one time point to the next, and the signals from different people are independent ($s_i$ and $s_j$ are independent). The CDF of s is the sigmoid function, and every person's signal follows this distribution. A (the inverse of W) describes how s is transformed relative to x, and x is the combined result of s and A.

5. Example

[Figure: the original signals s (two sources)]

[Figure: the observed signals x]

[Figure: the signals s recovered by ICA]

6. The gradient of the determinant

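To derive the gradient of the determinant, let A be an n×n matrix. We know that the determinant can be expanded in cofactors: for any row i,

$$|A| = \sum_{j=1}^{n} (-1)^{i+j} A_{ij}\, |A_{\backslash i, \backslash j}|$$

where $A_{\backslash i, \backslash j}$ is the submatrix that remains after removing row i and column j. The derivative with respect to $A_{kl}$ is then

$$\frac{\partial |A|}{\partial A_{kl}} = \frac{\partial}{\partial A_{kl}} \sum_{j=1}^{n} (-1)^{k+j} A_{kj}\, |A_{\backslash k, \backslash j}| = (-1)^{k+l} |A_{\backslash k, \backslash l}| = (\mathrm{adj}(A))_{lk}$$

adj(A) is the adjugate matrix as defined in linear algebra, with $A^{-1} = \mathrm{adj}(A) / |A|$, so

$$\nabla_A |A| = (\mathrm{adj}(A))^T = |A|\, (A^{-1})^T$$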
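The identity that the gradient of |A| with respect to A equals |A| (A^{-1})^T can be checked numerically with finite differences. The test matrix below is an arbitrary invertible example, and `numerical_det_gradient` is a helper name introduced for this sketch.

```python
import numpy as np

def numerical_det_gradient(A, h=1e-6):
    """Central-difference estimate of d|A| / dA_kl for every entry (k, l)."""
    G = np.zeros_like(A)
    for k in range(A.shape[0]):
        for l in range(A.shape[1]):
            E = np.zeros_like(A)
            E[k, l] = h                          # perturb a single entry
            G[k, l] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2.0 * h)
    return G

A = np.array([[2.0, 1.0, 0.0],
              [0.5, 3.0, 1.0],
              [1.0, 0.0, 1.5]])                  # arbitrary invertible matrix

analytic = np.linalg.det(A) * np.linalg.inv(A).T # |A| (A^{-1})^T
numeric = numerical_det_gradient(A)
```

Entry (k, l) of `analytic` is the cofactor of A_kl; for example, entry (0, 0) is the determinant of the lower-right 2×2 block, 3.0 * 1.5 - 1.0 * 0.0.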
