Factor Analysis

1. Problem

In the training data considered so far, the number of samples m has been much greater than the number of features n, so whether we do regression or clustering there is not much of a problem. However, when the number of training samples m is too small, or even m < n, problems arise. If gradient descent is used for regression, different initial values lead to parameter estimates with large deviations (because the number of equations is smaller than the number of parameters). Likewise, if a multivariate Gaussian distribution is used to fit the data, there are also problems. Let's calculate and see what goes wrong.

The maximum-likelihood parameter estimates for the multivariate Gaussian distribution are:

$$\mu = \frac{1}{m}\sum_{i=1}^m x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^m \left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^T$$

These are the formulas for the mean and the covariance, where $x^{(i)}$ denotes the $i$-th sample. There are m samples in total, each with n features, so $x^{(i)}$ and $\mu$ are n-dimensional vectors and $\Sigma$ is the $n \times n$ covariance matrix.

When m < n, we find that $\Sigma$ is a singular matrix ($|\Sigma| = 0$, so $\Sigma^{-1}$ does not exist), which means there is no way to fit the multivariate Gaussian distribution; to be exact, $\Sigma$ cannot be estimated.
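A quick numerical check confirms this. The sketch below (a minimal NumPy example with arbitrarily chosen dimensions) computes the maximum-likelihood covariance for m < n and shows that it is singular:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 10                       # fewer samples than features (m < n)
X = rng.normal(size=(m, n))        # m samples, n features

mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / m  # maximum-likelihood covariance, n x n

# The centered data matrix has rank at most m - 1, so Sigma cannot
# reach full rank n and its determinant is (numerically) zero.
print(np.linalg.matrix_rank(Sigma))         # 4
print(np.isclose(np.linalg.det(Sigma), 0))  # True
```

Since the rank of $\Sigma$ is at most m − 1, the estimate is singular whenever m ≤ n.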

What if we still want to use a multivariate Gaussian distribution to model the samples?

2. Restricted Covariance Matrix

When there is not enough data for estimation, we can only make certain assumptions about the model parameters. Previously we wanted to estimate a complete $\Sigma$ (every element of the matrix); now we assume $\Sigma$ is a diagonal matrix (the features are mutually independent), so we only need to compute the variance of each feature:

$$\Sigma_{jj} = \frac{1}{m}\sum_{i=1}^m \left(x_j^{(i)} - \mu_j\right)^2$$

In the end, only the elements on the diagonal are nonzero.

Looking back at the geometric characteristics of the two-dimensional multivariate Gaussian discussed before: its projection onto the plane is an ellipse, whose center is determined by $\mu$ and whose shape is determined by $\Sigma$. If $\Sigma$ becomes diagonal, the two axes of the ellipse are parallel to the coordinate axes.

For a further restriction, we can assume that the elements on the diagonal are all equal:

$$\Sigma = \sigma^2 I$$

where

$$\sigma^2 = \frac{1}{n}\sum_{j=1}^n \Sigma_{jj}$$

that is, the mean of the diagonal elements from the previous step. Reflected on the two-dimensional Gaussian plot, the ellipse becomes a circle.

When we want to estimate a complete $\Sigma$, we need m ≥ n + 1 to guarantee that the maximum-likelihood estimate of $\Sigma$ is not singular. Under either of the restrictions above, however, $\Sigma$ can be estimated as long as m ≥ 2.
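Both restricted estimators can be formed even when m < n; a minimal sketch (dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5                        # m >= 2 suffices for the restricted estimates
X = rng.normal(size=(m, n))
mu = X.mean(axis=0)

# Diagonal restriction: estimate only the per-feature variances
Sigma_diag = np.diag(((X - mu) ** 2).mean(axis=0))

# Spherical restriction: one shared variance sigma^2 on the diagonal
sigma2 = ((X - mu) ** 2).mean()
Sigma_sph = sigma2 * np.eye(n)

# Unlike the full MLE, both are invertible despite m < n
print(np.linalg.det(Sigma_diag) > 0, np.linalg.det(Sigma_sph) > 0)  # True True
```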

The disadvantage of doing so is easy to see: assuming that the features are independent is too strong an assumption. Next, we present a method called factor analysis, which uses more parameters to capture the relationships among the features, yet does not require estimating a complete $\Sigma$.

3. Marginal and Conditional Gaussian Distributions

Before discussing factor analysis, let's first look at the marginal and conditional distributions of the multivariate Gaussian. These will be useful in the EM derivation of factor analysis.

Assume that x consists of two random vectors (we can think of it as splitting a single vector into two parts):

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

where $x_1 \in \mathbb{R}^r$ and $x_2 \in \mathbb{R}^s$, so $x \in \mathbb{R}^{r+s}$. Assume that x follows the multivariate Gaussian distribution $x \sim \mathcal{N}(\mu, \Sigma)$, where

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

with $\mu_1 \in \mathbb{R}^r$, $\mu_2 \in \mathbb{R}^s$, $\Sigma_{11} \in \mathbb{R}^{r\times r}$, $\Sigma_{12} \in \mathbb{R}^{r\times s}$, and so on. Because the covariance matrix is symmetric, $\Sigma_{21} = \Sigma_{12}^T$.

Written jointly, the distribution of x is consistent with a single multivariate Gaussian.

So how can we obtain the marginal distributions when we only know the joint distribution? From the $\mu$ and $\Sigma$ above we can see that

$$E[x_1] = \mu_1, \qquad \mathrm{Cov}(x_1) = \Sigma_{11}$$

Next we verify the second result. By definition,

$$\mathrm{Cov}(x) = E\left[(x-\mu)(x-\mu)^T\right] = E\begin{bmatrix} (x_1-\mu_1)(x_1-\mu_1)^T & (x_1-\mu_1)(x_2-\mu_2)^T \\ (x_2-\mu_2)(x_1-\mu_1)^T & (x_2-\mu_2)(x_2-\mu_2)^T \end{bmatrix}$$

and the upper-left block is exactly $\mathrm{Cov}(x_1) = \Sigma_{11}$.

It can be seen that the marginal distribution of a multivariate Gaussian is still a multivariate Gaussian; that is to say, $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$.

What is interesting in the Cov(x) above is the off-diagonal block $\Sigma_{12} = E[(x_1-\mu_1)(x_2-\mu_2)^T]$, which is different from the covariance computed before. The earlier covariance matrix was for a single random vector (a multi-dimensional variable); here we evaluate the relationship between two random vectors. For example, if $x_1$ = {height, weight} and $x_2$ = {gender, income}, then before we needed the covariances of height with height, height with weight, and weight with weight, whereas now we want the covariances between height and gender, height and income, weight and gender, and weight and income. The two are quite different.

The above is the marginal distribution; now let's consider the conditional distribution. By the properties of the multivariate Gaussian, $x_1 \mid x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2})$, where

$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$$

and

$$\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$

These are the formulas we need for the calculations below; they are stated here without the derivation process. For details of the derivation, see the notes on Gaussian Processes written by Chuong B. Do.
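The two formulas can be applied mechanically once the blocks of $\mu$ and $\Sigma$ are written down. A small worked sketch (all block values below are made up for illustration, with 1-dimensional $x_1$ and $x_2$):

```python
import numpy as np

# Partition of a 2-dimensional joint Gaussian into x1 (1-dim) and x2 (1-dim)
mu1, mu2 = np.array([0.0]), np.array([1.0])
S11, S12 = np.array([[2.0]]), np.array([[0.8]])
S21, S22 = S12.T, np.array([[1.0]])

# Marginal distribution: x1 ~ N(mu1, S11) -- just read off the blocks.

# Conditional distribution x1 | x2 at the observed value x2 = [2.0]:
x2 = np.array([2.0])
mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)  # 0 + 0.8 * (2 - 1) / 1
S_cond = S11 - S12 @ np.linalg.solve(S22, S21)        # 2 - 0.8 * 0.8 / 1
print(mu_cond)  # [0.8]
print(S_cond)   # [[1.36]]
```

Note that conditioning always shrinks the covariance: $\Sigma_{1|2} \le \Sigma_{11}$ elementwise on the diagonal.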

4. A Factor Analysis Example

The following is a simple example to illustrate the idea behind factor analysis.

In essence, factor analysis assumes that the m training samples $x^{(i)}$ (each with n features) are generated as follows:

1. First, m k-dimensional vectors $z^{(i)}$ are generated from a multivariate Gaussian distribution in a k-dimensional space, that is,

$$z^{(i)} \sim \mathcal{N}(0, I)$$

2. A transformation matrix $\Lambda \in \mathbb{R}^{n\times k}$ maps $z^{(i)}$ to the n-dimensional space, giving $\Lambda z^{(i)}$. Because the mean of $z^{(i)}$ is 0, the mean after the mapping is still 0.

3. An n-dimensional mean $\mu$ is added, giving $\mu + \Lambda z^{(i)}$. The corresponding meaning is to translate the transformed points (n-dimensional vectors) to the center of the samples.

4. Because the real samples deviate from the model above, we further add a noise term $\epsilon^{(i)}$ (an n-dimensional vector), which also follows a multivariate Gaussian distribution:

$$\epsilon^{(i)} \sim \mathcal{N}(0, \Psi)$$

5. The final result is the formula that generates a real training sample:

$$x^{(i)} = \mu + \Lambda z^{(i)} + \epsilon^{(i)}$$
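The five generative steps can be simulated directly. In the sketch below, all parameter values ($\Lambda$, $\mu$, $\Psi$, and the sizes) are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 500, 2, 1

Lam = np.array([[2.0], [1.0]])    # step 2: n x k transformation matrix Lambda
mu = np.array([3.0, -1.0])        # step 3: n-dimensional mean
Psi = np.diag([0.1, 0.2])         # step 4: diagonal noise covariance

z = rng.normal(size=(m, k))                         # step 1: z ~ N(0, I)
eps = rng.multivariate_normal(np.zeros(n), Psi, m)  # step 4: eps ~ N(0, Psi)
X = z @ Lam.T + mu + eps                            # steps 2-5: x = mu + Lambda z + eps

print(X.mean(axis=0))  # close to mu = [3, -1]
```

Plotting X would show the picture described next: points scattered around a line through $\mu$ with direction $\Lambda$.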

Let's use an intuitive method to explain the above process:

Suppose we have m = 5 two-dimensional sample points $x^{(i)}$ (two features), as follows:

According to the factor analysis, the sample points are generated as follows:

1. We first assume that in a one-dimensional space (k = 1) there are m points $z^{(i)}$ generated from a normal distribution with mean 0 and variance 1, as shown below:

2. Use $\Lambda$ to map the one-dimensional z into two dimensions; the graphic representation is as follows (the mapped points lie on a straight line through the origin):

3. Add $\mu$, shifting the abscissa of every point by $\mu_1$ and the ordinate by $\mu_2$; this translates the straight line so that it passes through the point $\mu$, and the origin of the original z-axis is now at $\mu$ (the red point).

However, real sample points do not follow such a rule exactly; there is some deviation from the model. Therefore, we need to add some disturbance (error) to the points generated in the previous step.

4. After the disturbance $\epsilon^{(i)}$ is added, we obtain the black sample points $x^{(i)}$ shown below:

5. Because the means of z and $\epsilon$ are both 0, $\mu$ is also the mean of the original sample points (the black points).

From the intuitive analysis above, we see that factor analysis says the high-dimensional sample points are actually generated from low-dimensional sample points through a Gaussian distribution, a linear transformation, and an error disturbance; therefore, high-dimensional data can be expressed in a lower dimension.

5. The Factor Analysis Model

In the process above, the observed sample points x are obtained from a hidden random variable z through a transformation and an error disturbance. z is called a factor, and it is low-dimensional.

We restate the model as follows:

$$z \sim \mathcal{N}(0, I)$$
$$\epsilon \sim \mathcal{N}(0, \Psi)$$
$$x = \mu + \Lambda z + \epsilon$$

where the error $\epsilon$ and z are independent.

Below we describe factor analysis using the matrix notation of the multivariate Gaussian. Some references use other representations; if the matrix notation is unfamiliar, you can consult other materials.

In matrix notation, z and x jointly follow a multivariate Gaussian distribution:

$$\begin{bmatrix} z \\ x \end{bmatrix} \sim \mathcal{N}(\mu_{zx}, \Sigma)$$

Let's first find $E[x]$. We know $E[z] = 0$, so

$$E[x] = E[\mu + \Lambda z + \epsilon] = \mu + \Lambda E[z] + E[\epsilon] = \mu$$

and therefore

$$\mu_{zx} = \begin{bmatrix} \vec{0} \\ \mu \end{bmatrix}$$

The next step is computing $\Sigma$, where

$$\Sigma = \begin{bmatrix} \Sigma_{zz} & \Sigma_{zx} \\ \Sigma_{xz} & \Sigma_{xx} \end{bmatrix}$$

Since $z \sim \mathcal{N}(0, I)$, we immediately have $\Sigma_{zz} = \mathrm{Cov}(z) = I$.

Next, find $\Sigma_{zx}$:

$$\Sigma_{zx} = E\left[(z - E[z])(x - E[x])^T\right] = E\left[z(\Lambda z + \epsilon)^T\right] = E[zz^T]\Lambda^T + E[z\epsilon^T] = \Lambda^T$$

In this process we used the independence of z and $\epsilon$ ($E[z\epsilon^T] = E[z]E[\epsilon^T] = 0$), and $\Lambda$ is treated as a known constant.

Next, find $\Sigma_{xx}$:

$$\Sigma_{xx} = E\left[(x - \mu)(x - \mu)^T\right] = E\left[(\Lambda z + \epsilon)(\Lambda z + \epsilon)^T\right] = \Lambda E[zz^T]\Lambda^T + E[\epsilon\epsilon^T] = \Lambda\Lambda^T + \Psi$$

Then the final form of the joint distribution is

$$\begin{bmatrix} z \\ x \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \vec{0} \\ \mu \end{bmatrix}, \begin{bmatrix} I & \Lambda^T \\ \Lambda & \Lambda\Lambda^T + \Psi \end{bmatrix}\right)$$

From the formula above, we can read off the marginal distribution of x:

$$x \sim \mathcal{N}(\mu, \Lambda\Lambda^T + \Psi)$$

So the maximum-likelihood objective over the samples is

$$\ell(\mu, \Lambda, \Psi) = \log \prod_{i=1}^m \frac{1}{(2\pi)^{n/2}\,|\Lambda\Lambda^T+\Psi|^{1/2}} \exp\left(-\tfrac{1}{2}\left(x^{(i)}-\mu\right)^T\left(\Lambda\Lambda^T+\Psi\right)^{-1}\left(x^{(i)}-\mu\right)\right)$$
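This marginal log-likelihood is easy to evaluate numerically for given parameters. A minimal sketch (the helper name and the sanity-check values below are made up):

```python
import numpy as np

def fa_log_likelihood(X, mu, Lam, Psi):
    """Log-likelihood of samples X under the marginal x ~ N(mu, Lam Lam^T + diag(Psi))."""
    m, n = X.shape
    S = Lam @ Lam.T + np.diag(Psi)           # marginal covariance of x
    Xc = X - mu
    _, logdet = np.linalg.slogdet(S)
    # Quadratic forms (x - mu)^T S^{-1} (x - mu), one per sample
    quad = np.einsum('ij,ij->i', Xc, np.linalg.solve(S, Xc.T).T).sum()
    return -0.5 * (m * n * np.log(2 * np.pi) + m * logdet + quad)

# Sanity check: with Lam = 0 and Psi = 1 the marginal is a standard normal
X = np.array([[1.0, 0.0], [0.0, 1.0]])
ll = fa_log_likelihood(X, np.zeros(2), np.zeros((2, 1)), np.ones(2))
print(ll)  # -0.5 * (4 * log(2*pi) + 2), about -4.676
```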

Can we maximize it by taking partial derivatives with respect to each parameter?

Unfortunately, we cannot obtain a closed-form solution. Think about it: if we could, why would we bother putting z and x together into a joint distribution? Following our earlier approach to parameter estimation, when there is a latent variable z we can consider using EM.

6. EM Estimation for Factor Analysis

Let's first clarify the roles of the quantities involved: z is the latent variable, and $\mu$, $\Lambda$, $\Psi$ are the parameters to be estimated.

Recall the two steps of EM:

Repeat until convergence {

(E-step) For each i, compute $Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \mu, \Lambda, \Psi)$

(M-step) Update the parameters to maximize $\sum_i \int_{z^{(i)}} Q_i(z^{(i)}) \log \dfrac{p(x^{(i)}, z^{(i)}; \mu, \Lambda, \Psi)}{Q_i(z^{(i)})}\, dz^{(i)}$

}

Let's apply this to factor analysis:

(E-step):

Based on the conditional distribution formulas in Section 3,

$$\mu_{z^{(i)}|x^{(i)}} = \Lambda^T\left(\Lambda\Lambda^T+\Psi\right)^{-1}\left(x^{(i)}-\mu\right), \qquad \Sigma_{z^{(i)}|x^{(i)}} = I - \Lambda^T\left(\Lambda\Lambda^T+\Psi\right)^{-1}\Lambda$$

Therefore $z^{(i)} \mid x^{(i)} \sim \mathcal{N}(\mu_{z^{(i)}|x^{(i)}}, \Sigma_{z^{(i)}|x^{(i)}})$, and according to the multivariate Gaussian density formula,

$$Q_i(z^{(i)}) = \frac{1}{(2\pi)^{k/2}\,|\Sigma_{z^{(i)}|x^{(i)}}|^{1/2}} \exp\left(-\tfrac{1}{2}\left(z^{(i)}-\mu_{z^{(i)}|x^{(i)}}\right)^T \Sigma_{z^{(i)}|x^{(i)}}^{-1} \left(z^{(i)}-\mu_{z^{(i)}|x^{(i)}}\right)\right)$$
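The E-step posterior moments are direct to compute. A sketch with made-up parameter values (n = 2, k = 1, one observed sample):

```python
import numpy as np

# Made-up parameters of the model x = mu + Lambda z + eps
Lam = np.array([[2.0], [1.0]])
mu = np.zeros(2)
Psi = np.diag([0.5, 0.5])
x = np.array([1.0, 1.0])           # one observed sample

S = Lam @ Lam.T + Psi              # marginal covariance Lambda Lambda^T + Psi
mu_post = Lam.T @ np.linalg.solve(S, x - mu)              # posterior mean of z | x
Sigma_post = np.eye(1) - Lam.T @ np.linalg.solve(S, Lam)  # posterior covariance

print(mu_post)     # [0.5454...]   (= 6/11)
print(Sigma_post)  # [[0.0909...]] (= 1/11)
```

Note that the posterior covariance is much smaller than the prior covariance I: observing x pins down z quite tightly.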

(M-step):

Writing the goal directly, we need to maximize

$$\sum_{i=1}^m \int_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \mu, \Lambda, \Psi)}{Q_i(z^{(i)})}\, dz^{(i)}$$

with respect to the parameters $\mu$, $\Lambda$, $\Psi$. Below we derive the estimation formula for $\Lambda$.

First, simplify the objective:

$$\sum_{i=1}^m E_{z^{(i)}\sim Q_i}\left[\log p(x^{(i)} \mid z^{(i)}; \mu, \Lambda, \Psi) + \log p(z^{(i)}) - \log Q_i(z^{(i)})\right]$$

where the expectation is over $z^{(i)} \sim Q_i$. Removing the terms irrelevant to $\Lambda$ (the last two), we need to maximize

$$\sum_{i=1}^m E\left[-\tfrac{n}{2}\log(2\pi) - \tfrac{1}{2}\log|\Psi| - \tfrac{1}{2}\left(x^{(i)}-\mu-\Lambda z^{(i)}\right)^T \Psi^{-1} \left(x^{(i)}-\mu-\Lambda z^{(i)}\right)\right]$$

After also removing the first two terms (which do not involve $\Lambda$), take the derivative:

$$\nabla_\Lambda \sum_{i=1}^m -E\left[\tfrac{1}{2}\left(x^{(i)}-\mu-\Lambda z^{(i)}\right)^T \Psi^{-1} \left(x^{(i)}-\mu-\Lambda z^{(i)}\right)\right]$$
$$= \sum_{i=1}^m \nabla_\Lambda E\left[-\mathrm{tr}\left(\tfrac{1}{2} z^{(i)T}\Lambda^T\Psi^{-1}\Lambda z^{(i)}\right) + \mathrm{tr}\left(z^{(i)T}\Lambda^T\Psi^{-1}\left(x^{(i)}-\mu\right)\right)\right]$$
$$= \sum_{i=1}^m E\left[-\Psi^{-1}\Lambda z^{(i)}z^{(i)T} + \Psi^{-1}\left(x^{(i)}-\mu\right)z^{(i)T}\right]$$

Steps 1 and 2 use $\mathrm{tr}\,a = a$ (when a is a real number) and $\mathrm{tr}\,AB = \mathrm{tr}\,BA$; the last step uses $\nabla_A \mathrm{tr}(ABA^TC) = CAB + C^TAB^T$ and $\nabla_A \mathrm{tr}(BA) = B^T$.

($\mathrm{tr}$ denotes the trace, the sum of the elements on the diagonal of a matrix.)

Finally, set the derivative to 0 and simplify:

$$\sum_{i=1}^m \Lambda\, E_{z^{(i)}\sim Q_i}\left[z^{(i)}z^{(i)T}\right] = \sum_{i=1}^m \left(x^{(i)}-\mu\right) E_{z^{(i)}\sim Q_i}\left[z^{(i)}\right]^T$$

Then

$$\Lambda = \left(\sum_{i=1}^m \left(x^{(i)}-\mu\right) E_{z^{(i)}\sim Q_i}\left[z^{(i)}\right]^T\right)\left(\sum_{i=1}^m E_{z^{(i)}\sim Q_i}\left[z^{(i)}z^{(i)T}\right]\right)^{-1}$$

This formula should look a bit familiar: it is similar in form to the least-squares normal equation from the earlier regression notes.

Let us explain the similarity. Here, x is a linear function of z (plus some noise). After obtaining the estimate of z in the E-step, we look for the linear relationship $\Lambda$ between x and z, just as least squares finds a linear relationship between the features and the target.

This is not complete yet: we still need the two expectations in the brackets above. According to our earlier definition of $z \mid x$, we know that

$$E_{z^{(i)}\sim Q_i}\left[z^{(i)}\right] = \mu_{z^{(i)}|x^{(i)}}$$
$$E_{z^{(i)}\sim Q_i}\left[z^{(i)}z^{(i)T}\right] = \mu_{z^{(i)}|x^{(i)}}\mu_{z^{(i)}|x^{(i)}}^T + \Sigma_{z^{(i)}|x^{(i)}}$$

The first follows from the conditional distribution of z, and the second from the identity $\mathrm{Cov}(Y) = E[YY^T] - E[Y]E[Y]^T$. Substituting these into the update for $\Lambda$ gives the final formula.
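Putting the E-step together with the $\Lambda$ update derived above (plus the standard closed-form $\Psi$ update from the same EM derivation), the whole algorithm fits in a short NumPy sketch. All synthetic parameter values in the usage example are assumptions:

```python
import numpy as np

def factor_analysis_em(X, k, n_iter=200):
    """EM for the model x = mu + Lambda z + eps, z ~ N(0, I_k), eps ~ N(0, diag(Psi))."""
    m, n = X.shape
    mu = X.mean(axis=0)                # MLE of mu, fixed throughout
    Xc = X - mu
    Lam = np.random.default_rng(0).normal(size=(n, k))
    Psi = np.ones(n)

    for _ in range(n_iter):
        # E-step: posterior moments of z^{(i)} given x^{(i)}
        S = Lam @ Lam.T + np.diag(Psi)
        G = np.linalg.solve(S, Lam)            # S^{-1} Lambda, n x k
        Sigma_z = np.eye(k) - Lam.T @ G        # shared posterior covariance
        Mz = Xc @ G                            # m x k matrix of posterior means

        # M-step: Lambda = (sum (x - mu) E[z]^T) (sum E[z z^T])^{-1}
        Ezz = m * Sigma_z + Mz.T @ Mz          # sum_i E[z z^T]
        Lam = (Xc.T @ Mz) @ np.linalg.inv(Ezz)
        # Psi update: diagonal of the residual second moment
        Psi = np.diag(Xc.T @ Xc - Lam @ (Mz.T @ Xc)) / m
    return mu, Lam, Psi

# Usage sketch on synthetic 1-factor data
rng = np.random.default_rng(1)
Lam_true = np.array([[1.0], [2.0], [1.0], [0.5]])
X = rng.normal(size=(2000, 1)) @ Lam_true.T + rng.normal(scale=0.5, size=(2000, 4))
mu_hat, Lam_hat, Psi_hat = factor_analysis_em(X, k=1)
# Lambda is identified only up to sign/rotation, but the implied covariance is recovered:
print(np.round(Lam_hat @ Lam_hat.T + np.diag(Psi_hat), 2))
```

Note that $\Lambda$ itself is identified only up to an orthogonal transformation of the factor space; what EM actually pins down is the implied marginal covariance $\Lambda\Lambda^T + \Psi$.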
