Factor Analysis

1 The problem
In the training data we have considered so far, the number of samples m was much larger than the number of features n, so regression, clustering, and the like posed no great difficulty. But when the number of training samples m is too small, even $m \ll n$, problems arise. If we use gradient descent for regression, different initial values yield wildly different parameter estimates (because there are fewer equations than parameters). And if we use a multivariate Gaussian distribution to fit the data, there is also a problem. Let's do the calculation and see what goes wrong:
The maximum likelihood estimates of the parameters of a multivariate Gaussian are

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^T$$

These are the formulas for the mean and the covariance; $x^{(i)}$ denotes a sample, of which there are m in total, each with n features, so $x^{(i)}$ and $\mu$ are n-dimensional vectors and $\Sigma$ is the n×n covariance matrix.
When $m \ll n$, we find that $\Sigma$ is singular ($|\Sigma| = 0$), so $\Sigma^{-1}$ does not exist and the density's normalizing factor $1/|\Sigma|^{1/2} = 1/0$ is undefined. In other words, there is no way to fit the multivariate Gaussian: the exact maximum likelihood estimate simply cannot be computed.
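A quick numerical check makes the problem concrete. The following sketch (mine, not from the original text; it assumes NumPy) draws m = 5 samples with n = 10 features and shows that the estimated $\Sigma$ has rank far below n, so its determinant is 0:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 10
X = rng.normal(size=(m, n))          # m samples, each with n features

mu = X.mean(axis=0)                  # maximum likelihood estimate of the mean
Sigma = (X - mu).T @ (X - mu) / m    # maximum likelihood estimate of Sigma (n x n)

print(np.linalg.matrix_rank(Sigma))  # at most m - 1 = 4, far below n = 10
print(np.linalg.det(Sigma))          # ~0: Sigma is singular and has no inverse
```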
What if we still want to use a multivariate Gaussian distribution to model the samples?
2 Limiting the covariance matrix
When there is not enough data to estimate a full $\Sigma$, we have to place assumptions on the model parameters. Previously we wanted to estimate the complete $\Sigma$ (every element of the matrix); now suppose instead that $\Sigma$ is diagonal (the features are mutually independent). Then we only need to compute the variance of each feature, and only the diagonal elements are nonzero:

$$\Sigma_{jj} = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)}-\mu_j\right)^2$$
Recall the geometric picture of the two-dimensional Gaussian we discussed earlier: the density's contours in the plane are ellipses, whose center is determined by $\mu$ and whose shape is determined by $\Sigma$. If $\Sigma$ becomes diagonal, the two axes of the ellipse are parallel to the coordinate axes.
If we want to restrict $\Sigma$ further, we can assume all the diagonal elements are equal, $\Sigma = \sigma^2 I$, where

$$\sigma^2 = \frac{1}{mn}\sum_{j=1}^{n}\sum_{i=1}^{m}\left(x_j^{(i)}-\mu_j\right)^2$$

That is, $\sigma^2$ is the mean of the diagonal elements from the previous step. In the two-dimensional Gaussian picture, this turns the ellipse into a circle.
To estimate a complete $\Sigma$ we need $m \ge n+1$ to guarantee that the maximum likelihood estimate of $\Sigma$ is nonsingular. Under either of the two assumptions above, however, $m \ge 2$ already yields a nonsingular estimate.
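As a sketch of the two restricted estimators (the function names are mine; NumPy assumed), note that both produce an invertible $\Sigma$ from as few as m = 2 samples:

```python
import numpy as np

def diagonal_cov(X):
    """Diagonal Sigma: one variance per feature, off-diagonals forced to 0."""
    mu = X.mean(axis=0)
    return np.diag(((X - mu) ** 2).mean(axis=0))

def spherical_cov(X):
    """Sigma = sigma^2 * I: a single shared variance for all features."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean()   # average over all samples and features
    return sigma2 * np.eye(X.shape[1])

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 10))              # m = 2 samples, n = 10 features
print(np.linalg.cond(diagonal_cov(X)))    # finite: the estimate is nonsingular
```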
The drawback is equally obvious: treating the features as independent is far too strong an assumption. Next we present a method called factor analysis, which uses more parameters to model the relationships among the features, yet still does not require estimating a complete $\Sigma$.
3 Marginal and conditional Gaussian distributions
Before discussing factor analysis, let us first look at the conditional and marginal distributions of a multivariate Gaussian. They will be useful later in the EM derivation for factor analysis.
Suppose x consists of two random vectors (think of it as splitting the previous $x$ into two parts):

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

where $x_1 \in \mathbb{R}^r$ and $x_2 \in \mathbb{R}^s$, so $x \in \mathbb{R}^{r+s}$. Suppose x obeys a multivariate Gaussian distribution, $x \sim \mathcal{N}(\mu, \Sigma)$, where

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

with $\mu_1 \in \mathbb{R}^r$, $\mu_2 \in \mathbb{R}^s$, $\Sigma_{11} \in \mathbb{R}^{r\times r}$, $\Sigma_{12} \in \mathbb{R}^{r\times s}$, and so on. Because the covariance matrix is symmetric, $\Sigma_{12} = \Sigma_{21}^T$. Viewed as a whole, $x_1$ and $x_2$ jointly obey a multivariate Gaussian distribution.
So, knowing only the joint distribution, what is the marginal distribution of $x_1$? From the $\mu$ and $\Sigma$ above we can see that

$$E[x_1] = \mu_1, \qquad \mathrm{Cov}(x_1) = E\left[(x_1-\mu_1)(x_1-\mu_1)^T\right] = \Sigma_{11}$$

We verify the second result below. By definition,

$$\mathrm{Cov}(x) = \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} = E\begin{bmatrix} (x_1-\mu_1)(x_1-\mu_1)^T & (x_1-\mu_1)(x_2-\mu_2)^T \\ (x_2-\mu_2)(x_1-\mu_1)^T & (x_2-\mu_2)(x_2-\mu_2)^T \end{bmatrix}$$

and the upper-left block is exactly $\mathrm{Cov}(x_1) = \Sigma_{11}$. Thus the marginal distribution of a multivariate Gaussian is still a multivariate Gaussian. In other words, $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$.
An interesting quantity above is $\mathrm{Cov}(x_1, x_2) = \Sigma_{12}$, which differs from the covariance we computed before. The earlier covariance matrix was for a single random variable (a multidimensional vector), whereas here we evaluate the relationship between two random vectors. For example, if $x_1$ = {height, weight} and $x_2$ = {gender, income}, then before we asked for the covariances of height with height, height with weight, and weight with weight; now we ask for the covariances of height with gender, height with income, weight with gender, and weight with income, which is clearly a different object.
The above is the marginal distribution; now consider the conditional distribution, that is, $p(x_1 \mid x_2)$. By the properties of the multivariate Gaussian, $x_1 \mid x_2 \sim \mathcal{N}\left(\mu_{1|2}, \Sigma_{1|2}\right)$, where

$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}\left(x_2 - \mu_2\right) \qquad (1)$$

and

$$\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \qquad (2)$$

These are the two formulas we will need later; they are stated here directly, without derivation. For the derivation, see Chuong B. Do's notes "Gaussian processes".
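As a small illustration of formulas (1) and (2) (a sketch of mine, assuming NumPy; the block sizes r and s are arbitrary):

```python
import numpy as np

def condition_gaussian(mu, Sigma, r, x2_obs):
    """Given x ~ N(mu, Sigma) split into x1 (the first r dims) and x2,
    return the mean and covariance of x1 | x2 = x2_obs."""
    mu1, mu2 = mu[:r], mu[r:]
    S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
    S21, S22 = Sigma[r:, :r], Sigma[r:, r:]
    mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2_obs - mu2)   # formula (1)
    Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)         # formula (2)
    return mu_cond, Sigma_cond

# The marginal of x1 needs no computation at all: it is N(mu[:r], Sigma[:r, :r]).
```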
4 An example of factor analysis
The following is a simple example that leads to the idea behind factor analysis.
The essence of factor analysis is the assumption that the m training samples, each with n-dimensional features, are produced as follows:

1. First, in a k-dimensional space, generate m points $z^{(i)}$ (k-dimensional vectors) according to a multivariate Gaussian distribution, that is, $z \sim \mathcal{N}(0, I)$.

2. Then use a transformation matrix $\Lambda \in \mathbb{R}^{n \times k}$ to map $z$ into n-dimensional space, that is, form $\Lambda z^{(i)}$. Since the mean of $z$ is 0, the mean after the mapping is still 0.

3. Then add a mean $\mu$ (n-dimensional), i.e. $\mu + \Lambda z^{(i)}$. The meaning of this step is to move the transformed points (n-dimensional vectors) to the center of the samples.

4. Because real samples deviate from what the model above generates, we further add an error $\epsilon$ (an n-dimensional vector), and $\epsilon$ also obeys a multivariate Gaussian distribution, $\epsilon \sim \mathcal{N}(0, \Psi)$.

5. The final result is taken as the formula that generates a real training sample:

$$x = \mu + \Lambda z + \epsilon$$
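The five steps translate directly into a sampling procedure. Here is a minimal sketch (mine, with NumPy; the sizes and parameter values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 5, 2

Lambda = rng.normal(size=(n, k))            # the n x k transformation matrix
mu = rng.normal(size=n)                     # the n-dimensional center point
Psi = np.diag(rng.uniform(0.1, 0.5, n))     # diagonal error covariance

Z = rng.normal(size=(m, k))                               # step 1: z ~ N(0, I)
eps = rng.multivariate_normal(np.zeros(n), Psi, size=m)   # step 4: errors
X = mu + Z @ Lambda.T + eps           # steps 2, 3 and 5: x = mu + Lambda z + eps
```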
Let's use an intuitive method to explain the above process:
Suppose we have m = 5 two-dimensional sample points $x^{(i)}$ (two features). According to the factor analysis view, these sample points are generated as follows:
1. First, in a one-dimensional space (here k = 1), generate m points $z^{(i)}$ from a normal distribution with mean 0 and variance 1.
2. Then map the one-dimensional $z$ into two dimensions via $\Lambda z$; the resulting points lie on a line through the origin.
3. After adding $\mu$, each point is shifted by $\mu_1$ horizontally and $\mu_2$ vertically, which moves the whole line so that it passes through $\mu$; what used to be the origin now sits at $\mu$ (the red point).
However, real sample points cannot be this regular: they deviate somewhat from the model, so we perturb each point generated in the previous step with an error $\epsilon$.

4. After adding the perturbation, we obtain the black sample points.
5. Since the means of $z$ and $\epsilon$ are both 0, $\mu$ is the mean of the original sample points (the black points).
From this visual analysis we see that factor analysis regards high-dimensional sample points as generated from low-dimensional points through a Gaussian distribution, a linear transformation, and an error perturbation, and therefore high-dimensional data can be expressed with a low-dimensional representation.
5 The factor analysis model
The procedure above obtains the observed sample points x from the latent random variable z through a transformation and an error perturbation. Here z is called a factor, and it is low-dimensional.
Let us write the model down once more:

$$z \sim \mathcal{N}(0, I), \qquad \epsilon \sim \mathcal{N}(0, \Psi), \qquad x = \mu + \Lambda z + \epsilon$$

where the error $\epsilon$ and $z$ are independent.
The derivation of factor analysis below uses the matrix representation; other representations appear in the references, and if the matrix notation is unclear you can consult them.
The matrix notation treats z and x as jointly obeying a multivariate Gaussian distribution:

$$\begin{bmatrix} z \\ x \end{bmatrix} \sim \mathcal{N}\left(\mu_{zx}, \Sigma\right)$$

We first find E[x]. We know E[z] = 0 and E[$\epsilon$] = 0, so

$$E[x] = E[\mu + \Lambda z + \epsilon] = \mu + \Lambda E[z] + E[\epsilon] = \mu$$

and therefore

$$\mu_{zx} = \begin{bmatrix} \vec{0} \\ \mu \end{bmatrix}$$

The next step is to compute the blocks of

$$\Sigma = \begin{bmatrix} \Sigma_{zz} & \Sigma_{zx} \\ \Sigma_{xz} & \Sigma_{xx} \end{bmatrix}$$

in which $\Sigma_{zz} = \mathrm{Cov}(z) = I$. We then find

$$\Sigma_{zx} = E\left[(z - E[z])(x - E[x])^T\right] = E\left[z(\Lambda z + \epsilon)^T\right] = E\left[zz^T\right]\Lambda^T + E\left[z\epsilon^T\right] = \Lambda^T$$

This step uses the assumption that z and $\epsilon$ are independent (so $E[z\epsilon^T] = E[z]E[\epsilon]^T = 0$), and treats $\Lambda$ as a known constant. We then find

$$\Sigma_{xx} = E\left[(x-\mu)(x-\mu)^T\right] = E\left[(\Lambda z + \epsilon)(\Lambda z + \epsilon)^T\right] = \Lambda E\left[zz^T\right]\Lambda^T + E\left[\epsilon\epsilon^T\right] = \Lambda\Lambda^T + \Psi$$

(the cross terms again vanish by independence), and the final form of the joint distribution is

$$\begin{bmatrix} z \\ x \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \vec{0} \\ \mu \end{bmatrix}, \begin{bmatrix} I & \Lambda^T \\ \Lambda & \Lambda\Lambda^T + \Psi \end{bmatrix}\right)$$
From this formula we can read off the marginal distribution of x:

$$x \sim \mathcal{N}\left(\mu, \Lambda\Lambda^T + \Psi\right)$$
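These block formulas can be checked by Monte Carlo. The sketch below (mine, NumPy assumed) samples a large number of (z, x) pairs and compares the empirical covariance blocks against $\Lambda^T$ and $\Lambda\Lambda^T + \Psi$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 200_000, 4, 2
Lambda = rng.normal(size=(n, k))
mu = rng.normal(size=n)
Psi = np.diag(rng.uniform(0.1, 0.5, n))

Z = rng.normal(size=(m, k))
X = mu + Z @ Lambda.T + rng.multivariate_normal(np.zeros(n), Psi, size=m)

ZX = np.hstack([Z, X])                      # stack each sample as [z; x]
C = np.cov(ZX, rowvar=False)                # empirical joint covariance
print(np.round(C[:k, k:] - Lambda.T, 2))    # Sigma_zx block: approaches Lambda^T
print(np.round(C[k:, k:] - (Lambda @ Lambda.T + Psi), 2))  # Sigma_xx block
```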
The maximum likelihood objective over the samples is then

$$\ell(\mu, \Lambda, \Psi) = \log \prod_{i=1}^{m} \frac{1}{(2\pi)^{n/2}\left|\Lambda\Lambda^T+\Psi\right|^{1/2}} \exp\left(-\frac{1}{2}\left(x^{(i)}-\mu\right)^T\left(\Lambda\Lambda^T+\Psi\right)^{-1}\left(x^{(i)}-\mu\right)\right)$$
Can we simply take the partial derivative with respect to each parameter and solve? Unfortunately, no closed-form solution can be obtained this way. (Think about it: if we could, why would we bother writing the joint distribution of z and x?) Based on our earlier experience with parameter estimation in the presence of a latent variable z, we can use EM instead.
6 EM estimation for factor analysis
Let us first fix the roles: z is the latent variable, and $\mu$, $\Lambda$, $\Psi$ are the parameters to estimate.
Recall the two steps of EM:

Repeat until convergence {

(E-step) For each i, compute $Q_i\left(z^{(i)}\right) = p\left(z^{(i)} \mid x^{(i)}; \mu, \Lambda, \Psi\right)$

(M-step) Compute $(\mu, \Lambda, \Psi) = \arg\max_{\mu, \Lambda, \Psi} \sum_{i=1}^{m} \int_{z^{(i)}} Q_i\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)}; \mu, \Lambda, \Psi\right)}{Q_i\left(z^{(i)}\right)}\, dz^{(i)}$

}
Now let us apply this to factor analysis.
(E-step):

Following the discussion of conditional distributions in section 3, $z^{(i)} \mid x^{(i)}; \mu, \Lambda, \Psi \sim \mathcal{N}\left(\mu_{z^{(i)}|x^{(i)}}, \Sigma_{z^{(i)}|x^{(i)}}\right)$. So, by formulas (1) and (2),

$$\mu_{z^{(i)}|x^{(i)}} = \Lambda^T\left(\Lambda\Lambda^T+\Psi\right)^{-1}\left(x^{(i)}-\mu\right), \qquad \Sigma_{z^{(i)}|x^{(i)}} = I - \Lambda^T\left(\Lambda\Lambda^T+\Psi\right)^{-1}\Lambda$$

Then, by the multivariate Gaussian density formula, we get

$$Q_i\left(z^{(i)}\right) = \frac{1}{(2\pi)^{k/2}\left|\Sigma_{z^{(i)}|x^{(i)}}\right|^{1/2}} \exp\left(-\frac{1}{2}\left(z^{(i)}-\mu_{z^{(i)}|x^{(i)}}\right)^T \Sigma_{z^{(i)}|x^{(i)}}^{-1}\left(z^{(i)}-\mu_{z^{(i)}|x^{(i)}}\right)\right)$$
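In code, the E-step is just the two formulas above applied to every sample at once. A sketch (mine; NumPy assumed, with X holding one sample per row):

```python
import numpy as np

def e_step(X, mu, Lambda, Psi):
    """Posterior mean of z for each sample (one row each) and the posterior
    covariance of z given x, which is the same for every sample."""
    k = Lambda.shape[1]
    G = Lambda @ Lambda.T + Psi               # Lambda Lambda^T + Psi
    W = np.linalg.solve(G, Lambda).T          # Lambda^T G^{-1}   (k x n)
    mu_z_given_x = (X - mu) @ W.T             # m x k matrix of posterior means
    Sigma_z_given_x = np.eye(k) - W @ Lambda  # k x k posterior covariance
    return mu_z_given_x, Sigma_z_given_x
```

Note that $\Sigma_{z^{(i)}|x^{(i)}}$ does not depend on $x^{(i)}$, so it is computed only once per iteration.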
(M-step):

The objective to maximize can be written directly as

$$\sum_{i=1}^{m} \int_{z^{(i)}} Q_i\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)}; \mu, \Lambda, \Psi\right)}{Q_i\left(z^{(i)}\right)}\, dz^{(i)}$$

where the parameters to be estimated are $\mu$, $\Lambda$, $\Psi$.
Below we focus on deriving the update formula for $\Lambda$; the updates for the other parameters are given at the end.
First, the objective above simplifies to

$$\sum_{i=1}^{m} E_{z^{(i)}\sim Q_i}\left[\log p\left(x^{(i)} \mid z^{(i)}; \mu, \Lambda, \Psi\right) + \log p\left(z^{(i)}\right) - \log Q_i\left(z^{(i)}\right)\right]$$

Here $x^{(i)} \mid z^{(i)}$ obeys $\mathcal{N}\left(\mu + \Lambda z^{(i)}, \Psi\right)$. Dropping the terms that do not depend on $\Lambda$ (the last two), we are left with

$$\sum_{i=1}^{m} E\left[\log\frac{1}{(2\pi)^{n/2}|\Psi|^{1/2}} - \frac{1}{2}\left(x^{(i)}-\mu-\Lambda z^{(i)}\right)^T\Psi^{-1}\left(x^{(i)}-\mu-\Lambda z^{(i)}\right)\right]$$
After also removing the leading term, which is irrelevant to $\Lambda$, we take the derivative with respect to $\Lambda$:

$$\begin{aligned}
&\nabla_\Lambda \sum_{i=1}^{m} -E\left[\frac{1}{2}\left(x^{(i)}-\mu-\Lambda z^{(i)}\right)^T\Psi^{-1}\left(x^{(i)}-\mu-\Lambda z^{(i)}\right)\right] \\
&= \sum_{i=1}^{m} \nabla_\Lambda E\left[-\mathrm{tr}\left(\frac{1}{2} z^{(i)T}\Lambda^T\Psi^{-1}\Lambda z^{(i)}\right) + \mathrm{tr}\left(z^{(i)T}\Lambda^T\Psi^{-1}\left(x^{(i)}-\mu\right)\right)\right] \\
&= \sum_{i=1}^{m} E\left[-\Psi^{-1}\Lambda z^{(i)}z^{(i)T} + \Psi^{-1}\left(x^{(i)}-\mu\right)z^{(i)T}\right]
\end{aligned}$$

The first and second steps use $\mathrm{tr}\,a = a$ (for a real number a) and $\mathrm{tr}\,AB = \mathrm{tr}\,BA$. The final step uses $\nabla_A \mathrm{tr}\left(ABA^TC\right) = CAB + C^TAB^T$. Here tr denotes the trace of a matrix: the sum of its diagonal elements.
Finally, setting the derivative to 0 and simplifying gives

$$\sum_{i=1}^{m} \Lambda\,E_{z^{(i)}\sim Q_i}\left[z^{(i)}z^{(i)T}\right] = \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)E_{z^{(i)}\sim Q_i}\left[z^{(i)}\right]^T$$

and hence

$$\Lambda = \left(\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)E_{z^{(i)}\sim Q_i}\left[z^{(i)}\right]^T\right)\left(\sum_{i=1}^{m}E_{z^{(i)}\sim Q_i}\left[z^{(i)}z^{(i)T}\right]\right)^{-1} \qquad (7)$$
This formula looks somewhat familiar: it resembles the normal equation from least squares regression,

$$\theta^T = \left(\vec{y}^{\,T}X\right)\left(X^TX\right)^{-1}$$

Here is the analogy: x is a linear function of z (plus a certain amount of noise), and after obtaining the estimate of z in the E-step we look for the linear relationship between x and z, just as least squares finds the direct linear relation between the features and the target.
We are not done yet: we still need the two expectations appearing in (7). From our definition of $Q_i$ as the conditional distribution of $z^{(i)}$ given $x^{(i)}$, we know

$$E_{z^{(i)}\sim Q_i}\left[z^{(i)}\right] = \mu_{z^{(i)}|x^{(i)}}$$

$$E_{z^{(i)}\sim Q_i}\left[z^{(i)}z^{(i)T}\right] = \mu_{z^{(i)}|x^{(i)}}\mu_{z^{(i)}|x^{(i)}}^T + \Sigma_{z^{(i)}|x^{(i)}}$$

The first follows directly from the conditional distribution of z; the second uses the identity $\mathrm{Cov}(Y) = E\left[YY^T\right] - E[Y]E[Y]^T$. Substituting these into (7),

$$\Lambda = \left(\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\mu_{z^{(i)}|x^{(i)}}^T\right)\left(\sum_{i=1}^{m}\left(\mu_{z^{(i)}|x^{(i)}}\mu_{z^{(i)}|x^{(i)}}^T + \Sigma_{z^{(i)}|x^{(i)}}\right)\right)^{-1}$$

Note the difference from least squares here: $E\left[zz^T\right]$ is not just $E[z]E[z]^T$; it also requires the covariance of z.
The update formulas for the other parameters are as follows. For the mean,

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$$

Since this does not involve the posterior over z, the mean does not change across iterations and only needs to be computed once. For $\Psi$, first compute

$$\Phi = \frac{1}{m}\sum_{i=1}^{m}\left(\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^T - \left(x^{(i)}-\mu\right)\mu_{z^{(i)}|x^{(i)}}^T\Lambda^T - \Lambda\mu_{z^{(i)}|x^{(i)}}\left(x^{(i)}-\mu\right)^T + \Lambda\left(\mu_{z^{(i)}|x^{(i)}}\mu_{z^{(i)}|x^{(i)}}^T + \Sigma_{z^{(i)}|x^{(i)}}\right)\Lambda^T\right)$$

then extract its diagonal elements and place them in the corresponding entries of $\Psi$, that is, $\Psi_{ii} = \Phi_{ii}$.
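Putting the E-step and all three updates together, one full EM iteration might look like the following sketch (mine, NumPy assumed; not the only way to organize the computation):

```python
import numpy as np

def em_step(X, Lambda, Psi):
    """One EM iteration for factor analysis; X holds one sample per row."""
    m, n = X.shape
    k = Lambda.shape[1]
    mu = X.mean(axis=0)                       # fixed across iterations
    Xc = X - mu

    # E-step: posterior moments of z (section 6, E-step formulas)
    G = Lambda @ Lambda.T + Psi
    W = np.linalg.solve(G, Lambda).T          # Lambda^T G^{-1}
    Ez = Xc @ W.T                             # E[z^(i)] for each i, m x k
    Szx = np.eye(k) - W @ Lambda              # Sigma_{z|x}
    Ezz = m * Szx + Ez.T @ Ez                 # sum_i E[z^(i) z^(i)T]

    # M-step: update Lambda via (7), then Phi and the diagonal Psi
    Lambda_new = np.linalg.solve(Ezz, (Xc.T @ Ez).T).T
    Phi = (Xc.T @ Xc
           - Xc.T @ Ez @ Lambda_new.T
           - Lambda_new @ Ez.T @ Xc
           + Lambda_new @ Ezz @ Lambda_new.T) / m
    Psi_new = np.diag(np.diag(Phi))           # keep only the diagonal elements
    return mu, Lambda_new, Psi_new
```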
7 Summary
Based on the EM procedure above, to run factor analysis on samples x you only need to choose the number of factors, that is, the dimension k of z. Through EM we obtain the transformation matrix $\Lambda$ and the error covariance $\Psi$.
Factor analysis is in effect a dimensionality reduction method: once the parameters have been estimated, z can be computed from x. But the meaning of each component of z has to be interpreted by yourself.
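For practical use, an EM-based implementation ships with scikit-learn; a quick sketch (assuming scikit-learn is installed, with stand-in data):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

X = np.random.default_rng(0).normal(size=(100, 5))  # stand-in data: m = 100, n = 5
fa = FactorAnalysis(n_components=2)                 # choose k, the dimension of z
Z_hat = fa.fit_transform(X)                         # the estimated z for each sample
# fa.components_ corresponds to Lambda^T; fa.noise_variance_ to the diagonal of Psi
```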
The following excerpt from a PPT explains factor analysis further.
Factor analysis is a technique for simplifying data. It explores the basic structure of the observed data by studying the internal dependencies among many variables, and expresses that basic structure through a few hypothetical variables. These hypothetical variables can capture the main information in the many original variables. The original variables are observable; the hypothetical variables are unobservable latent variables, called factors.
For example, in studies of corporate image or brand image, consumers can evaluate the strengths and weaknesses of a department store through an evaluation system consisting of 24 indicators.
But what consumers mainly care about are three aspects: the store environment, the store's service, and the prices of the goods. Factor analysis can start from the 24 variables and find the three latent factors reflecting store environment, store service level, and commodity prices, in terms of which each original variable can be expressed as

$$x_i = a_{i1}F_1 + a_{i2}F_2 + a_{i3}F_3 + \varepsilon_i, \qquad i = 1, \ldots, 24$$

Here $x_i$ is the i-th original variable (the i-th component of the sample x), $a_{ij}$ is the element in row i and column j of the loading matrix, $F_j$ is the j-th common factor, and $\varepsilon_i$ is the special factor.
$F_1$, $F_2$, $F_3$ are latent, unobservable factors. The 24 variables all share these three factors, but each variable also has its own individual part, $\varepsilon_i$, the part not covered by the common factors, called the special factor.
Note:
Factor analysis differs from regression analysis: the factors in factor analysis are relatively abstract concepts, whereas regression factors have very definite practical meanings.
Principal component analysis and factor analysis also differ: principal component analysis is merely a variable transformation, whereas factor analysis requires constructing a factor model.
Principal component analysis: linear combinations of the original variables represent new synthetic variables, the principal components;
Factor analysis: linear combinations of latent hypothetical variables plus random error terms represent the original variables.
PPT address: http://www.math.zju.edu.cn/webpagenew/uploadfiles/attachfiles/2008123195228555.ppt
Other documents worth consulting:

An Introduction to Probabilistic Graphical Models, M. I. Jordan, chapter 14

The difference between principal component analysis and factor analysis: http://cos.name/old/view.php?tid=10&id=82
"Reprint" Factor Analysis (Factor)