Factor analysis -- implementing the principal component algorithm


Principal factor analysis, as introduced in the "Refining into Gold" course:

- It is a dimensionality-reduction method, a generalization and development of principal component analysis.
- It is a statistical model used to analyze the factors behind surface phenomena: it attempts to describe each component of the original observation as a linear function of the smallest possible number of unobservable common factors, plus a special factor.
- Example: academic achievement (mathematical ability, language ability, etc.)
- Example: life satisfaction (job satisfaction, family satisfaction)
- Example: Shiry's book, p. 522

In summary: many variables are grouped, either subjectively (business experience) or objectively (a specific classification algorithm), into a few categories, so that the number of variables is reduced and analysis becomes easier. For example:

Here the value of m means we keep only 2 factors in the end; that is, the original eight indicators become 2 indicators. We can roughly anticipate that these two indicators (factors) will correspond to sprint ability and long-distance running ability. Of course, you could also set the number of factors to 3, interpreted as sprint ability, middle-distance ability, and long-distance running ability.

Let's look at a matrix like this:

The matrix has 30 students (samples), and each sample has 4 variables. Now the problem: suppose we want to group the 4 variables, say into 2 classes, which then serve as the new variables.

Abstracted into mathematical notation: X = (x1, x2, ..., xp)', E(X) = mu = (mu1, ..., mup)', Var(X) = Sigma.

Here p = 4; mu1 is the mean of variable x1 over the samples y1~y30; Var(X) is the covariance matrix of x1~x4. Y denotes the sample (the serial number of the student), and X denotes the content of each sample (the test items).

In fact, notice something: we are looking for relationships among the variables, and those relationships have nothing to do with the sample index Y. Think of it this way: weight, height, bust, and sitting height are related to each other (a taller student generally weighs more), but these relationships have nothing to do with which student number we observed. So we still work with the relationships among the x's, i.e. the covariance!
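As a quick illustration (with made-up data, in numpy rather than R): the covariance matrix depends only on the variables, not on how many samples there are or how they are indexed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 30 x 4 data matrix: 30 students, 4 measured variables
# (think weight, height, bust, sitting height).
X = rng.normal(size=(30, 4))

# rowvar=False treats rows as observations and columns as variables,
# so the result is a 4 x 4 matrix: the sample index drops out entirely.
S = np.cov(X, rowvar=False)
print(S.shape)   # (4, 4)
```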

So each component of an observed sample can be described as:

xi = mui + ai1 * f1 + ... + aim * fm + epsilon_i,  i = 1, ..., p

Here we focus on the sample x. If we take m = 2, there are only two common factors f1, f2; epsilon_1, epsilon_2, ..., epsilon_p are the special factors. Each common factor fi should correspond to at least two of the original variables x; otherwise its contribution is counted into a special factor.

The above system can also be abbreviated in matrix form as X = mu + AF + epsilon.

OK, so X is what we observe directly, and Var(F) is the m-by-m identity matrix (the common factors are uncorrelated with unit variance), so our goal is to find A!

First look at three properties:

Explanation: (1) Sigma is the covariance matrix of X, and property 1 (Sigma = AA' + D) is derived from the model.

(2) Multiplying the matrix X by a constant matrix C does not affect the result: the factor model structure is preserved under such a scaling.
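Property 1 follows directly from the model; a short derivation, using the assumptions Var(F) = I and Cov(F, epsilon) = 0:

```latex
\Sigma = \operatorname{Var}(X)
       = \operatorname{Var}(\mu + AF + \varepsilon)
       = A\,\operatorname{Var}(F)\,A^{\top} + \operatorname{Var}(\varepsilon)
       = AA^{\top} + D,
\qquad
D = \operatorname{diag}(\sigma_1^2, \dots, \sigma_p^2).
```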

Two indicators are also introduced: the communality h_i^2 of each variable (how much of its variance the common factors explain) and the variance contribution of each factor (the "SS loadings" in the output below).

Honestly, I don't think these two indicators are of much use (probably just because I haven't read enough books).

Needless to say, we now have to compute the matrix A according to property 1!

Before computing the matrix A, I need to introduce the spectral decomposition theorem for covariance matrices.

That is, S (the covariance matrix) can be split into a sum of eigenvalue x eigenvector x eigenvector-transpose terms. Here you can see the shadow of principal components: we do not have to keep all p terms, only the first few terms with the larger eigenvalues lambda_i.
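A minimal numpy sketch of this decomposition, on a made-up 3 x 3 covariance matrix: keep the m leading eigenvalue terms and stack them into the loading matrix A.

```python
import numpy as np

# Toy symmetric covariance matrix (hypothetical, for illustration only).
S = np.array([[1.0, 0.8, 0.6],
              [0.8, 1.0, 0.7],
              [0.6, 0.7, 1.0]])

# Spectral decomposition: S = sum_i lambda_i * e_i e_i^T
vals, vecs = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
order = np.argsort(vals)[::-1]       # reorder to descending
vals, vecs = vals[order], vecs[:, order]

# Keep only the m leading terms to build the loading matrix
# A = [sqrt(lambda_1) e_1, ..., sqrt(lambda_m) e_m].
m = 2
A = vecs[:, :m] * np.sqrt(vals[:m])

# A A^T approximates S; the gap is the discarded small-eigenvalue terms.
print(np.round(A @ A.T, 3))
```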

At this point A can be obtained as described above, but there is still a shortcoming, which the following property helps to address.

Property: when the loading matrix is rotated by an orthogonal matrix, the implied covariance does not change.
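This is easy to check numerically (hypothetical loadings; T is a 2-D rotation matrix, so T T' = I):

```python
import numpy as np

# Hypothetical loading matrix A and an orthogonal rotation T.
A = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.9]])
theta = 0.5
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

A_rot = A @ T

# Since T @ T.T is the identity, the implied covariance A A^T (and hence
# the communalities on its diagonal) is unchanged by the rotation.
print(np.allclose(A_rot @ A_rot.T, A @ A.T))   # True
```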

Next, let's use R to analyze a case, experimenting with the running data mentioned earlier.

We don't care about the individual samples here, so we start directly from the covariance (here, correlation) matrix.

Factor analysis, principal component algorithm:

# factor.analy1: factor analysis by the principal component method
# S: sample covariance (or correlation) matrix; m: number of factors
factor.analy1 <- function(S, m) {
  p <- nrow(S)                     # number of variables
  diag_S <- diag(S)
  sum_rank <- sum(diag_S)          # total variance (sum of diagonal elements)
  rowname <- paste("X", 1:p, sep = "")
  colname <- paste("Factor", 1:m, sep = "")
  A <- matrix(0, nrow = p, ncol = m, dimnames = list(rowname, colname))
  eig <- eigen(S)                  # spectral decomposition
  for (i in 1:m)                   # loadings by the formula: sqrt(lambda_i) * e_i
    A[, i] <- sqrt(eig$values[i]) * eig$vectors[, i]
  h <- diag(A %*% t(A))            # communalities h
  rowname <- c("SS loadings", "Proportion Var", "Cumulative Var")
  B <- matrix(0, nrow = 3, ncol = m, dimnames = list(rowname, colname))
  for (i in 1:m) {
    B[1, i] <- sum(A[, i]^2)
    B[2, i] <- B[1, i] / sum_rank
    B[3, i] <- sum(B[1, 1:i]) / sum_rank
  }
  method <- c("Principal Component Method")
  list(method = method, loadings = A,
       var = cbind(common = h, specific = diag_S - h),  # diag_S - h: specific variances
       B = B)
}

According to the specific data:

x <- c(1.000,
       0.923, 1.000,
       0.841, 0.851, 1.000,
       0.756, 0.807, 0.870, 1.000,
       0.700, 0.775, 0.835, 0.918, 1.000,
       0.619, 0.695, 0.779, 0.864, 0.928, 1.000,
       0.633, 0.697, 0.787, 0.869, 0.935, 0.975, 1.000,
       0.520, 0.596, 0.705, 0.806, 0.866, 0.932, 0.943, 1.000)
names <- c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8")
R <- matrix(0, nrow = 8, ncol = 8, dimnames = list(names, names))
for (i in 1:8) {        # fill in the symmetric matrix from the lower triangle
  for (j in 1:i) {
    R[i, j] <- x[(i - 1) * i / 2 + j]
    R[j, i] <- R[i, j]
  }
}
source("E:/MYPATH/factor.analy1.R")   # absolute path of the file defining the function
fa <- factor.analy1(R, m = 2); fa
E <- R - fa$loadings %*% t(fa$loadings) - diag(fa$var[, 2])
sum(E^2)

Results:

In loadings, Factor1 and Factor2 are the two new common factors; common = h gives the communalities and specific = diag_S - h the specific variances; B is of less interest. When m is chosen reasonably, the gap between the actual S and its estimate is very small, i.e. sum(E^2) is small.
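For comparison, here is a sketch of the same computation in Python with numpy, using the correlation matrix entered above (same numbers as the R code; `np.linalg.eigh` plays the role of `eigen`, and eigenvector signs may differ):

```python
import numpy as np

# Lower-triangular entries of the 8 x 8 running-data correlation matrix,
# row by row (identical to the values in the R code above).
x = [1.000,
     0.923, 1.000,
     0.841, 0.851, 1.000,
     0.756, 0.807, 0.870, 1.000,
     0.700, 0.775, 0.835, 0.918, 1.000,
     0.619, 0.695, 0.779, 0.864, 0.928, 1.000,
     0.633, 0.697, 0.787, 0.869, 0.935, 0.975, 1.000,
     0.520, 0.596, 0.705, 0.806, 0.866, 0.932, 0.943, 1.000]

p, m = 8, 2
R = np.zeros((p, p))
R[np.tril_indices(p)] = x            # tril_indices iterates row-major, matching x
R = R + R.T - np.diag(np.diag(R))    # mirror to get the full symmetric matrix

vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]       # descending eigenvalues
vals, vecs = vals[order], vecs[:, order]

A = vecs[:, :m] * np.sqrt(vals[:m])  # loadings
h = np.sum(A**2, axis=1)             # communalities
E = R - A @ A.T - np.diag(1.0 - h)   # residual matrix
print(round(np.sum(E**2), 4))        # small when m is chosen well
```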

The final step is to rotate the factor matrix, which makes the practical meaning of the factors more obvious.

vm1 <- varimax(fa$loadings, normalize = FALSE); vm1
# varimax(x, normalize = TRUE, eps = 1e-5): x is the loading matrix;
# normalize is a logical flag for Kaiser normalization; eps is the convergence tolerance.

Results after rotation:

It can be observed that factor F1 leans toward x5, x6, x7, x8 (in absolute value), so it can be interpreted as an endurance factor (good long-distance running ability), whereas F2 is the opposite, a sprint factor.

In summary, the principal component approach to factor analysis is now complete. It is worth mentioning that, for estimating the loadings, there are also the principal factor method and the maximum likelihood method; their algorithms are more complex, and whether I will study them further remains to be decided, but the three methods reach similar conclusions.

PS: The data and figures in this post come from Teacher Shiry's book "Statistical Modeling and R Software"; thanks are expressed here. If there is any infringement, it will be taken down immediately (there shouldn't be; this is all just for learning).

