1. PCA
Usage scenario: principal component analysis (PCA) is a dimensionality-reduction technique that converts a large number of correlated variables into a small set of uncorrelated variables called principal components.
Steps:
- Preprocess the data (make sure there are no missing values)
- Choose a factor model (PCA or EFA)
- Decide how many components/factors to extract
- Extract the components/factors
- Rotate the components/factors
- Interpret the results
- Compute component or factor scores
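The steps above can be sketched in base R with `prcomp` (a minimal sketch, not the psych-based code used in the case studies below; `psych::principal` rescales loadings, so its printed output differs):

```r
# Base-R sketch of the PCA workflow, using the built-in USJudgeRatings data
data(USJudgeRatings)
x <- USJudgeRatings[, -1]        # preprocessing: drop CONT; no missing values
p <- prcomp(x, scale. = TRUE)    # extract components from standardized data
eigenvalues <- p$sdev^2          # how many to keep: eigenvalue > 1 (Kaiser rule)
sum(eigenvalues > 1)             # number of components to retain
summary(p)                       # interpret: proportion of variance explained
head(p$x[, 1])                   # scores on the first principal component
```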
Case study: reducing the 11 rating variables in the USJudgeRatings data set (a single principal component)
1. Use a scree plot to determine the number of principal components to extract
library(psych)
# Draw a scree plot with parallel analysis
fa.parallel(USJudgeRatings[, -1], fa = "pc",
            main = "Scree plot with parallel analysis")
Conclusion: only one point lies above the eigenvalue = 1 line, so retaining one principal component is sufficient.
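What `fa.parallel`'s parallel analysis does can be sketched in base R (a simplified version assuming the "mean eigenvalue of random data" criterion; psych additionally draws resampled quantiles):

```r
# Parallel analysis by hand: keep components whose eigenvalue exceeds the
# average eigenvalue of random data of the same dimensions
set.seed(42)
data(USJudgeRatings)
x <- USJudgeRatings[, -1]
obs <- eigen(cor(x))$values                     # observed eigenvalues
rand <- replicate(100,                          # eigenvalues of random data
  eigen(cor(matrix(rnorm(nrow(x) * ncol(x)), nrow(x))))$values)
sum(obs > rowMeans(rand))                       # components to retain
```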
2. Extracting principal components
# principal(): the first argument is a data frame or correlation matrix
# nfactors: number of principal components to extract
# rotate: rotation method (default "varimax")
# scores: whether to compute component scores (default FALSE)
pc <- principal(USJudgeRatings[, -1], nfactors = 1)
pc
Conclusion: The first principal component is highly correlated with each variable
3. Get the principal component score
pc <- principal(USJudgeRatings[, -1], nfactors = 1, scores = TRUE)
head(pc$scores)
4. Get the correlation coefficient
cor(USJudgeRatings$CONT, pc$scores)
Conclusion: CONT (the number of contacts between lawyer and judge) is essentially uncorrelated with the component score, i.e. contact frequency is unrelated to the lawyers' ratings of the judges.
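The same check can be reproduced in base R (a sketch that uses the first eigenvector of the correlation matrix as the component; the sign of the score is arbitrary, so only the magnitude of the correlation matters):

```r
# CONT should be nearly uncorrelated with the first-component score
data(USJudgeRatings)
x <- USJudgeRatings[, -1]
v1 <- eigen(cor(x))$vectors[, 1]      # first principal axis
score <- scale(x) %*% v1              # component score (up to sign/scale)
cor(USJudgeRatings$CONT, score)       # close to 0
```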
Case 2: reducing the girls' body-measurement variables in the Harman23.cor data set (multiple principal components)
1. Determine the number of components
fa.parallel(Harman23.cor$cov, n.obs = 302, fa = "pc", n.iter = 100,
            show.legend = FALSE, main = "Scree plot with parallel analysis")
Conclusion: two points lie above the eigenvalue = 1 line, so two principal components are needed.
2. Principal component analysis
pc <- principal(Harman23.cor$cov, nfactors = 2, rotate = "none")
pc
Conclusion: the unrotated components are difficult to interpret, so the solution should be rotated.
3. Rotate the principal components (to make each component's loading pattern as clean as possible)
rc <- principal(Harman23.cor$cov, nfactors = 2, rotate = "varimax")
rc
4. Get the principal component scoring coefficients
round(unclass(rc$weights), 2)
Conclusion: a component score is obtained by multiplying each standardized variable by its scoring coefficient and summing.
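The "coefficient × value" rule can be made concrete in base R (a sketch on USJudgeRatings, since Harman23.cor ships only a covariance matrix and no raw data; the matrix `W` below plays the role of `rc$weights`):

```r
# Scoring coefficients W turn standardized data into component scores
data(USJudgeRatings)
x <- scale(USJudgeRatings[, -1])                       # standardized values
R <- cor(USJudgeRatings[, -1])
e <- eigen(R)
L <- e$vectors[, 1, drop = FALSE] * sqrt(e$values[1])  # loadings (1 component)
W <- solve(R, L)                                       # scoring coefficients
scores <- x %*% W                                      # score = sum(coef * value)
```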
2. EFA
Usage scenario: exploratory factor analysis (EFA) uncovers a smaller set of latent, unobservable variables (factors) that explain the correlations among a set of observed variables.
Case: applying EFA to participants' scores on six psychological tests (the ability.cov data set)
1. Determine the number of factors to be extracted
covariances <- ability.cov$cov
correlations <- cov2cor(covariances)
fa.parallel(correlations, n.obs = 112, fa = "both",
            main = "Scree plots with parallel analysis")
Conclusion: two factors should be extracted, since two points lie above the bend (inflection point) of the scree plot.
2. Extract the common factors
fa <- fa(correlations, nfactors = 2, rotate = "none", fm = "pa")
fa
Conclusion: the two factors explain 60% of the variance in the six psychological tests; rotation is needed before interpreting them.
fa.varimax <- fa(correlations, nfactors = 2, rotate = "varimax", fm = "pa")
fa.varimax
Conclusion: reading and vocabulary load mainly on the first factor, while picture, blocks and maze load mainly on the second. To find out whether the two factors are correlated, extract them again with an oblique rotation.
fa.promax <- fa(correlations, nfactors = 2, rotate = "promax", fm = "pa")
fa.promax
Conclusion: the correlation between the two factors is 0.57, which is substantial; if the correlation were small, the orthogonal (varimax) solution would suffice.
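The factor correlation can be cross-checked in base R with `stats::factanal` (an assumption-level sketch: factanal uses maximum likelihood rather than psych's principal-axis `fm = "pa"`, so the value will differ somewhat from 0.57):

```r
# ML factor analysis of the same data, then an oblique (promax) rotation
f <- factanal(covmat = ability.cov, factors = 2, rotation = "none")
p <- promax(loadings(f))                    # oblique rotation of the loadings
phi <- cov2cor(solve(crossprod(p$rotmat)))  # implied factor correlation matrix
phi[1, 2]                                   # sizeable, as with psych's promax
```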
3. Compute the factor structure matrix (the correlations between variables and factors)
fsm <- function(oblique) {
  if (class(oblique)[2] == "fa" & is.null(oblique$Phi)) {
    warning("Object doesn't look like oblique EFA")
  } else {
    P <- unclass(oblique$loading)
    F <- P %*% oblique$Phi
    colnames(F) <- c("PA1", "PA2")
    return(F)
  }
}
fsm(fa.promax)
4. Factor loading plot of the rotated solution
Conclusion: vocabulary and reading load more heavily on the first factor, while picture, maze and blocks load more heavily on the second; the general intelligence test loads about evenly on both.
5. Factor diagram of the oblique rotation
fa.diagram(fa.promax, simple = FALSE)
Conclusion: the diagram shows the relationships among the factors and variables, and is more precise than the previous plot.
R language-Principal component analysis