The correlation coefficient measures how strongly variables are related, and computing it is a step in many analyses. Running a correlation analysis in SPSS is relatively simple; the main difficulty is knowing which correlation coefficient to use. If you do not need to quantify the correlation, looking at a scatter plot is often enough.
There are a few points to note about correlation coefficients:
1. A correlation between two variables only means that there is an association; it does not imply causality.
2. Correlation coefficients cannot be added, subtracted, multiplied or divided, they have no unit, and different kinds of correlation coefficient cannot be compared with each other.
3. The size of a correlation coefficient is easily affected by the range of the data values and by the number of observations.
4. A correlation coefficient also needs to be tested to determine whether it is statistically significant.
The hypothesis test for the correlation coefficient
H0: correlation coefficient = 0, there is no correlation between the variables
H1: correlation coefficient ≠ 0, there is a correlation between the variables
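As a rough illustration of this test for the Pearson coefficient, the usual statistic is t = r·sqrt((n − 2) / (1 − r²)), which is compared against a t distribution with n − 2 degrees of freedom. The sketch below only computes the statistic. The function name and the example values r = 0.5, n = 27 are made up for illustration, and SPSS itself reports the p-value for you.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: correlation = 0 (Pearson r, n paired observations)."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    # Degrees of freedom are n - 2.
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical values: r = 0.5 observed on 27 pairs.
t = correlation_t_statistic(0.5, 27)
print(round(t, 4))
```

The larger the absolute value of t for a given n, the stronger the evidence against H0.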
There are many correlation coefficients, and we generally choose among them according to the variable type. From lowest to highest level of measurement, variables can be divided into four types: nominal, ordinal, interval and ratio. Separately, data can be continuous or discrete; take care not to confuse the two classifications.
One, interval and ratio variables (essentially continuous variables)
The Pearson correlation coefficient, also known as the product-moment correlation coefficient, is generally used. It is a linear correlation coefficient and the most widely used one. Its conditions are that the two variables have a linear relationship, that both follow a normal distribution, and that the observations come in pairs.
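To make the definition concrete, here is a minimal sketch of the Pearson coefficient computed by hand: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name and the sample data are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for paired observations."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```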
Two, ordinal, interval and ratio variables
The Spearman rank correlation coefficient is generally used here. It uses only the order information in the variables and makes few demands on the raw data, so it applies more widely than the Pearson correlation coefficient. It takes the ranks of the two variables as the basis of the analysis, and can be thought of as the Pearson correlation coefficient computed on ranks. When the data do not meet the requirements of the Pearson correlation coefficient, the Spearman correlation coefficient can be chosen. For interval or ratio variables that do meet those requirements, however, the Pearson correlation coefficient is still recommended, because the Spearman correlation coefficient is slightly less efficient.
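The idea that Spearman is "Pearson on ranks" can be sketched as follows. Ties receive the average of the ranks they span, which is a common convention and an assumption here. All function names and data are made up for illustration.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson_r(x, y), spearman_rho(x, y))
```

In this example the relationship is perfectly monotonic but curved, so the rank-based coefficient reaches 1 while the Pearson coefficient stays below 1.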
Three, ordinal variables only
1. Gamma correlation coefficient
2. Kendall's rank correlation coefficient, which comes in three variants: τ-a, τ-b and τ-c
3. Somers' D correlation coefficient
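These ordinal measures are all built from counts of concordant and discordant pairs and differ mainly in how they treat ties. The sketch below uses the standard textbook formulas with made-up data. τ-c is left out because it also needs the table dimensions, and the function name and returned keys are assumptions for illustration.

```python
import math
from itertools import combinations

def ordinal_association(x, y):
    """Gamma, Kendall tau-a/tau-b and Somers' D (y dependent) from pair counts."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                  # tied on both variables
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1           # pair ordered the same way on x and y
        else:
            discordant += 1           # pair ordered in opposite ways
    n = len(x)
    diff = concordant - discordant
    untied = concordant + discordant  # assumed to be non-zero in this sketch
    return {
        "gamma": diff / untied,
        "tau_a": diff / (n * (n - 1) / 2),
        "tau_b": diff / math.sqrt((untied + tied_x_only) * (untied + tied_y_only)),
        "somers_d_yx": diff / (untied + tied_y_only),
    }

# Hypothetical ordinal codes for six cases.
x = [1, 1, 2, 2, 3, 3]
y = [1, 2, 1, 3, 2, 3]
result = ordinal_association(x, y)
print(result)
```

Gamma ignores ties entirely, so on the same data it is never smaller in absolute value than the other three.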
Four, nominal variables
Correlation measures for nominal variables are mostly derived from the chi-square value.
1. Pearson chi-square
This is in fact the chi-square test.
2. Contingency coefficient
3. φ (phi) coefficient
4. Cramér's V coefficient
5. Lambda (λ) coefficient
6. Goodman and Kruskal's tau-y coefficient
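To show how the first four measures above follow from the chi-square value, here is a minimal sketch using the standard formulas: φ = sqrt(χ²/n), contingency coefficient C = sqrt(χ²/(χ² + n)), and Cramér's V = sqrt(χ²/(n·min(rows − 1, columns − 1))). No continuity correction is applied, and the table counts and function name are made up for illustration. Lambda and tau-y are based on reduction of prediction error instead and are not covered here.

```python
import math

def chi_square_measures(table):
    """Pearson chi-square and derived association measures for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return {
        "chi2": chi2,
        "phi": math.sqrt(chi2 / n),
        "contingency_c": math.sqrt(chi2 / (chi2 + n)),
        "cramers_v": math.sqrt(chi2 / (n * k)),
    }

# Hypothetical 2 x 2 table of counts.
measures = chi_square_measures([[30, 10], [10, 30]])
print(measures)
```

For a 2 x 2 table, φ and Cramér's V coincide, as they do in this example.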
Five, binary (two-category) variables
1. Relative risk (RR)
2. Odds ratio (OR)
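Both values come from a 2 x 2 table. As a minimal sketch, assume the rows are exposed and unexposed groups and the columns are event and no event. The cell labels a, b, c, d, the function name and the counts are assumptions for illustration.

```python
def relative_risk_and_odds_ratio(a, b, c, d):
    """
    2 x 2 table layout assumed here:
                 event   no event
    exposed        a        b
    unexposed      c        d
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio

# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the event.
rr, odds_ratio = relative_risk_and_odds_ratio(20, 80, 10, 90)
print(rr, odds_ratio)
```

A value of 1 for either measure indicates no association between the two binary variables.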
=========================================================
Now that we are familiar with the various correlation coefficients, let's look at how to obtain them in SPSS.
1. Analyze - Descriptive Statistics - Crosstabs
This procedure is typically used to analyze contingency tables. Because categorical data are mostly laid out as contingency tables, the procedure offers a number of the correlation coefficients listed above.
2. Analyze - Correlate - Bivariate
This procedure performs simple correlation analysis and is the most commonly used of the correlation procedures.
3. Analyze - Correlate - Partial
Variables are interrelated, so when we analyze the correlation between two variables the result inevitably carries the influence of other variables. To get the pure correlation between the two variables we need to control for the influence of those other variables; correlation analysis done this way is called partial correlation analysis.
In effect, the partial correlation coefficient is obtained by taking each of the two variables of interest in turn as the dependent variable and the controlled variables as the independent variables, fitting the two regression equations, and then running a simple correlation analysis on the two sets of residuals.
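The residual-based description can be checked with a small sketch. It assumes a single control variable z, so each regression is a simple linear regression, and compares the result with the well-known closed form (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). All names and data are made up for illustration. With several control variables, multiple regression would be needed instead.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(dependent, control):
    """Residuals from a simple linear regression of dependent on control."""
    n = len(dependent)
    md, mc = sum(dependent) / n, sum(control) / n
    slope = (sum((c - mc) * (d - md) for c, d in zip(control, dependent))
             / sum((c - mc) ** 2 for c in control))
    intercept = md - slope * mc
    return [d - (intercept + slope * c) for d, c in zip(dependent, control)]

def partial_corr_residuals(x, y, z):
    """Correlate what is left of x and y after removing the linear effect of z."""
    return pearson_r(residuals(x, z), residuals(y, z))

def partial_corr_formula(x, y, z):
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data; z is the variable being controlled for.
x = [2, 4, 5, 7, 9, 10]
y = [1, 3, 6, 6, 9, 12]
z = [1, 2, 3, 4, 5, 6]
by_residuals = partial_corr_residuals(x, y, z)
by_formula = partial_corr_formula(x, y, z)
print(by_residuals, by_formula)
```

The two routes agree up to floating-point error, which is why the residual description is a fair way to think about what the partial correlation measures.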
4. Analyze - Correlate - Distances
This procedure is typically used for exploratory analysis. Sometimes we can make an initial guess about how variables are related, as with the number of colleges and the number of patent applications in the earlier example. At other times we do not know what the variables mean and have no basis for a guess. In that case we can use the distance procedure to examine the dissimilarity or similarity of the variables, get a preliminary understanding of the data, and then analyze further in light of the results.
Because the distance procedure is purely descriptive and involves no hypothesis testing, its results do not include p-values the way correlation output does. There are also many distance measures to choose from, and they differ according to the variable type.
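As a rough idea of what such measures look like for interval data, the sketch below computes two common dissimilarities (Euclidean and city-block distance) and one similarity (cosine) between two variables measured on the same cases. These three are chosen only as familiar examples, not as a list of what SPSS offers, and the function names and data are made up for illustration.

```python
import math

def euclidean_distance(u, v):
    """Dissimilarity: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def block_distance(u, v):
    """Dissimilarity: sum of absolute differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Similarity: cosine of the angle between the two value vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical values of two variables on three cases.
u = [1, 2, 3]
v = [4, 6, 8]
print(euclidean_distance(u, v), block_distance(u, v), cosine_similarity(u, v))
```

Larger distances mean the variables are less alike, while larger similarity values mean they are more alike. Distance measures also depend on the scale of the variables, so standardizing first is often sensible.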
SPSS Data Analysis-correlation