A relationship between two variables (or two sets of variables) is called correlation when the variables are continuous and association when they are categorical.
I. Correlation between continuous variables
The commonly used commands are cor(), cov(), var(), cov2cor() and cor.test().
They are used as follows:
1. Calculating correlation coefficients and correlation matrices
> cor(count, speed)
[1] 0.7237206
> cor(count, speed, method = "spearman")
[1] 0.5269556
> cor(mf)
           Length       Speed      Algae         NO3        BOD
Length  1.0000000 -0.34322968  0.7650757  0.45476093 -0.8055507
Speed  -0.3432297  1.00000000 -0.1134416  0.02257931  0.1983412
Algae   0.7650757 -0.11344163  1.0000000  0.37706463 -0.8365705
NO3     0.4547609  0.02257931  0.3770646  1.00000000 -0.3751308
BOD    -0.8055507  0.19834122 -0.8365705 -0.37513077  1.0000000
> cor(mf$Length, mf)   # use $ to select one variable and correlate it with every column
     Length      Speed     Algae       NO3        BOD
[1,]      1 -0.3432297 0.7650757 0.4547609 -0.8055507
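The calls above can be tried on any numeric data. The following is a minimal sketch on a small made-up data frame; the name d and its columns x, y and z are assumptions for illustration, not the data used in this article.

```r
# Hypothetical data, invented for illustration only
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, 5, 4, 7, 9),
  z = c(9, 7, 8, 4, 3, 1)
)

r_pearson  <- cor(d$x, d$y)                       # Pearson is the default method
r_spearman <- cor(d$x, d$y, method = "spearman")  # rank-based alternative
r_matrix   <- cor(d)                              # all pairwise correlations
r_row      <- cor(d$x, d)                         # one variable against every column

print(r_matrix)
print(r_row)
```

Note that the method name is lowercase ("pearson", "spearman" or "kendall"), and that R function names are case-sensitive.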
2. Calculating covariances and covariance matrices
> cov(count, speed)
[1] 123
> var(count, speed)
[1] 123
> cov(mf)
             Length        Speed       Algae         NO3        BOD
Length    9.4900000  -4.95000000   45.858333  0.70683333 -111.55667
Speed    -4.9500000  21.91666667  -10.333333  0.05333333   41.74167
Algae    45.8583333 -10.33333333  378.583333  3.70166667 -731.73333
NO3       0.7068333   0.05333333    3.701667  0.25456667   -8.50850
BOD    -111.5566667  41.74166667 -731.733333 -8.50850000 2020.87333
> cov2cor(cov(mf))
           Length       Speed      Algae         NO3        BOD
Length  1.0000000 -0.34322968  0.7650757  0.45476093 -0.8055507
Speed  -0.3432297  1.00000000 -0.1134416  0.02257931  0.1983412
Algae   0.7650757 -0.11344163  1.0000000  0.37706463 -0.8365705
NO3     0.4547609  0.02257931  0.3770646  1.00000000 -0.3751308
BOD    -0.8055507  0.19834122 -0.8365705 -0.37513077  1.0000000
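The output above suggests two relationships: var() with two arguments returns the same value as cov(), and cov2cor() applied to a covariance matrix reproduces the correlation matrix. A short sketch that checks both on randomly generated data (the data frame d is an assumption for illustration):

```r
set.seed(1)  # reproducible made-up data
d <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))

v <- cov(d)      # covariance matrix; the diagonal holds the variances
r <- cov2cor(v)  # rescale: r[i, j] = v[i, j] / sqrt(v[i, i] * v[j, j])

# var() called with two vectors returns their covariance, like cov()
same_cov <- isTRUE(all.equal(var(d$a, d$b), cov(d$a, d$b)))
# cov2cor(cov(d)) reproduces cor(d)
same_cor <- isTRUE(all.equal(r, cor(d)))

print(c(same_cov, same_cor))
```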
3. Testing the significance of a correlation coefficient
> cor.test(count, speed)
        Pearson's product-moment correlation
data:  count and speed
t = 2.5689, df = 6, p-value = 0.0424
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03887166 0.94596455
sample estimates:
      cor
0.7237206
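For the Pearson test, the t statistic is related to the correlation coefficient r and the sample size n by t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. As a quick check, this sketch recomputes the statistic and the two-sided p-value from the estimate and degrees of freedom printed above; the variable names are chosen here for illustration.

```r
r  <- 0.7237206  # sample estimate from the output above
df <- 6          # degrees of freedom (n - 2)

t_stat  <- r * sqrt(df) / sqrt(1 - r^2)  # t statistic for H0: correlation = 0
p_value <- 2 * pt(-abs(t_stat), df)      # two-sided p-value from the t distribution

print(t_stat)
print(p_value)
```

Both values should agree with the cor.test() output up to rounding.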
4. Using formula syntax
> cor.test(~ count + speed, data = fw3, subset = cover %in% c("open", "closed"))
This runs the correlation test on the count and speed columns of the fw3 data frame, using only the rows whose categorical variable cover is either open or closed.
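A self-contained sketch of the formula interface follows. The data frame d and its values are made up for illustration and are not the fw3 data; only the column names mirror the call above.

```r
# Hypothetical data frame, invented for illustration only
d <- data.frame(
  count = c(3, 8, 6, 11, 5, 14, 9, 16, 7, 12),
  speed = c(1, 4, 2, 7, 3, 9, 5, 10, 6, 8),
  cover = c("open", "open", "closed", "closed", "mixed",
            "open", "closed", "mixed", "open", "closed")
)

# One-sided formula: both variables are listed after the ~
res <- cor.test(~ count + speed, data = d,
                subset = cover %in% c("open", "closed"))

print(res$estimate)   # correlation coefficient
print(res$p.value)    # p-value of the test
print(res$parameter)  # degrees of freedom = rows used - 2
```

The subset argument is evaluated inside the data frame, so cover can be referred to by name even though it does not appear in the formula.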
============================================================
II. Correlation between categorical variables
Association between categorical variables is generally analysed with the chi-squared test, applied to frequency (count) data. The command is chisq.test(); the options used below are correct, simulate.p.value, p and rescale.p.
The analysis falls into several cases, depending on how many categories each variable has:
1. Both variables have multiple categories
This is the usual test of association between two variables. The data are normally a frequency table stored as a data frame or matrix, which can be passed directly to chisq.test(), for example:
> chisq.test(bird.df)
If some expected frequencies are small, which often happens when the table contains zeros or very low counts, R issues a warning (not an error): Chi-squared approximation may be incorrect
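As a minimal sketch of this case, the following runs the test on a made-up 3 x 3 table of counts; the matrix counts and its labels are assumptions for illustration, not the bird.df data. In R the warning about the approximation is raised when any expected count is below 5, so the expected counts are printed as well.

```r
# Hypothetical counts: rows are species, columns are habitats
counts <- matrix(c(30, 10,  5,
                   12, 25,  8,
                    6,  9, 20),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(species = c("sp1", "sp2", "sp3"),
                                 habitat = c("garden", "hedge", "wood")))

res <- chisq.test(counts)  # test of independence between rows and columns

print(res$statistic)  # chi-squared statistic
print(res$parameter)  # degrees of freedom: (rows - 1) * (cols - 1)
print(res$p.value)
print(res$expected)   # expected counts under independence
```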
2. Both variables have two categories
Here the data form a 2 x 2 table, and chisq.test() applies the Yates continuity correction by default; it can be turned off with correct = FALSE. In fact, chisq.test() applies the Yates correction only to 2 x 2 tables. If Monte Carlo simulation is requested (simulate.p.value = TRUE), the Yates correction is not used.
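The three variants can be compared side by side. This is a sketch on a made-up 2 x 2 table; the object tab and its labels are assumptions for illustration.

```r
# Hypothetical 2 x 2 table of counts
tab <- matrix(c(12,  5,
                 7, 16),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              outcome = c("yes", "no")))

with_yates    <- chisq.test(tab)                   # correct = TRUE is the default
without_yates <- chisq.test(tab, correct = FALSE)  # plain Pearson chi-squared

set.seed(42)  # the simulated p-value depends on random draws
monte_carlo <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

print(with_yates$statistic)     # reduced by the continuity correction
print(without_yates$statistic)
print(monte_carlo$statistic)    # uncorrected, since no correction is applied
```

The corrected statistic is smaller than the uncorrected one, so the corrected test is the more conservative of the two.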
3. One variable has two categories and the other has multiple categories
In this case, use the chi-squared goodness-of-fit test. To be safe, set rescale.p = TRUE so that the values given in p are rescaled to sum to 1. If p is not specified, all expected probabilities are assumed to be equal.
> chisq.test(survey$new, p = survey$old, rescale.p = TRUE)
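The effect of p and rescale.p can be seen from the expected counts. The sketch below builds a made-up survey data frame with old and new columns; the values are invented for illustration and are not the data used above.

```r
# Hypothetical counts from two surveys of the same four categories
survey <- data.frame(
  old = c(20, 30, 25, 25),
  new = c(28, 22, 31, 19),
  row.names = c("a", "b", "c", "d")
)

# p is given as raw counts, so rescale.p = TRUE converts it to proportions
res <- chisq.test(survey$new, p = survey$old, rescale.p = TRUE)

# With p omitted, every category gets the same expected probability
res_equal <- chisq.test(survey$new)

print(res$expected)        # sum(new) * old / sum(old)
print(res_equal$expected)  # sum(new) / 4 in each category
print(res$p.value)
```

Without rescale.p = TRUE, chisq.test() stops with an error unless the values in p already sum to 1.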