Correlation analysis in R

Source: Internet
Author: User

A relationship between two variables, or between two sets of variables, is called correlation when the variables are continuous and association when they are categorical.

I. Correlation between continuous variables
The common commands are cor(), cov(), var(), cov2cor() and cor.test(). The examples below show how to use them:
1. Calculating a correlation coefficient and a correlation matrix

> cor(count, speed)
[1] 0.7237206

> cor(count, speed, method = "spearman")
[1] 0.5269556

> cor(mf)
           Length       Speed      Algae         NO3        BOD
Length  1.0000000 -0.34322968  0.7650757  0.45476093 -0.8055507
Speed  -0.3432297  1.00000000 -0.1134416  0.02257931  0.1983412
Algae   0.7650757 -0.11344163  1.0000000  0.37706463 -0.8365705
NO3     0.4547609  0.02257931  0.3770646  1.00000000 -0.3751308
BOD    -0.8055507  0.19834122 -0.8365705 -0.37513077  1.0000000

Use $ to select a single variable and display its correlation with every column of the data frame:

> cor(mf$Length, mf)
     Length      Speed     Algae       NO3        BOD
[1,]      1 -0.3432297 0.7650757 0.4547609 -0.8055507
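
The calls above can be tried with a minimal, self-contained sketch. The vectors and the data frame `demo` are invented for illustration; they are not the `count`, `speed` or `mf` data used in this article, so the numbers will differ.

```r
# Hypothetical data, invented for illustration only
count <- c(2, 5, 9, 12, 20, 26, 31, 40)
speed <- c(1, 3, 2, 6, 5, 9, 8, 12)

# Pearson correlation is the default method
r_pearson <- cor(count, speed)

# Method names are lower case: "pearson", "kendall" or "spearman"
r_spearman <- cor(count, speed, method = "spearman")

# A data frame (or matrix) of numeric columns gives a correlation matrix
demo <- data.frame(count = count, speed = speed,
                   depth = c(8, 7, 7, 5, 6, 3, 4, 1))
r_matrix <- cor(demo)

# One variable against every column: the result is a 1-row matrix
r_row <- cor(demo$count, demo)

print(r_pearson)
print(r_spearman)
print(r_matrix)
print(r_row)
```

Because R is case-sensitive, `cor` and `"spearman"` must be typed in lower case.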

2. Calculating variance and covariance matrices

> cov(count, speed)
[1] 123

> var(count, speed)
[1] 123

> cov(mf)
             Length        Speed       Algae         NO3        BOD
Length    9.4900000  -4.95000000   45.858333  0.70683333 -111.55667
Speed    -4.9500000  21.91666667  -10.333333  0.05333333   41.74167
Algae    45.8583333 -10.33333333  378.583333  3.70166667 -731.73333
NO3       0.7068333   0.05333333    3.701667  0.25456667   -8.50850
BOD    -111.5566667  41.74166667 -731.733333 -8.50850000 2020.87333

> cov2cor(cov(mf))
           Length       Speed      Algae         NO3        BOD
Length  1.0000000 -0.34322968  0.7650757  0.45476093 -0.8055507
Speed  -0.3432297  1.00000000 -0.1134416  0.02257931  0.1983412
Algae   0.7650757 -0.11344163  1.0000000  0.37706463 -0.8365705
NO3     0.4547609  0.02257931  0.3770646  1.00000000 -0.3751308
BOD    -0.8055507  0.19834122 -0.8365705 -0.37513077  1.0000000
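
The relationship between these three commands can be checked with a short sketch. The data frame `demo` is hypothetical and stands in for `mf`, which is not available here.

```r
# Hypothetical data, invented for illustration only
demo <- data.frame(
  count = c(2, 5, 9, 12, 20, 26, 31, 40),
  speed = c(1, 3, 2, 6, 5, 9, 8, 12),
  depth = c(8, 7, 7, 5, 6, 3, 4, 1)
)

# With two vectors, var() and cov() return the same covariance
cov_xy <- cov(demo$count, demo$speed)
var_xy <- var(demo$count, demo$speed)

# Covariance matrix of all columns; the diagonal holds the variances
cov_matrix <- cov(demo)

# cov2cor() rescales a covariance matrix into a correlation matrix
cor_from_cov <- cov2cor(cov_matrix)

print(cov_xy)
print(cov_matrix)
print(cor_from_cov)
```

The result of `cov2cor(cov(demo))` should match `cor(demo)`, which is why the matrix above repeats the one from section 1.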

3. Testing the significance of a correlation coefficient

> cor.test(count, speed)

        Pearson's product-moment correlation

data:  count and speed
t = 2.5689, df = 6, p-value = 0.0424
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03887166 0.94596455
sample estimates:
      cor
0.7237206
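
The printed report is built from a list that cor.test() returns, so the individual values can be picked out by name. A minimal sketch, using invented vectors rather than the article's `count` and `speed`:

```r
# Hypothetical data, invented for illustration only
count <- c(2, 5, 9, 12, 20, 26, 31, 40)
speed <- c(1, 3, 2, 6, 5, 9, 8, 12)

# cor.test() returns a list of class "htest"
result <- cor.test(count, speed)

r_hat   <- unname(result$estimate)   # sample correlation
p_value <- result$p.value            # two-sided p-value by default
ci      <- result$conf.int           # 95% confidence interval by default
df      <- unname(result$parameter)  # degrees of freedom, n - 2

print(result)
print(c(r = r_hat, p = p_value, df = df))
```

With 8 observations the degrees of freedom are 6, the same as in the output above.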

4. Using formula syntax
> cor.test(~ count + speed, data = fw3, subset = cover %in% c("open", "closed"))

This runs the correlation test on count and speed using only the rows of the fw3 data frame in which the categorical variable cover is "open" or "closed".
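
A sketch of the formula interface on a made-up data frame. The name `demo`, its values and the third level "dense" are assumptions for illustration; the `fw3` data is not reproduced here.

```r
# Hypothetical data frame, invented for illustration only
demo <- data.frame(
  count = c(2, 5, 9, 12, 20, 26, 31, 40, 7, 15, 22, 30),
  speed = c(1, 3, 2, 6, 5, 9, 8, 12, 4, 4, 7, 6),
  cover = rep(c("open", "closed", "dense"), each = 4)
)

# One-sided formula: both variables are listed after the ~
by_formula <- cor.test(~ count + speed, data = demo,
                       subset = cover %in% c("open", "closed"))

# The same test written with explicit row selection
rows <- demo$cover %in% c("open", "closed")
by_vectors <- cor.test(demo$count[rows], demo$speed[rows])

print(by_formula)
print(by_vectors$estimate)
```

Both calls should report the same estimate. The formula version keeps the row selection next to the variable names.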

============================================================

II. Association between categorical variables

Association between categorical variables is usually analysed with the chi-square test, and the data are frequencies (counts). The command is chisq.test(); the options discussed below are correct, simulate.p.value, p and rescale.p.

The analysis depends on how many categories each variable has:

1. Both variables have more than two categories
This is the usual test of association between two variables. The data are normally a frequency table stored as a data frame or matrix, which can be passed directly to chisq.test(), for example:

> chisq.test(bird.df)
If any expected frequency is below 5, which often happens when the table contains zeros or very small counts, R issues a warning (not an error): Chi-squared approximation may be incorrect
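
A sketch that reproduces the warning on a small, made-up table. The matrix `birds` and its counts are invented for illustration and are not the `bird.df` data.

```r
# Hypothetical counts, invented for illustration only:
# rows are species, columns are habitats
birds <- matrix(
  c(47, 10, 40,
    19,  3,  5,
    50,  0, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("sp_a", "sp_b", "sp_c"), c("garden", "hedge", "park"))
)

# Small expected counts trigger the warning; capture it instead of printing it
warn_msg <- NULL
result <- withCallingHandlers(
  chisq.test(birds),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(result$expected)  # expected counts under independence
print(warn_msg)
```

Looking at `result$expected` shows which cells fall below 5 and caused the warning.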


2. Both variables have two categories

The data then form a 2 x 2 table, and chisq.test() applies the Yates continuity correction by default; it can be turned off with correct = FALSE. chisq.test() applies the Yates correction only to 2 x 2 contingency tables. If the Monte Carlo method is requested (simulate.p.value = TRUE), the Yates correction is not used.
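
The three variants can be compared side by side in a short sketch. The table `tab` and its counts are invented for illustration.

```r
# Hypothetical 2 x 2 counts, invented for illustration only
tab <- matrix(c(12, 5, 7, 16), nrow = 2,
              dimnames = list(group = c("a", "b"), outcome = c("yes", "no")))

with_yates    <- chisq.test(tab)                   # correct = TRUE is the default
without_yates <- chisq.test(tab, correct = FALSE)  # plain Pearson chi-square

# Monte Carlo p-value; no continuity correction is applied in this case
set.seed(1)
monte_carlo <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

print(with_yates$statistic)
print(without_yates$statistic)
print(monte_carlo$p.value)
```

The corrected statistic is smaller than the uncorrected one, so the Yates version is the more conservative test.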

3. One variable has two categories and the other has several

In this case use the chi-square goodness-of-fit test. To be safe, set rescale.p = TRUE, which rescales p so that it sums to 1. If p is not specified, the expected probabilities are assumed to be equal for all categories.

> chisq.test(survey$new, p = survey$old, rescale.p = TRUE)
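
A sketch of the goodness-of-fit call on made-up counts. The data frame `survey_demo` is hypothetical and only mimics the shape suggested by `survey$old` and `survey$new`.

```r
# Hypothetical counts for the same categories in two surveys, illustration only
survey_demo <- data.frame(
  old = c(30, 45, 15, 10),
  new = c(42, 38, 25, 15),
  row.names = c("w", "x", "y", "z")
)

# rescale.p = TRUE divides p by its sum, so raw counts can be passed as p
gof <- chisq.test(survey_demo$new, p = survey_demo$old, rescale.p = TRUE)

# Without rescaling, p must already sum to 1
gof_manual <- chisq.test(survey_demo$new,
                         p = survey_demo$old / sum(survey_demo$old))

# With p omitted, every category gets the same expected probability
gof_equal <- chisq.test(survey_demo$new)

print(gof)
print(gof_equal$expected)
```

Passing counts as p without rescale.p = TRUE stops with an error because the probabilities do not sum to 1, which is the reason for setting it as a precaution.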
