R in Action reading notes (6) - Chapter 7: Basic statistical analysis (Part I)

Source: Internet
Author: User

7.3 Correlations

Correlation coefficients describe the relationship between quantitative variables. The sign (±) of the coefficient indicates the direction of the relationship (positive or negative), and its magnitude indicates the strength of the relationship (0 for no relationship, 1 for a perfectly predictable relationship). In addition to the base installation, we will also use the psych and ggm packages.

7.3.1 Types of correlations

1. Pearson, Spearman, and Kendall correlations

The Pearson product-moment correlation coefficient measures the degree of linear relationship between two quantitative variables. Spearman's rank-order correlation coefficient measures the degree of relationship between two rank-ordered variables. Kendall's tau is also a nonparametric measure of rank correlation.

The cor() function computes all three correlation coefficients, and the cov() function computes covariances. Both functions take many parameters; those relevant to computing correlation coefficients can be simplified to: cor(x, use= , method= )

x: a matrix or data frame.

use: specifies how missing data is handled. The options are all.obs (assumes there is no missing data; missing data produces an error), everything (any correlation involving a case with missing values is set to missing), complete.obs (listwise deletion), and pairwise.complete.obs (pairwise deletion).

method: specifies the type of correlation. The options are pearson, spearman, and kendall.

The defaults are use="everything" and method="pearson".

In the accompanying example, the first statement computes the variances and covariances, the second computes the Pearson correlation coefficients, and the third computes the Spearman rank correlation coefficients.
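The three statements described above are not reproduced in these notes. A minimal sketch of what they might look like follows. It assumes the first six columns of R's built-in state.x77 matrix as the data; the names states, cov_mat, pearson_mat, and spearman_mat are illustrative choices (states matches the name used later in these notes).

```r
# Assumption: the first six columns of the built-in state.x77 matrix are the data
states <- state.x77[, 1:6]

# Statement 1: variances (diagonal) and covariances (off-diagonal)
cov_mat <- cov(states)

# Statement 2: Pearson correlation coefficients (the default method)
pearson_mat <- cor(states)

# Statement 3: Spearman rank correlation coefficients
spearman_mat <- cor(states, method = "spearman")

print(round(pearson_mat, 2))
```

Each call returns a square matrix with one row and one column per variable.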

2. Partial correlations

A partial correlation is the correlation between two quantitative variables while controlling for one or more other quantitative variables. You can use the pcor() function in the ggm package to compute partial correlation coefficients. The call format is: pcor(u, S)

where u is a vector of numbers in which the first two values are the indices of the variables to be correlated and the remaining values are the indices of the conditioning variables (that is, the variables whose influence is to be partialed out). S is the covariance matrix of the variables.
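To make the roles of u and S concrete, here is a sketch that computes a partial correlation in base R from the inverse of the relevant covariance submatrix. This is the standard precision-matrix identity, shown as an illustration rather than as the package's implementation. The data (columns of state.x77) and the names P and r_partial are assumptions made for the example.

```r
# Assumption: the first six columns of the built-in state.x77 matrix are the data
states <- state.x77[, 1:6]
S <- cov(states)

# First two indices: variables to correlate (Population, Murder).
# Remaining indices: conditioning variables (Income, Illiteracy, HS Grad).
u <- c(1, 5, 2, 3, 6)

# Invert the covariance submatrix of the selected variables
P <- solve(S[u, u])

# Partial correlation of the first two variables given the others
r_partial <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
print(r_partial)

# With the ggm package installed, the corresponding call would be pcor(u, S)
```

The same value can be obtained by correlating the residuals of the two variables after regressing each on the conditioning variables.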

7.3.2 Testing correlations for significance

You can use the cor.test() function to test an individual Pearson, Spearman, or Kendall correlation coefficient. The simplified format is: cor.test(x, y, alternative= , method= )

where x and y are the variables to be correlated, alternative specifies a two-tailed or one-tailed test ("two.sided", "less", or "greater"), and method specifies the type of correlation to compute ("pearson", "kendall", or "spearman"). Use alternative="less" when the research hypothesis is that the population correlation is less than 0, and alternative="greater" when the research hypothesis is that the population correlation is greater than 0. The default is alternative="two.sided" (the population correlation is not equal to 0).
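A short sketch of both forms of the call, assuming the built-in state.x77 matrix as the data and the Illiteracy and Murder columns as the pair of interest (the names res and res_greater are illustrative):

```r
# Assumption: the first six columns of the built-in state.x77 matrix are the data
states <- state.x77[, 1:6]

# Default: two-sided test of the Pearson correlation
res <- cor.test(states[, "Illiteracy"], states[, "Murder"])

# One-sided test: is the population correlation greater than 0?
res_greater <- cor.test(states[, "Illiteracy"], states[, "Murder"],
                        alternative = "greater", method = "pearson")

print(res$estimate)        # sample correlation
print(res$p.value)         # two-sided p-value
print(res_greater$p.value) # one-sided p-value
```

The returned object is a list, so the estimate, p-value, and confidence interval can be extracted by name instead of being read off the printed output.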

cor.test() can test only one correlation at a time. The corr.test() function in the psych package can do more in a single call: it produces the correlation matrix and the significance levels for Pearson, Spearman, or Kendall correlations.

> library(psych)
> corr.test(states, use="complete")

The use= option can be "pairwise" or "complete" (for pairwise or listwise deletion of missing values, respectively). The method= option can be "pearson" (the default), "spearman", or "kendall".

Under the assumption of multivariate normality, the pcor.test() function in the ggm package can be used to test the conditional independence of two variables while controlling for one or more additional variables. The call format is: pcor.test(r, q, n)

where r is the partial correlation coefficient produced by the pcor() function, q is the number of variables being controlled, and n is the sample size.

The r.test() function in the psych package provides a number of useful significance tests. It can be used to test:

The significance of a single correlation coefficient;

Whether the difference between two independent correlations is significant;

Whether the difference between two dependent correlations sharing one variable is significant;

Whether the difference between two dependent correlations based on completely different variables is significant.
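The partial-correlation test described before this list can be sketched in base R. The function below is a hypothetical helper, not the package's source code: it applies the standard t statistic for a partial correlation, t = r * sqrt(df / (1 - r^2)) with df = n - 2 - q, and a two-sided p-value. The input values are illustrative only.

```r
# Hypothetical helper: t-test for a partial correlation r,
# with q conditioning variables and sample size n
partial_cor_t_test <- function(r, q, n) {
  df <- n - 2 - q                        # degrees of freedom
  tval <- r * sqrt(df) / sqrt(1 - r^2)   # t statistic
  pvalue <- 2 * pt(-abs(tval), df)       # two-sided p-value
  list(tval = tval, df = df, pvalue = pvalue)
}

# Illustrative inputs (not taken from the notes)
out <- partial_cor_t_test(r = 0.35, q = 3, n = 50)
print(out)
```

The arguments mirror the r, q, and n of the call format given above, so the sketch can be compared directly with the package function when it is available.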

7.4 t-tests

7.4.1 Independent-samples t-test

A two-group independent-samples t-test can be used to test the hypothesis that the two population means are equal. It assumes that the two groups are independent and that the data are sampled from normal populations. The call format is: t.test(y ~ x, data)

where y is a numeric variable and x is a dichotomous variable, or: t.test(y1, y2)

where y1 and y2 are numeric vectors (the outcome variable for each group). The optional data argument refers to a matrix or data frame containing the variables. You can add alternative="less" or alternative="greater" to specify a directional test.

> library(MASS)
> t.test(Prob ~ So, data=UScrime)

        Welch Two Sample t-test

data:  Prob by So
t = -3.8954, df = 24.925, p-value = 0.0006506
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:

sample estimates:
mean in group 0 mean in group 1
     0.03851265      0.06371269

7.4.2 Dependent-samples t-test

A dependent-samples t-test assumes that the difference between groups is normally distributed. The call format is:

t.test(y1, y2, paired=TRUE) where y1 and y2 are the numeric vectors for the two dependent groups.

> library(MASS)
> sapply(UScrime[c("U1","U2")], function(x)(c(mean=mean(x), sd=sd(x))))
           U1       U2
mean 95.46809 33.97872
sd   18.02878  8.44545
> with(UScrime, t.test(U1, U2, paired=TRUE))

        Paired t-test

data:  U1 and U2
t = 32.4066, df = 46, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 57.67003 65.30870
sample estimates:
mean of the differences

7.5 Nonparametric tests of group differences

7.5.1 Comparing two groups

If the two groups are independent, you can use the Wilcoxon rank sum test (also known as the Mann-Whitney U test) to assess whether the observations are sampled from the same probability distribution. The call format is:

wilcox.test(y ~ x, data) where y is a numeric variable and x is a dichotomous variable, or:

wilcox.test(y1, y2) where y1 and y2 are the outcome variables for each group. The optional data argument refers to a matrix or data frame containing the variables. The default is a two-tailed test. You can add the exact option to produce an exact test, and alternative="less" or alternative="greater" to specify a directional test.
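A minimal sketch of the independent-groups call, assuming the UScrime data frame from the MASS package and the same Prob and So variables used in the t-test example earlier in these notes (the names res and res_less are illustrative):

```r
library(MASS)  # provides the UScrime data frame

# Two-sided Wilcoxon rank sum (Mann-Whitney U) test of Prob by So
res <- wilcox.test(Prob ~ So, data = UScrime)

# Directional test: is group 0 shifted lower than group 1?
res_less <- wilcox.test(Prob ~ So, data = UScrime, alternative = "less")

print(res)
print(res_less$p.value)
```

As with t.test(), the formula interface splits the numeric variable by the two levels of the grouping variable.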

The Wilcoxon signed rank test is a nonparametric alternative to the dependent-samples t-test. It is appropriate when the groups are paired and the assumption of normality is unwarranted. The call format is identical to that of the Mann-Whitney U test, except that you add the paired=TRUE option.

> sapply(UScrime[c("U1","U2")], median)
U1 U2
92 34
> with(UScrime, wilcox.test(U1, U2, paired=TRUE))

        Wilcoxon signed rank test with continuity correction

data:  U1 and U2
V = 1128, p-value = 2.464e-09
alternative hypothesis: true location shift is not equal to 0

7.5.2 Comparing more than two groups

If the groups are independent, the Kruskal-Wallis test is a useful approach. If the groups are dependent (for example, in a repeated-measures or randomized block design), the Friedman test is more appropriate. The call format for the Kruskal-Wallis test is:

kruskal.test(y ~ A, data) where y is a numeric outcome variable and A is a grouping variable with two or more levels (with two levels, it is equivalent to the Mann-Whitney U test). The call format for the Friedman test is: friedman.test(y ~ A | B, data)

where y is a numeric outcome variable, A is a grouping variable, and B is a blocking variable that identifies matched observations.


> states <- as.data.frame(cbind(state.region, state.x77))
> kruskal.test(Illiteracy ~ state.region, data=states)

        Kruskal-Wallis rank sum test

data:  Illiteracy by state.region
Kruskal-Wallis chi-squared = 22.6723, df = 3, p-value = 4.726e-05
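The Friedman call format given earlier in this section has no example in these notes. The sketch below is one possible illustration, assuming R's built-in warpbreaks data as a stand-in for a blocked design: wool plays the role of the grouping variable A and tension the role of the blocking variable B. Because the Friedman test expects one observation per group and block combination, the replicates are averaged first. The names wb and res are illustrative.

```r
# Assumption: warpbreaks (built into R) stands in for a blocked design.
# Average the replicates so each wool x tension cell holds a single value.
wb <- aggregate(breaks ~ wool + tension, data = warpbreaks, FUN = mean)

# y ~ A | B: outcome ~ grouping variable | blocking variable
res <- friedman.test(breaks ~ wool | tension, data = wb)
print(res)
```

With only two groups and three blocks, this example shows the call syntax rather than a substantive analysis.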
