7.3 Correlation
Correlation coefficients describe the relationships among quantitative variables. The sign of a correlation coefficient (±) indicates the direction of the relationship (positive or negative correlation), and its magnitude indicates the strength of the relationship (ranging from 0 for completely unrelated to 1 for perfectly correlated). In addition to the base installation, we will also use the psych and ggm packages.
7.3.1 Types of correlations
1. Pearson, Spearman, and Kendall correlations
Pearson's product-moment correlation coefficient measures the degree of linear correlation between two quantitative variables. Spearman's rank-order correlation coefficient measures the degree of correlation between two rank-ordered variables. Kendall's tau is also a nonparametric measure of rank correlation.
The cor() function calculates all three correlation coefficients, and the cov() function calculates covariances. Both functions have many parameters; for calculating correlation coefficients, the simplified format is: cor(x, use=, method=)
x: a matrix or data frame
use: specifies the handling of missing data. The options are "all.obs" (assumes no missing data, and reports an error if any is encountered), "everything" (any correlation involving a missing value is set to missing), "complete.obs" (listwise deletion), and "pairwise.complete.obs" (pairwise deletion)
method: specifies the type of correlation. The options are "pearson", "spearman", and "kendall"
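These notes omit the code listing that the next sentence refers to; a minimal reconstruction, assuming the book's states matrix (the first six columns of the built-in state.x77 dataset), might look like this:

states <- state.x77[, 1:6]       # Population, Income, Illiteracy, Life Exp, Murder, HS Grad
cov(states)                      # variances and covariances
cor(states)                      # Pearson correlations (the default)
cor(states, method="spearman")   # Spearman rank-order correlations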
The first statement calculates the variances and covariances, the second calculates Pearson correlation coefficients, and the third calculates Spearman rank-order correlation coefficients.
2. Partial correlation
A partial correlation is the correlation between two quantitative variables while controlling for one or more other quantitative variables. The pcor() function in the ggm package calculates partial correlation coefficients; the calling format is: pcor(u, s)
where u is a numeric vector whose first two values are the indices of the variables to be correlated, and whose remaining values are the indices of the conditioning variables (that is, the variables whose effects are to be partialed out). s is the covariance matrix of the variables.
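For example (a sketch, assuming the states matrix defined above; the indices follow its column order):

library(ggm)
states <- state.x77[, 1:6]
# partial correlation of Population (1) and Murder (5), controlling
# for Income (2), Illiteracy (3), and HS Grad (6)
pcor(c(1, 5, 2, 3, 6), cov(states))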
7.3.2 Significance tests of correlations
You can use the cor.test() function to test an individual Pearson, Spearman, or Kendall correlation coefficient. The simplified format is: cor.test(x, y, alternative=, method=)
where x and y are the variables whose correlation is to be tested, alternative specifies a two-sided or one-sided test (values "two.sided", "less", or "greater"), and method specifies the type of correlation to compute ("pearson", "kendall", or "spearman"). When the research hypothesis is that the population correlation coefficient is less than 0, use alternative="less". When the research hypothesis is that it is greater than 0, use alternative="greater". By default alternative="two.sided" is assumed (the population correlation coefficient is not equal to 0).
cor.test() can only test one correlation at a time. The corr.test() function provided in the psych package can do more at once: it computes the correlation matrix and significance levels for Pearson, Spearman, or Kendall correlations.
> library(psych)
> corr.test(states, use="complete")
The parameter use= can be set to "pairwise" or "complete" (for pairwise or listwise deletion of missing values, respectively). The parameter method= can be "pearson" (the default), "spearman", or "kendall".
Under the assumption of multivariate normality, the pcor.test() function in the ggm package can be used to test the conditional independence of two variables given one or more controlled variables. The usage format is: pcor.test(r, q, n)
where r is the partial correlation coefficient computed by the pcor() function, q is the number of variables being controlled, and n is the sample size.
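A sketch continuing the pcor() example from above (three controlled variables, n = 50 states):

library(ggm)
states <- state.x77[, 1:6]
pc <- pcor(c(1, 5, 2, 3, 6), cov(states))
pcor.test(pc, q=3, n=50)   # returns the t value, df, and p-value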
The r.test() function in the psych package provides a variety of useful significance tests. The function can be used to test:
the significance of a correlation coefficient;
whether the difference between two independent correlation coefficients is significant;
whether the difference between two non-independent correlation coefficients that share one variable is significant;
whether the difference between two non-independent correlation coefficients based on completely different variables is significant.
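For instance, testing the significance of a single correlation coefficient with illustrative values (see ?r.test for the various two-correlation comparisons):

library(psych)
r.test(n=50, r12=.30)   # is r = .30 significantly different from 0 at n = 50?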
7.4 t-tests
7.4.1 Independent samples t-test
An independent samples t-test can be used to test the hypothesis that the means of two populations are equal. The test assumes that the two groups of data are independent and drawn from normal populations. The calling format is: t.test(y ~ x, data)
where y is a numeric variable and x is a dichotomous variable. The calling format can also be: t.test(y1, y2)
where y1 and y2 are numeric vectors (the outcome variable for each group). The optional data argument takes a matrix or data frame containing these variables. You can add the parameter alternative="less" or alternative="greater" to perform a directional test.
> library(MASS)
> t.test(Prob ~ So, data=UScrime)

        Welch Two Sample t-test

data:  Prob by So
t = -3.8954, df = 24.925, p-value = 0.0006506
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.03852569 -0.01187439
sample estimates:
mean in group 0 mean in group 1
     0.03851265      0.06371269
7.4.2 Non-independent samples t-test
The t-test for non-independent samples assumes that the differences between pairs are normally distributed. The calling format is: t.test(y1, y2, paired=TRUE), where y1 and y2 are the numeric vectors for the two non-independent groups.
> library(MASS)
> sapply(UScrime[c("U1","U2")], function(x) c(mean=mean(x), sd=sd(x)))
           U1       U2
mean 95.46809 33.97872
sd   18.02878  8.44545
> with(UScrime, t.test(U1, U2, paired=TRUE))

        Paired t-test

data:  U1 and U2
t = 32.4066, df = 46, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 57.67003 65.30870
sample estimates:
mean of the differences
               61.48936
7.5 Nonparametric tests of group differences
7.5.1 Comparison of two groups
If the two groups of data are independent, you can use the Wilcoxon rank-sum test (more commonly known as the Mann–Whitney U test) to assess whether the observations were drawn from the same probability distribution. The calling format is:
wilcox.test(y ~ x, data), where y is a numeric variable and x is a dichotomous variable. The calling format can also be: wilcox.test(y1, y2), where y1 and y2 are the outcome variables for each group. The optional data argument takes a matrix or data frame containing these variables. A two-sided test is performed by default. You can add the parameter exact to perform an exact test, and specify alternative="less" or alternative="greater" for a directional test.
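A sketch mirroring the earlier t-test example, comparing the imprisonment probability of Southern and non-Southern states in the UScrime data:

library(MASS)
tapply(UScrime$Prob, UScrime$So, median)   # group medians
wilcox.test(Prob ~ So, data=UScrime)       # two-sided test by default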
The Wilcoxon signed-rank test is a nonparametric alternative to the t-test for non-independent samples. It is suitable for two groups of paired data when the normality assumption cannot be guaranteed. The calling format is identical to that of the Mann–Whitney U test, except that you can also add the parameter paired=TRUE.
> sapply(UScrime[c("U1","U2")], median)
U1 U2
92 34
> with(UScrime, wilcox.test(U1, U2, paired=TRUE))

        Wilcoxon signed rank test with continuity correction

data:  U1 and U2
V = 1128, p-value = 2.464e-09
alternative hypothesis: true location shift is not equal to 0
7.5.2 Comparisons of more than two groups
If the groups are independent, the Kruskal–Wallis test is a practical approach. If the groups are not independent (as in a repeated-measures or randomized block design), the Friedman test is more appropriate. The calling format for the Kruskal–Wallis test is: kruskal.test(y ~ A, data), where y is a numeric outcome variable and A is a grouping variable with two or more levels. (If there are two levels, it is equivalent to the Mann–Whitney U test.) The calling format for the Friedman test is: friedman.test(y ~ A | B, data)
where y is the numeric outcome variable, A is a grouping variable, and B is a block variable identifying matched observations; see the sketch after the Kruskal–Wallis example below.
> states <- as.data.frame(cbind(state.region, state.x77))
> kruskal.test(Illiteracy ~ state.region, data=states)

        Kruskal-Wallis rank sum test

data:  Illiteracy by state.region
Kruskal-Wallis chi-squared = 22.6723, df = 3, p-value = 4.726e-05
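The notes include no Friedman example; a minimal sketch with hypothetical repeated-measures data (as referenced above):

# hypothetical data: 10 subjects, each measured under 3 conditions
set.seed(1234)
scores <- data.frame(
  score   = rnorm(30),
  cond    = factor(rep(c("A", "B", "C"), each=10)),
  subject = factor(rep(1:10, times=3))
)
friedman.test(score ~ cond | subject, data=scores)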
R in Action reading notes (6) – Chapter 7: Basic statistical analysis (Part I)