Chapter 12: Resampling and the Bootstrap
In this chapter we explore two widely used statistical approaches built on randomization: permutation tests and the bootstrap.
12.1 Permutation Tests
Permutation tests are also known as randomization tests or re-randomization tests.
Suppose ten subjects have been randomly assigned to one of two treatment conditions (A or B) and an outcome variable (score) has been recorded for each subject. The results of the experiment are as follows:

Treatment A: 40 57 45 55 58
Treatment B: 57 64 55 62 65
If the two treatments are truly equivalent, then the labels (A or B) assigned to the observed scores are arbitrary. To test the difference between the two treatments, we can follow these steps (a base R sketch of the full procedure appears after the list):
(1) As in the parametric approach, calculate the t statistic for the observed data; call it t0.
(2) Pool all 10 scores into a single group.
(3) Randomly assign five scores to treatment A and five scores to treatment B.
(4) Calculate and record the t statistic for this new arrangement.
(5) Repeat steps (3)-(4) for every possible random assignment; there are 252 possible assignments.
(6) Arrange the 252 t statistics in ascending order. This is the empirical distribution, based on (conditioned on) the sample data.
(7) If t0 falls outside the middle 95% of the empirical distribution, reject the null hypothesis that the two treatment groups have equal population means at the 0.05 significance level.
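The following is a minimal base R sketch of steps (1) through (7), written for this note rather than taken from the book. It uses the ten hypothetical scores above and enumerates all 252 possible assignments with combn(); it is not how the coin package (introduced below) implements permutation tests.

# Minimal base R sketch of steps (1)-(7); not the coin implementation.
score <- c(40, 57, 45, 55, 58, 57, 64, 55, 62, 65)
treatment <- factor(c(rep("A", 5), rep("B", 5)))

# Step (1): t statistic for the observed labeling
t0 <- t.test(score ~ treatment, var.equal = TRUE)$statistic

# Steps (2)-(5): pool the scores and compute t for every possible relabeling;
# choose(10, 5) = 252 ways of picking which five scores are labeled "A"
combos <- combn(10, 5)
perm_t <- apply(combos, 2, function(idx) {
  relabeled <- factor(ifelse(seq_along(score) %in% idx, "A", "B"))
  t.test(score ~ relabeled, var.equal = TRUE)$statistic
})

# Steps (6)-(7): compare t0 with the empirical (permutation) distribution
perm_t <- sort(perm_t)
p_value <- mean(abs(perm_t) >= abs(t0))   # two-sided permutation p-value
p_value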
12.2 Permutation Tests with the coin Package
For independence problems, the coin package provides a general framework for permutation testing. With this package you can answer
questions such as the following:
- Is the response independent of the group assignment?
- Are two numeric variables independent?
- Are two categorical variables independent?
The coin functions that provide permutation-test alternatives to traditional tests are listed in table 12-2:

Table 12-2  coin functions for permutation-test alternatives to traditional tests

Test                                                      coin function
Two- and K-sample permutation test                        oneway_test(y ~ A)
Two- and K-sample permutation test with one
  stratification (blocking) factor                        oneway_test(y ~ A | C)
Wilcoxon-Mann-Whitney rank-sum test                       wilcox_test(y ~ A)
Kruskal-Wallis test                                       kruskal_test(y ~ A)
Pearson's chi-square test                                 chisq_test(A ~ B)
Cochran-Mantel-Haenszel test                              cmh_test(A ~ B | C)
Linear-by-linear association test                         lbl_test(D ~ E)
Spearman test                                             spearman_test(y ~ x)
Friedman test                                             friedman_test(y ~ A | C)
Wilcoxon signed-rank test                                 wilcoxsign_test(y1 ~ y2)

In these functions, y and x are numeric variables, A and B are categorical factors, C is a categorical blocking (stratification) variable, D and E are ordered factors, and y1 and y2 are matched numeric variables.
The general form of these functions is

    function_name(formula, data, distribution=)

where
- formula describes the relationship between the variables to be tested (examples appear in table 12-2);
- data is a data frame;
- distribution specifies how the empirical distribution under the null hypothesis should be derived; possible values are "exact", "asymptotic", and "approximate". If distribution = "exact", the null distribution is computed exactly, that is, from all possible permutations. The distribution can also be approximated from its asymptotic distribution (distribution = "asymptotic") or via Monte Carlo resampling (distribution = approximate(B = #), where # is the number of replications). distribution = "exact" is currently available only for two-sample problems. A small illustration of the three settings follows.
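The snippet below was written for this note to make the three settings concrete; the data frame and variable names (dat, outcome, grp) are invented, and B= follows the text above, although newer versions of coin may report that B is deprecated in favour of nresample.

library(coin)
# Invented two-group data, four observations per group (choose(8, 4) = 70 permutations)
dat <- data.frame(
  outcome = c(12, 15, 11, 19, 22, 25, 21, 28),
  grp     = factor(rep(c("g1", "g2"), each = 4))
)
oneway_test(outcome ~ grp, data = dat, distribution = "exact")                # all 70 permutations
oneway_test(outcome ~ grp, data = dat, distribution = "asymptotic")           # large-sample approximation
oneway_test(outcome ~ grp, data = dat, distribution = approximate(B = 9999))  # Monte Carlo resampling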
12.2.1 Independent Two-Sample and K-Sample Tests
A t-test versus a one-way permutation test on the hypothetical data:

> library(coin)
> score <- c(40, 57, 45, 55, 58, 57, 64, 55, 62, 65)
> treatment <- factor(c(rep("A", 5), rep("B", 5)))
> mydata <- data.frame(treatment, score)
> t.test(score ~ treatment, data = mydata, var.equal = TRUE)

        Two Sample t-test

data:  score by treatment
t = -2.345, df = 8, p-value = 0.04705
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -19.0405455  -0.1594545
sample estimates:
mean in group A mean in group B
           51.0            60.6

> oneway_test(score ~ treatment, data = mydata, distribute = "exact")

        Asymptotic 2-Sample Permutation Test

data:  score by treatment (A, B)
Z = -1.9147, p-value = 0.05553
alternative hypothesis: true mu is not equal to 0

Note that the argument name in the call above is misspelled: it should be distribution = "exact", not distribute = "exact". Because the misspelled argument is not matched to distribution, coin falls back to its default asymptotic null distribution, which is why the output is labeled "Asymptotic" rather than "Exact". The same misspelling affects the wilcox_test() call below.

Wilcoxon-Mann-Whitney U test:

> library(MASS)
> UScrime <- transform(UScrime, So = factor(So))
> wilcox_test(Prob ~ So, data = UScrime, distribute = "exact")

        Asymptotic Wilcoxon Mann-Whitney Rank Sum Test

data:  Prob by So (0, 1)
Z = -3.7493, p-value = 0.0001774
alternative hypothesis: true mu is not equal to 0
An approximate K-sample permutation test:

> library(multcomp)
> set.seed(1234)
> oneway_test(response ~ trt, data = cholesterol,
+             distribution = approximate(B = 9999))

        Approximative K-Sample Permutation Test

data:  response by trt (1time, 2times, 4times, drugD, drugE)
maxT = 4.7623, p-value < 2.2e-16
12.2.2 Independence in Contingency Tables
The chisq_test() or cmh_test() function can be used to test the independence of two categorical variables with a permutation test.
The latter is required when the data are stratified by a third categorical variable. If both variables are ordinal, the
lbl_test() function can be used to test for a linear trend.
> library(coin)
> library(vcd)
Loading required package: grid
> Arthritis <- transform(Arthritis,
+     Improved = as.factor(as.numeric(Improved)))
> set.seed(1234)
> chisq_test(Treatment ~ Improved, data = Arthritis,
+     distribution = approximate(B = 9999))

        Approximative Pearson's Chi-Squared Test

data:  Treatment by Improved (1, 2, 3)
chi-squared = 13.055, p-value = 0.0018
The variable Improved had to be converted from an ordered factor to an unordered (categorical) factor because, with an ordered factor, coin would generate a linear-by-linear association test rather than a chi-square test. An illustration of that behavior is sketched below.
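The following sketch was written for this note (it is not from the book) to show the behavior just described: rerunning the test on the original Arthritis data, where Improved is still an ordered factor, should yield a linear-by-linear association statistic instead of Pearson's chi-square.

library(coin)
library(vcd)
data(Arthritis)   # reload the original data; Improved is an ordered factor here
set.seed(1234)
chisq_test(Treatment ~ Improved, data = Arthritis,
           distribution = approximate(B = 9999))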
12.2.3 Independence Between Numeric Variables
The spearman_test() function provides a permutation test of the independence of two numeric variables.

> states <- as.data.frame(state.x77)
> set.seed(1234)
> spearman_test(Illiteracy ~ Murder, data = states,
+     distribution = approximate(B = 9999))

        Approximative Spearman Correlation Test

data:  Illiteracy by Murder
Z = 4.7065, p-value < 2.2e-16
alternative hypothesis: true mu is not equal to 0

The assumption of independence is rejected: illiteracy and murder rates are not independent.
12.2.4 Dependent Two-Sample and K-Sample Tests
Dependent-sample tests are useful when observations in different groups have been matched or when repeated measures are used.
For a permutation test with two paired groups, use the wilcoxsign_test() function; with more than two groups, use the
friedman_test() function (a sketch of the latter follows the example below).
> library(coin)
> library(MASS)
> wilcoxsign_test(U1 ~ U2, data = UScrime, distribution = "exact")

        Exact Wilcoxon-Signed-Rank Test

data:  y by x (neg, pos)
         stratified by block
Z = 5.9691, p-value = 1.421e-14
alternative hypothesis: true mu is not equal to 0

The result suggests that the two unemployment rates (U1 and U2) differ.
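friedman_test() is not demonstrated in the text. The sketch below was written for this note, using made-up repeated-measures data and the y ~ A | C form from table 12-2, to show how a call might look.

library(coin)
# Invented data: four subjects, each measured once under three conditions
# (a complete, blocked repeated-measures layout)
reaction <- data.frame(
  time    = c(1.2, 1.4, 1.1, 1.0, 1.5, 1.7, 1.6, 1.3, 1.9, 2.0, 1.8, 1.7),
  dose    = factor(rep(c("low", "medium", "high"), times = 4)),
  subject = factor(rep(1:4, each = 3))
)
friedman_test(time ~ dose | subject, data = reaction)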