R in Action (5): Analysis of Variance and Power Analysis

Last Update:2017-01-15 Source: Internet

Author: User

This article corresponds to Chapter 9 (Analysis of Variance) and Chapter 10 (Power Analysis) of "R in Action".

====================================================================

Analysis of variance:

Regression analysis predicts a quantitative response variable from quantitative predictor variables. When the explanatory variables include nominal or ordered factors, the focus usually shifts from prediction to the analysis of group differences; this method is called analysis of variance (ANOVA). When there is more than one dependent variable, it is called multivariate analysis of variance (MANOVA). When covariates are present, it is called analysis of covariance (ANCOVA) or multivariate analysis of covariance (MANCOVA).

# Basic format
aov(formula, data = dataframe)

For the basic symbols used in formulas, see the symbol table for regression.

Formulas for common research designs

In the following table, lowercase letters denote quantitative variables, uppercase letters denote grouping factors, and Subject is a unique identifier for each participant.

Design	Formula
One-way ANOVA	y ~ A
One-way ANCOVA with one covariate	y ~ x + A
Two-way factorial ANOVA	y ~ A * B
Two-way factorial ANCOVA with two covariates	y ~ x1 + x2 + A * B
Randomized block	y ~ B + A (where B is the blocking factor)
One-way within-groups ANOVA	y ~ A + Error(Subject/A)
Repeated measures ANOVA with one within-groups factor (W) and one between-groups factor (B)	y ~ B * W + Error(Subject/W)

Order of terms in a formula:

The order of terms matters in two situations: (1) there is more than one factor and the design is unbalanced; (2) covariates are present. In either case, the variables on the right side of the formula are correlated with one another, so their effects on the dependent variable cannot be cleanly separated.

For example, in a two-way ANOVA, the models y ~ A * B and y ~ B * A give different results if the number of observations differs across treatment combinations.

By default, R uses the Type I (sequential) approach to calculate ANOVA effects. The first model can be written as y ~ A + B + A:B.

The resulting ANOVA table in R assesses:

The effect of A on y
The effect of B on y, controlling for A
The interaction of A and B, controlling for the main effects of A and B

Order matters.

When independent variables are correlated with each other or with covariates, there is no unambiguous way to divide up their contributions to the dependent variable. For example, an unbalanced two-way factorial design with factors A and B and dependent variable y has three effects: the main effects of A and B, and the A x B interaction. Suppose you model the data with the following formula:

y ~ A + B + A:B

There are three typical approaches for partitioning the variance in y among the effects on the right side of the formula.

Type I (sequential)

Effects are adjusted for those that appear earlier in the formula. A is unadjusted, B is adjusted for A, and the A:B interaction is adjusted for A and B.

Type II (hierarchical)

Effects are adjusted for other effects at the same or a lower level. A is adjusted for B, B is adjusted for A, and the A:B interaction is adjusted for both A and B.

Type III (marginal)

Each effect is adjusted for every other effect in the model. A is adjusted for B and A:B, B is adjusted for A and A:B, and the A:B interaction is adjusted for A and B.

R uses the Type I approach by default, whereas other software (such as SAS or SPSS) uses the Type III approach by default.

The more unbalanced the sample sizes, the greater the impact of term order on the results. In general, more fundamental effects should be listed earlier in the formula.

Specifically, covariates come first, followed by main effects, then two-way interactions, then three-way interactions, and so on.

Among main effects, more fundamental variables should be listed first; for example, gender should be listed before treatment. The basic guideline: if the research design is not orthogonal (that is, the factors and/or covariates are correlated), set the order of effects carefully.

Note that the Anova() function in the car package (not the base anova() function) provides options for the Type II and Type III approaches, whereas the aov() function uses the Type I approach. To obtain results consistent with those of other software, you can use Anova().
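A small base R sketch of why order matters under the default Type I approach. The data are an assumption made for illustration: the built-in ToothGrowth data with five rows deliberately dropped from one cell so that the design becomes unbalanced. The names tg, fit_ab, fit_ba, and type1_ss are hypothetical.

```r
# Make the built-in ToothGrowth data unbalanced (illustrative assumption):
# rows 1-5 all belong to the cell supp = VC, dose = 0.5
tg <- ToothGrowth
tg$dose <- factor(tg$dose)
tg <- tg[-(1:5), ]
print(table(tg$supp, tg$dose))   # cell counts are no longer equal

# Same model, terms entered in a different order
fit_ab <- aov(len ~ supp * dose, data = tg)   # supp first, dose adjusted for supp
fit_ba <- aov(len ~ dose * supp, data = tg)   # dose first, supp adjusted for dose

# Helper: named vector of sequential (Type I) sums of squares
type1_ss <- function(fit) {
  tab <- summary(fit)[[1]]
  setNames(tab[["Sum Sq"]], trimws(rownames(tab)))
}
ss_ab <- type1_ss(fit_ab)
ss_ba <- type1_ss(fit_ba)

# The sum of squares credited to supp depends on its position in the formula
print(c(supp_first = ss_ab[["supp"]], supp_second = ss_ba[["supp"]]))
```

The residual sum of squares is the same in both fits; only the way the explained variance is split between the correlated terms changes.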

One-way ANOVA:

Multiple comparisons:

library(multcomp)
# fit is assumed to be the one-way model for the cholesterol data used below
fit <- aov(response ~ trt, data = cholesterol)
# Pairwise tests of the differences between group means
TukeyHSD(fit)
# The glht() function provides a more comprehensive approach for linear and generalized linear models
tuk <- glht(fit, linfct = mcp(trt = "Tukey"))
plot(cld(tuk, level = 0.05), col = "lightgrey")

Assessing test assumptions:

One-way ANOVA assumes that the dependent variable is normally distributed and has equal variance in each group.

# Q-Q plot to check the normality assumption
library(car)
qqPlot(lm(response ~ trt, data = cholesterol), simulate = TRUE, labels = FALSE)
# If the data fall within the 95% confidence envelope, the normality assumption is satisfied
# Test for homogeneity of variance (Bartlett's test)
bartlett.test(response ~ trt, data = cholesterol)
# A p-value greater than 0.05 indicates that the group variances do not differ significantly
# Tests of homogeneity of variance are very sensitive to outliers, so also check for outliers
outlierTest(fit)
# If there are no outliers, the results above are more credible
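The same workflow can be sketched with base R only. This version is an illustration, not the chapter's example: it assumes the built-in PlantGrowth data in place of cholesterol, and it swaps the Q-Q plot for a Shapiro-Wilk test on the residuals so that the check yields a number.

```r
# One-way ANOVA on the built-in PlantGrowth data (weight by treatment group)
fit <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit))

# Pairwise comparisons of group means with Tukey's HSD
tuk <- TukeyHSD(fit)
print(tuk)

# Assumption checks with base R functions
norm_check <- shapiro.test(residuals(fit))                        # normality of residuals
var_check  <- bartlett.test(weight ~ group, data = PlantGrowth)   # equal group variances
print(c(shapiro_p = norm_check$p.value, bartlett_p = var_check$p.value))
```

Large p-values in the last line give no evidence against normality or equal variances.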

One-way ANCOVA:

# One-way ANCOVA
data(litter, package = "multcomp")
attach(litter)
table(dose)
aggregate(weight, by = list(dose), FUN = mean)
fit <- aov(weight ~ gesttime + dose)
summary(fit)
# Obtain the adjusted group means (i.e., the group means after the covariate's effect is removed)
library(effects)
effect("dose", fit)
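To make the idea of adjusted group means concrete, here is a base R sketch. It assumes the built-in mtcars data instead of litter (mpg as the outcome, wt as the covariate, cyl as the grouping factor) and computes the adjusted means by predicting each group at the overall covariate mean, which is analogous to what effect() reports for an additive model. The names cars and grid are hypothetical.

```r
# Covariate first, then the grouping factor
cars <- transform(mtcars, cyl = factor(cyl))
fit <- aov(mpg ~ wt + cyl, data = cars)
print(summary(fit))

# Unadjusted group means
raw_means <- tapply(cars$mpg, cars$cyl, mean)

# Adjusted group means: predicted mpg for each group at the mean weight
grid <- data.frame(wt = mean(cars$wt),
                   cyl = factor(levels(cars$cyl), levels = levels(cars$cyl)))
adj_means <- setNames(predict(fit, newdata = grid), levels(cars$cyl))

print(rbind(raw = raw_means, adjusted = adj_means))
```

Once weight is held constant, the group means move closer together, because part of the raw difference between groups is explained by the covariate.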

Assessing test assumptions:

One-way ANCOVA assumes normality and homogeneity of variance.

These assumptions are checked in the same way as for one-way ANOVA.

Visualizing the results:

library(HH)
ancova(weight ~ gesttime + dose, data = litter)

Two-way factorial ANOVA:

# Use table() to check whether the design is balanced
attach(ToothGrowth)
table(supp, dose)
aggregate(len, by = list(supp, dose), FUN = mean)
aggregate(len, by = list(supp, dose), FUN = sd)
# Treat dose as a grouping factor rather than a numeric covariate
dose <- factor(dose)
fit <- aov(len ~ supp * dose)
summary(fit)
# Visualize the interaction
interaction.plot(dose, supp, len, type = "b", col = c("red", "blue"), pch = c(16, 18), main = "Interaction between Dose and Supplement Type")
# Alternatively
library(gplots)
plotmeans(len ~ interaction(supp, dose, sep = " "), connect = list(c(1, 3, 5), c(2, 4, 6)), col = c("red", "darkgreen"), main = "Interaction Plot with 95% CIs", xlab = "Treatment and Dose Combination")
# Another option (recommended)
library(HH)
interaction2wt(len ~ supp * dose)

Repeated measures ANOVA: subjects are measured more than once. The focus here is on repeated measures ANOVA with one within-groups factor and one between-groups factor.
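A minimal sketch of such a design with the built-in CO2 data, assuming the chilled plants as an illustrative subset: Type is the between-groups factor, conc is the within-groups factor, and Plant identifies the subject. The names co2 and w1b1 are hypothetical.

```r
# Built-in CO2 data, restricted to chilled plants (illustrative subset)
co2 <- CO2
co2$conc <- factor(co2$conc)                   # within-groups factor
w1b1 <- subset(co2, Treatment == "chilled")

# Each Plant is measured at every concentration, so conc is nested
# within Plant in the Error() term
fit <- aov(uptake ~ conc * Type + Error(Plant / conc), data = w1b1)
print(summary(fit))
```

The summary is split into error strata: the between-groups effect (Type) is tested against variation among plants, and the within-groups effects (conc and conc:Type) against variation within plants.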

Overview of mixed-model designs:

Traditional repeated measures ANOVA assumes that the covariance matrix of any within-groups factor is spherical, that is, that the variances of the differences between any two levels of the within-groups factor are equal. In practice this assumption is unlikely to hold, which has led to a series of alternative approaches:

Use the lmer() function in the lme4 package to fit linear mixed models;

Use the Anova() function in the car package to adjust the traditional test statistics for violations of sphericity (e.g., the Geisser-Greenhouse correction);

Use the gls() function in the nlme package to fit generalized least squares models with a specified variance-covariance structure;

Model the repeated measures data with multivariate analysis of variance.

Multivariate analysis of variance (MANOVA):

When there is more than one dependent (outcome) variable, the variables can be analyzed simultaneously using multivariate analysis of variance (MANOVA).
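As a rough illustration with base R, the sketch below runs a one-way MANOVA on the built-in iris data. The choice of data set and of the two outcome variables is an assumption made for the example.

```r
# Two outcome variables analyzed together, grouped by Species
y <- cbind(Sepal.Length = iris$Sepal.Length, Petal.Length = iris$Petal.Length)
fit <- manova(y ~ Species, data = iris)

print(summary(fit))       # multivariate test (Pillai's trace by default)
print(summary.aov(fit))   # follow-up univariate ANOVA for each outcome

# p-value of the multivariate test for the grouping factor
p_multi <- summary(fit)$stats["Species", "Pr(>F)"]
print(p_multi)
```

A significant multivariate test says the groups differ on the set of outcomes taken together; the univariate tables then show which outcomes contribute.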

Assessing test assumptions: MANOVA has two assumptions, multivariate normality and homogeneity of variance-covariance matrices.

Multivariate normality means that the vector of dependent variables jointly follows a multivariate normal distribution, which can be checked with a Q-Q plot. Homogeneity of variance-covariance matrices means that the covariance matrix is the same in every group, which is usually assessed with Box's M test.
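One common way to build such a Q-Q plot is to compare squared Mahalanobis distances with chi-square quantiles. The sketch below assumes a single group from the built-in iris data as the matrix y; under multivariate normality the points should fall roughly along the line with intercept 0 and slope 1.

```r
# Matrix of outcome variables for one group (illustrative assumption)
y <- as.matrix(iris[iris$Species == "setosa", 1:4])
n <- nrow(y)
p <- ncol(y)

# Squared Mahalanobis distance of each observation from the centroid
d2 <- mahalanobis(y, center = colMeans(y), cov = cov(y))

# Matching quantiles of a chi-square distribution with p degrees of freedom
q <- qchisq(ppoints(n), df = p)

# Draw the plot only in an interactive session
if (interactive()) {
  qqplot(q, d2, xlab = "Chi-square quantiles",
         ylab = "Squared Mahalanobis distance",
         main = "Q-Q plot for multivariate normality")
  abline(a = 0, b = 1)
}
print(summary(d2))
```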

Finally, you can use the aq.plot() function in the mvoutlier package to check for multivariate outliers.

library(mvoutlier)
outliers <- aq.plot(y)
outliers

Robust MANOVA:

If the assumptions of multivariate normality or homogeneity of variance-covariance matrices are not met, or if you are concerned about multivariate outliers, consider using a robust or nonparametric version of the MANOVA test.

Robust one-way MANOVA is available through the Wilks.test() function in the rrcov package. The adonis() function in the vegan package provides the equivalent of a nonparametric MANOVA.

# Example of the Wilks.test() function
library(rrcov)
Wilks.test(y, shelf, method = "mcd")

ANOVA as regression:

In fact, ANOVA and regression are both special cases of the general linear model, so the lm() function can also be used for the analysis.
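The equivalence can be checked on a small example. This sketch assumes the built-in PlantGrowth data; it fits the same one-way model with aov() and with lm() and compares the overall F statistics.

```r
fit_aov <- aov(weight ~ group, data = PlantGrowth)
fit_lm  <- lm(weight ~ group, data = PlantGrowth)

# Overall F statistic from each fit
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
f_lm  <- unname(summary(fit_lm)$fstatistic["value"])
print(c(aov = f_aov, lm = f_lm))

# lm() expands the factor into dummy-coded columns (treatment contrasts)
print(head(model.matrix(fit_lm)))

# Intercept = mean of the first group; other coefficients = differences from it
print(coef(fit_lm))
```

The two F values agree because aov() fits the same linear model; only the presentation of the results differs.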

I do not fully understand this part yet and will come back to it later.

=========================================================================

Power analysis:

Power analysis helps determine the sample size needed to detect an effect of a given size with a given degree of confidence. Conversely, it can tell you the probability of detecting an effect of a given size with a given sample size and confidence level. If that probability is unacceptably low, modifying or abandoning the experiment would be a wise choice.

This chapter covers power analysis for a variety of statistical tests, including tests of proportions, t-tests, chi-square tests, balanced one-way ANOVA, correlation analysis, and linear models. Because power analysis applies to hypothesis testing, we first briefly review the null hypothesis significance testing (NHST) process and then learn how to carry out power analysis in R, focusing primarily on the pwr package.

Hypothesis testing review:

First, a hypothesis about a population parameter is stated as the null hypothesis H0. A sample is then drawn from the population and a statistic is calculated to make inferences about the population parameter. Assuming H0 is true, if the probability of obtaining the observed sample statistic is very small, the null hypothesis is rejected in favor of the alternative hypothesis H1.

Type I error: H0 is true but is rejected

Type II error: H0 is false but is not rejected

Researchers typically pay attention to four quantities:

Sample size: the number of observations in each condition/group of the experimental design

Significance level (alpha, the threshold for the p-value): the probability of making a Type I error

Power: 1 - P(Type II error), which can be thought of as the probability of detecting a real effect

Effect size: the magnitude of the effect under the alternative or research hypothesis. The formula for the effect size depends on the statistical method used in the hypothesis test.

These four quantities are closely related: given any three, the fourth can be calculated. This chapter uses that relationship to carry out a variety of power analyses.

Functions in the pwr package

Function	Power calculations for
pwr.2p.test()	Two proportions (equal n)
pwr.2p2n.test()	Two proportions (unequal n)
pwr.anova.test()	Balanced one-way ANOVA
pwr.chisq.test()	Chi-square test
pwr.f2.test()	General linear model
pwr.p.test()	Proportion (one sample)
pwr.r.test()	Correlation
pwr.t.test()	t-tests (one sample, two samples, paired)
pwr.t2n.test()	t-test (two samples with unequal n)

t-tests

pwr.t.test(n =, d =, sig.level =, power =, type =, alternative =)

n is the sample size

d is the effect size (the standardized mean difference)

sig.level is the significance level (default 0.05)

power is the power level

type is the type of test: a two-sample t-test ("two.sample"), a one-sample t-test ("one.sample"), or a dependent-samples t-test ("paired"). A two-sample test is the default.

alternative indicates whether the test is two-sided ("two.sided") or one-sided ("less" or "greater"). A two-sided test is the default.
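A hedged example of solving for the sample size. The numbers are hypothetical (effect size d = 0.8, 90% power, 5% significance level). The calculation uses base R's power.t.test(), which takes the raw difference delta and the standard deviation sd, so d = delta / sd; the equivalent pwr call is shown only if the package happens to be installed.

```r
# Base R counterpart of the calculation: with sd = 1, delta equals d
res <- power.t.test(delta = 0.8, sd = 1, sig.level = 0.05, power = 0.90,
                    type = "two.sample", alternative = "two.sided")
print(res)

# Round up to whole participants per group
n_per_group <- ceiling(res$n)
print(n_per_group)

# The same question asked of the pwr package, if it is available
if (requireNamespace("pwr", quietly = TRUE)) {
  print(pwr::pwr.t.test(d = 0.8, sig.level = 0.05, power = 0.90,
                        type = "two.sample", alternative = "two.sided"))
}
```

Leaving out n and supplying the other quantities asks the function to solve for the sample size; supplying n and leaving out power would instead return the power.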

ANOVA

pwr.anova.test(k =, n =, f =, sig.level =, power =)

where k is the number of groups, n is the common sample size in each group, and f is the effect size.
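The calculation behind a balanced one-way ANOVA power analysis can be sketched in base R with the noncentral F distribution. This is an illustration under the usual assumption that the noncentrality parameter is k * n * f^2, not the pwr source code; anova_power is a hypothetical helper, and the inputs (5 groups, f = 0.25, 80% power) are example values.

```r
# Power of a balanced one-way ANOVA with k groups, n per group, effect size f
anova_power <- function(k, n, f, sig.level = 0.05) {
  df1  <- k - 1
  df2  <- k * (n - 1)
  ncp  <- k * n * f^2                    # noncentrality parameter (assumed form)
  crit <- qf(1 - sig.level, df1, df2)    # critical F value under H0
  1 - pf(crit, df1, df2, ncp = ncp)      # probability of exceeding it under H1
}

# Solve for the per-group n that gives 80% power
n_needed <- uniroot(function(n) anova_power(5, n, 0.25) - 0.80,
                    interval = c(2, 500))$root
print(ceiling(n_needed))

# Power at a few per-group sample sizes
print(sapply(c(20, 40, 60), function(n) anova_power(5, n, 0.25)))
```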

Correlations

pwr.r.test(n =, r =, sig.level =, power =, alternative =)

where n is the number of observations, r is the effect size (measured by the linear correlation coefficient), sig.level is the significance level, power is the power level, and alternative specifies a two-sided ("two.sided") or one-sided ("less" or "greater") significance test.

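Linear models

pwr.f2.test(u =, v =, f2 =, sig.level =, power =)

where u and v are the numerator and denominator degrees of freedom and f2 is the effect size. When evaluating the impact of a set of predictors on an outcome, the first formula is appropriate: f2 = R^2 / (1 - R^2), where R^2 is the population squared multiple correlation. When evaluating the impact of one set of predictors above and beyond a second set of predictors (or covariates), the second formula is appropriate: f2 = (R^2_AB - R^2_A) / (1 - R^2_AB), where R^2_A is the variance accounted for by variable set A and R^2_AB is the variance accounted for by sets A and B together.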

Tests of proportions

pwr.2p.test(h =, n =, sig.level =, power =)

where h is the effect size and n is the common sample size in each group. The effect size h is defined as follows:

h = 2 * arcsin(sqrt(p1)) - 2 * arcsin(sqrt(p2))

It can be calculated with the function ES.h(p1, p2).
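A short base R sketch of the effect size h and of the sample size it implies. The proportions (0.65 versus 0.60), the one-sided test, and the 90% power target are hypothetical inputs, and the sample size uses the normal approximation n = 2 * ((z_alpha + z_power) / h)^2 rather than a call to pwr.

```r
# Hypothetical proportions in the two groups
p1 <- 0.65
p2 <- 0.60

# Effect size h via the arcsine transformation
h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Per-group sample size for a one-sided test (normal approximation)
sig.level <- 0.05
power     <- 0.90
n <- 2 * ((qnorm(1 - sig.level) + qnorm(power)) / h)^2

print(c(h = h, n_per_group = ceiling(n)))
```

A difference of five percentage points is a very small effect on this scale, which is why well over a thousand observations per group are required.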

When the number of observations differs between the groups:

pwr.2p2n.test(h =, n1 =, n2 =, sig.level =, power =)

As before, alternative can specify a one-sided or a two-sided (default) test.

Chi-square tests

Chi-square tests are often used to assess the relationship between two categorical variables. The null hypothesis is typically that the variables are independent, and the alternative hypothesis is that they are not.

pwr.chisq.test(w =, N =, df =, sig.level =, power =)

where w is the effect size, N is the total sample size, and df is the degrees of freedom. The effect size w is defined as w = sqrt( sum over i of (p0i - p1i)^2 / p0i ), where p0i is the probability of cell i under H0 and p1i is the probability of cell i under H1.

The summation runs from 1 to m, where m is the number of cells in the contingency table. The function ES.w2(P) calculates the effect size for the alternative hypothesis in a two-way contingency table, where P is a hypothesized two-way probability table.
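The effect size w for a two-way table can be computed by hand in a few lines of base R. The probability table below is a hypothetical example; the cell probabilities under H0 are taken as the products of the row and column margins, which is the independence assumption.

```r
# Hypothesized cell probabilities under H1 (3 x 2 table, sums to 1)
P <- matrix(c(0.42, 0.28,
              0.03, 0.07,
              0.10, 0.10), nrow = 3, byrow = TRUE)

# Cell probabilities under H0 (independence): products of the margins
P0 <- outer(rowSums(P), colSums(P))

# Effect size w
w <- sqrt(sum((P0 - P)^2 / P0))

# Degrees of freedom for the chi-square test
df <- (nrow(P) - 1) * (ncol(P) - 1)

print(c(w = w, df = df))
```

The resulting w and df are the values that would be passed to pwr.chisq.test() together with a significance level and either N or power.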

Choosing an appropriate effect size in novel situations:

In power analysis, the expected effect size is the most difficult parameter to determine. It typically requires some knowledge of the subject matter and experience with the measures involved. When you have no such experience, you can use the benchmarks proposed by Cohen (1988), which give small, medium, and large effect sizes for a variety of statistical tests:

Statistical method	Effect size measure	Small	Medium	Large
t-test	d	0.20	0.50	0.80
ANOVA	f	0.10	0.25	0.40
Linear models	f2	0.02	0.15	0.35
Test of proportions	h	0.20	0.50	0.80
Chi-square test	w	0.10	0.30	0.50

Note, however, that these benchmarks are only general suggestions and may not apply to your particular field of research.

Plotting power analysis curves: use a for loop to visualize the relationship between sample size and the correlation coefficient, and use the plot to determine the required sample size.
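The loop can be sketched as follows. To keep the example free of package dependencies, the sample size is obtained from the Fisher z approximation, n = ((z_alpha + z_power) / atanh(r))^2 + 3, which is a swapped-in approximation and not the pwr.r.test() calculation; n_for_r and the grids of values are hypothetical.

```r
# Approximate sample size needed to detect a correlation r (two-sided test)
n_for_r <- function(r, power = 0.80, sig.level = 0.05) {
  z_alpha <- qnorm(1 - sig.level / 2)
  z_power <- qnorm(power)
  ceiling(((z_alpha + z_power) / atanh(r))^2 + 3)
}

r_values     <- seq(0.10, 0.50, by = 0.05)
power_levels <- seq(0.40, 0.90, by = 0.10)

# Rows correspond to correlations, columns to power levels
samsize <- matrix(NA_real_, nrow = length(r_values), ncol = length(power_levels))
for (i in seq_along(power_levels)) {
  for (j in seq_along(r_values)) {
    samsize[j, i] <- n_for_r(r_values[j], power = power_levels[i])
  }
}

# One curve per power level; drawn only in an interactive session
if (interactive()) {
  matplot(r_values, samsize, type = "l", lty = 1,
          col = seq_along(power_levels),
          xlab = "Correlation coefficient (r)", ylab = "Sample size (n)",
          main = "Sample size estimation for correlation studies")
  legend("topright", title = "Power", legend = power_levels,
         col = seq_along(power_levels), lty = 1)
}
print(samsize)
```

Reading across a row shows how much the required sample size grows as the desired power increases for a fixed correlation.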

Other specialized power analysis packages:

Package	Purpose
asypow	Power calculations via asymptotic likelihood ratio methods
PwrGSD	Power analysis for group sequential designs
pamm	Power analysis for random effects in mixed models
powerSurvEpi	Power and sample size calculations for survival analysis in epidemiological studies
powerpkg	Power analyses for the affected sib pair and TDT (transmission disequilibrium test) designs
powerGWASinteraction	Power calculations for interactions in GWAS
pedantics	Functions that facilitate power analyses for genetic studies of natural populations
gap	Functions for power and sample size calculations in case-cohort designs
ssize.fdr	Sample size calculations for microarray experiments

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
