Analysis of variance in R

Source: Internet
Author: User

One, one-factor analysis of variance

Single-factor ANOVA has only one grouping variable, so the data often comes as a multi-column data frame with one column per group, for example:
   Grass Heath Arable
1      3     6     19
2      4     7      3
3      3     8      8
4      5     8      8
5      6     9      9
6     12    11     11
7     21    12     12
8      4    11     11
9      5    NA      9
10     4    NA     NA
11     7    NA     NA
12     8    NA     NA

The basic command for analysis of variance is aov(). It uses formula syntax and expects the data as one column of response values plus one factor column, so the data frame above cannot be used directly. We first convert it with the stack() command; the converted data looks like this:
   values    ind
1       3  Grass
2       4  Grass
3       3  Grass
4       5  Grass
5       6  Grass
6      12  Grass
7      21  Grass
8       4  Grass
9       5  Grass
10      4  Grass
11      7  Grass
12      8  Grass
13      6  Heath
14      7  Heath
15      8  Heath
16      8  Heath
17      9  Heath
18     11  Heath
19     12  Heath
20     11  Heath
21     NA  Heath
22     NA  Heath
23     NA  Heath
24     NA  Heath
25     19 Arable
26      3 Arable
27      8 Arable
28      8 Arable
29      9 Arable
30     11 Arable
31     12 Arable
32     11 Arable
33      9 Arable
34     NA Arable
35     NA Arable
36     NA Arable
The original data contains NA entries; to remove them, use na.omit().
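The conversion described above can be sketched as follows. The data frame name bf is an assumption made for this example, and the renaming to count and site is inferred from the aov() calls later in the text rather than stated by the author.

```r
# Wide layout from the table above: one column per site, padded with NA.
# The name `bf` is hypothetical.
bf <- data.frame(
  Grass  = c(3, 4, 3, 5, 6, 12, 21, 4, 5, 4, 7, 8),
  Heath  = c(6, 7, 8, 8, 9, 11, 12, 11, NA, NA, NA, NA),
  Arable = c(19, 3, 8, 8, 9, 11, 12, 11, 9, NA, NA, NA)
)

# stack() returns two columns: `values` (the numbers) and `ind` (a factor
# holding the name of the column each value came from).
bfs <- stack(bf)

# Drop the rows that only exist because of the NA padding.
bfs <- na.omit(bfs)

# Rename the columns to the names used in the aov() calls (assumed).
names(bfs) <- c("count", "site")

str(bfs)
table(bfs$site)
```

After this step each row of bfs is one observation, which is the layout the formula interface expects.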

With the data in response + factor form, we run the analysis of variance with the aov() command (here the stacked data frame is called bfs, with the columns count and site):
> aov(count ~ site, data = bfs)
Call:
   aov(formula = count ~ site, data = bfs)

Terms:
                    site Residuals
Sum of Squares   55.3678  467.6667
Deg. of Freedom        2        26

Residual standard error: 4.24113
Estimated effects may be unbalanced

Applying the summary() command to the result presents it as a classic analysis of variance table:
> summary(aov(count ~ site, data = bfs))
            Df Sum Sq Mean Sq F value Pr(>F)
site         2   55.4   27.68   1.539  0.233
Residuals   26  467.7   17.99
The table includes the F value and its significance. Here p = 0.233, so the differences between sites are not significant at the 0.05 level.
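As a minimal sketch, the F value and p-value can also be pulled out of the summary object instead of being read off the printed table. The data below are the non-missing values from the first table; the variable names fit, tab, f_value and p_value are assumptions made for this example.

```r
# Long-format data: the non-NA counts from the table, with their site labels.
bfs <- data.frame(
  count = c(3, 4, 3, 5, 6, 12, 21, 4, 5, 4, 7, 8,   # Grass
            6, 7, 8, 8, 9, 11, 12, 11,              # Heath
            19, 3, 8, 8, 9, 11, 12, 11, 9),         # Arable
  site  = factor(rep(c("Grass", "Heath", "Arable"), times = c(12, 8, 9)))
)

fit <- aov(count ~ site, data = bfs)

# summary() on an aov object returns a list; its first element is the
# ANOVA table as a data frame.
tab <- summary(fit)[[1]]

f_value <- tab[["F value"]][1]  # F statistic for the site term
p_value <- tab[["Pr(>F)"]][1]   # corresponding p-value

print(tab)
cat("F =", round(f_value, 3), " p =", round(p_value, 3), "\n")
```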

ANOVA is often followed by post-hoc tests; the TukeyHSD() command performs Tukey's honest significant difference test.
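A short sketch of such a post-hoc test on the one-factor model, using the non-missing values from the first table. The object names fit and hsd are assumptions made for this example.

```r
# Same long-format data as in the one-factor analysis.
bfs <- data.frame(
  count = c(3, 4, 3, 5, 6, 12, 21, 4, 5, 4, 7, 8,   # Grass
            6, 7, 8, 8, 9, 11, 12, 11,              # Heath
            19, 3, 8, 8, 9, 11, 12, 11, 9),         # Arable
  site  = factor(rep(c("Grass", "Heath", "Arable"), times = c(12, 8, 9)))
)

fit <- aov(count ~ site, data = bfs)

# TukeyHSD() returns one matrix per factor; each row is a pair of levels,
# with the difference in means, a confidence interval and an adjusted p-value.
hsd <- TukeyHSD(fit)
print(hsd)

# The matrix for the site factor can be used directly.
hsd$site[, "p adj"]
```

Calling plot(hsd) would draw the confidence intervals for each pair.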

Two, multi-factor analysis of variance

Multi-factor ANOVA has to consider the interaction between factors, which makes it more complex than the single-factor case. Suppose we have the following data:
   height    plant water
1       9 vulgaris    lo
2         vulgaris    lo
3       6 vulgaris    lo
4         vulgaris   mid
5         vulgaris   mid
6         vulgaris   mid
7         vulgaris    hi
8         vulgaris    hi
9         vulgaris    hi
10      7   sativa    lo
11      6   sativa    lo
12      5   sativa    lo
13          sativa   mid
14          sativa   mid
15          sativa   mid
16          sativa    hi
17          sativa    hi
18     37   sativa    hi

The aov() command can be set up in the following ways:
(1) > aov(height ~ plant + water, data = pw)
(2) > aov(height ~ plant * water, data = pw)
(3) > aov(height ~ plant + water + plant:water, data = pw)

(1) analyses only the main effects of the factors on the right of the ~. (2) analyses the main effects of those factors together with the interaction between them. (3) analyses the main effects plus an explicitly specified interaction between two factors. When there are only two factors, (2) and (3) are equivalent.
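The equivalence of (2) and (3) can be checked without any data by asking R how it expands each formula. This is a small sketch; the names f1, f2 and f3 are assumptions made for this example.

```r
f1 <- height ~ plant + water
f2 <- height ~ plant * water
f3 <- height ~ plant + water + plant:water

# terms() expands a formula; "term.labels" lists the model terms it contains.
labels1 <- attr(terms(f1), "term.labels")
labels2 <- attr(terms(f2), "term.labels")
labels3 <- attr(terms(f3), "term.labels")

print(labels1)  # main effects only
print(labels2)  # main effects plus plant:water
print(labels3)  # same terms as labels2

identical(labels2, labels3)
```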
The three resulting models can then be compared with the anova() command, for example:

> aov1 <- aov(height ~ plant + water, data = pw)
> aov2 <- aov(height ~ plant * water, data = pw)
> aov3 <- aov(height ~ plant + water + plant:water, data = pw)
> anova(aov1, aov2, aov3)
Analysis of Variance Table

Model 1: height ~ plant + water
Model 2: height ~ plant * water
Model 3: height ~ plant + water + plant:water
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)
1     14 199.111
2     12  69.333  2    129.78 11.231 0.001783 **
3     12  69.333  0      0.00
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

aov1 and aov2 differ significantly (p = 0.001783), which shows that the interaction between the factors affects the result, while aov2 and aov3 give identical results.
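The comparison can be reproduced with a sketch like the one below. The heights are an assumption: several entries of the data table are missing in this copy of the article, and the values used here were chosen because they reproduce the residual sums of squares in the output above (199.111 and 69.333). The name cmp is also hypothetical.

```r
# Assumed data: 2 plants x 3 water levels x 3 replicates.
# Heights are NOT taken verbatim from the article (see the note above).
pw <- data.frame(
  height = c(9, 11, 6, 14, 17, 19, 28, 31, 32,    # vulgaris: lo, mid, hi
             7, 6, 5, 14, 17, 15, 44, 38, 37),    # sativa:   lo, mid, hi
  plant  = factor(rep(c("vulgaris", "sativa"), each = 9)),
  water  = factor(rep(rep(c("lo", "mid", "hi"), each = 3), times = 2),
                  levels = c("lo", "mid", "hi"))
)

aov1 <- aov(height ~ plant + water, data = pw)                # main effects
aov2 <- aov(height ~ plant * water, data = pw)                # with interaction
aov3 <- aov(height ~ plant + water + plant:water, data = pw)  # same as aov2

# Each row after the first tests a model against the one before it.
cmp <- anova(aov1, aov2, aov3)
print(cmp)
```

The second row is the test of the interaction term; the third row adds no terms, so its degrees of freedom and sum of squares are zero.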

Because of the interaction, a post-hoc test produces a lot of output. The which option of the TukeyHSD() command restricts it to the factors of interest, for example:
> TukeyHSD(aov1, which = "water")
This shows only the pairwise comparisons between the levels of the water factor.
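A sketch of the restricted post-hoc test follows. As before, the heights are an assumption, since several entries of the data table are missing in this copy of the article; they were chosen to be consistent with the model comparison output. The name hsd is hypothetical.

```r
# Assumed data (heights are not taken verbatim from the article).
pw <- data.frame(
  height = c(9, 11, 6, 14, 17, 19, 28, 31, 32,    # vulgaris: lo, mid, hi
             7, 6, 5, 14, 17, 15, 44, 38, 37),    # sativa:   lo, mid, hi
  plant  = factor(rep(c("vulgaris", "sativa"), each = 9)),
  water  = factor(rep(rep(c("lo", "mid", "hi"), each = 3), times = 2),
                  levels = c("lo", "mid", "hi"))
)

aov1 <- aov(height ~ plant + water, data = pw)

# `which` limits the output to the named term(s); without it, TukeyHSD()
# reports every factor term in the model.
hsd <- TukeyHSD(aov1, which = "water")
print(hsd)

# One row per pair of water levels.
rownames(hsd$water)
```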


When analysing interactions it is often useful to draw an interaction plot, which can be done with the interaction.plot() command:
interaction.plot(x.factor, trace.factor, response, ...)
x.factor is the factor whose levels are placed along the x axis.
trace.factor is the factor whose levels are drawn as separate lines (traces); the plot shows the interaction through the combination of x.factor and trace.factor.
response is the dependent variable.
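A minimal sketch of such a plot, assuming the pw data frame from this section. The heights are an assumption, since several entries of the data table are missing in this copy of the article, and the axis labels and the name cell_means are made up for this example.

```r
# Assumed data (heights are not taken verbatim from the article).
pw <- data.frame(
  height = c(9, 11, 6, 14, 17, 19, 28, 31, 32,    # vulgaris: lo, mid, hi
             7, 6, 5, 14, 17, 15, 44, 38, 37),    # sativa:   lo, mid, hi
  plant  = factor(rep(c("vulgaris", "sativa"), each = 9)),
  water  = factor(rep(rep(c("lo", "mid", "hi"), each = 3), times = 2),
                  levels = c("lo", "mid", "hi"))
)

# The plot draws one line per plant across the water levels; each point is
# the mean height of one plant/water combination (the default summary).
interaction.plot(
  x.factor     = pw$water,
  trace.factor = pw$plant,
  response     = pw$height,
  xlab         = "water",
  ylab         = "mean height",
  trace.label  = "plant"
)

# The plotted points are these cell means.
cell_means <- with(pw, tapply(height, list(water, plant), mean))
print(cell_means)
```

Lines that are roughly parallel suggest little interaction; lines that diverge or cross suggest that the effect of one factor depends on the level of the other.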

Three, more complex analysis of variance models
