Chapter 10: Analysis of Variance

Source: Internet
Author: User
Tags square root

In form, analysis of variance (ANOVA) tests whether the means of several populations are equal; in essence, it studies the relationship between variables. It is one of the main methods for studying the relationship between one or more categorical independent variables and a numerical dependent variable.

1 Introduction to analysis of variance

As the number of individual significance tests increases, so does the probability of finding a difference by chance alone (rather than because the means really differ). Analysis of variance considers all samples at the same time, which avoids this accumulation of error probability and so avoids rejecting a null hypothesis that is actually true.

1 Analysis of variance and related terminology

Analysis of variance (ANOVA): a method that tests whether the means of the populations are equal in order to determine whether a categorical independent variable has a significant effect on a numerical dependent variable.
Example: to analyze whether service quality differs significantly among four industries, that is, whether "industry" has a significant effect on "number of complaints".
This question can be restated as: test whether the mean number of complaints is equal across the four industries.
In analysis of variance, the object being tested is called a factor. The different states of a factor are called levels or treatments. The sample data obtained at each factor level are called observations.
In the example above, industry is the object being tested and is therefore the factor; the specific industries, such as retail and tourism, are the levels or treatments; and the sample data within each industry (the number of complaints) are the observations. Because only one factor, industry, is involved, this is called a single-factor, four-level test.

2 Basic idea and principle of analysis of variance

To analyze the effect of a categorical independent variable on a numerical dependent variable, we need to analyze the sources of error in the data.
(1) Graphic description
(2) Error decomposition
Idea: by analyzing the sources of error in the data, we judge whether the means of the different populations are equal, and hence whether the independent variable has a significant effect on the dependent variable.
Within-group error: the differences among the observations inside one sample, reflecting the dispersion of the data within that sample. It contains only random error.
Between-group error: the differences between different populations, reflecting the dispersion between samples. It is the sum of random error and systematic error.
Sum of squares: a measure of the size of the total error in the data, reflecting the dispersion of all observations.
Total sum of squares (SST) = within-group sum of squares (SSE) + between-group sum of squares (SSA)
The within-group sum of squares is also called the error sum of squares or the residual sum of squares.
The between-group sum of squares is also called the factor sum of squares.
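The decomposition SST = SSE + SSA can be checked numerically. The following is a minimal Python sketch: the function name `sums_of_squares` and the complaint counts are made up for illustration and do not come from the source.

```python
def sums_of_squares(groups):
    """Return (SST, SSA, SSE) for a list of samples, one list per factor level."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    # Total: every observation compared with the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in values)
    # Between groups: each group mean compared with the grand mean,
    # weighted by the group size.
    ssa = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: every observation compared with its own group mean.
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return sst, ssa, sse


# Hypothetical complaint counts for three industries (illustrative only).
complaints = [[5, 7, 6], [8, 10, 9], [2, 4, 3]]
sst, ssa, sse = sums_of_squares(complaints)
print(sst, ssa, sse)
```

For this toy data the group means are 6, 9 and 3 and the grand mean is 6, so SST = 60, SSA = 54 and SSE = 6, and the two parts add up to the total.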
(3) Error analysis
For example, if industry has no effect on the number of complaints, the between-group error contains only random error and no systematic error. In that case the between-group error and the within-group error, once averaged, will be in a ratio close to 1:1. Conversely, if industry does have an effect, the ratio of between-group error to within-group error will be greater than 1. When the ratio is large enough, we conclude that the levels of the factor differ significantly.

3 Basic assumptions in analysis of variance

Analysis of variance rests on three basic assumptions:
(1) Each population follows a normal distribution. Example: the number of complaints in each industry must be normally distributed.
(2) The populations have the same variance $\sigma^2$. The observations in every group are drawn from normal populations with equal variance.
(3) The observations are independent.

4 General formulation of the problem

Suppose the factor has $k$ levels whose means are $\mu_1, \mu_2, \ldots, \mu_k$. To test whether the means of the $k$ levels (populations) are equal, state the following hypotheses:
$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$ (the independent variable has no significant effect on the dependent variable)
$H_1$: $\mu_1, \mu_2, \ldots, \mu_k$ are not all equal (the independent variable has a significant effect on the dependent variable)

2 Single-factor analysis of variance

Depending on the number of categorical independent variables analyzed, analysis of variance is divided into single-factor and two-factor analysis of variance. When it involves only one categorical independent variable, it is called single-factor analysis of variance.

1 Analysis steps

1 State the hypotheses
2 Construct the test statistic
(1) Compute the mean of each sample
Let $n_i$ be the sample size of the $i$-th population, $x_{ij}$ the $j$-th observation from the $i$-th population, and $\bar{x}_i$ the sample mean of the $i$-th population:
$\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{n_i}, \quad i = 1, 2, \ldots, k$
(2) Compute the grand mean of all observations
The grand mean is $\bar{x}$, where $n$ is the total number of observations:
$\bar{x} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}}{n}$
(3) Compute the sums of squares
Total sum of squares: $SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$
Between-group sum of squares: $SSA = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$
Within-group sum of squares: $SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
(4) Compute the test statistic
Because the size of each sum of squares depends on the number of observations, that dependence is removed by averaging: each sum of squares is divided by its degrees of freedom, and the result is called a mean square.
The total sum of squares has $n - 1$ degrees of freedom, where $n$ is the total number of observations.
The between-group sum of squares has $k - 1$ degrees of freedom, where $k$ is the number of factor levels.
The within-group sum of squares has $n - k$ degrees of freedom.
Following the idea of analysis of variance, the between-group mean square is compared with the within-group mean square.
Between-group mean square: $MSA = \frac{SSA}{k - 1}$
Within-group mean square: $MSE = \frac{SSE}{n - k}$
When $H_0$ is true, the ratio of the two follows an F distribution, i.e.
$F = \frac{MSA}{MSE} \sim F(k - 1, n - k)$
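Sub-steps (1) to (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_way_anova` and the complaint counts are hypothetical, and the critical value needed for the decision step is not computed here (it would come from an F table or a statistics library).

```python
def one_way_anova(groups):
    """Compute MSA, MSE and F for a list of samples, one list per factor level."""
    k = len(groups)                      # number of factor levels
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    # Mean squares: each sum of squares divided by its degrees of freedom.
    msa = ssa / (k - 1)
    mse = sse / (n - k)
    return {"df": (k - 1, n - k), "MSA": msa, "MSE": mse, "F": msa / mse}


# Hypothetical complaint counts for three industries (illustrative only).
complaints = [[5, 7, 6], [8, 10, 9], [2, 4, 3]]
result = one_way_anova(complaints)
print(result)
```

With this toy data, $k = 3$ and $n = 9$, so the degrees of freedom are (2, 6), MSA = 54/2 = 27, MSE = 6/6 = 1, and F = 27.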
3 Make the statistical decision
Compare the computed statistic $F$ with the critical value $F_\alpha$ at the given significance level $\alpha$ to decide whether to reject the null hypothesis $H_0$.
If $F > F_\alpha$, reject $H_0$: the differences among the means are significant.
If $F < F_\alpha$, do not reject $H_0$: there is no evidence of significant differences among the means.

2 Measuring the strength of the relationship

The strength of the relationship between the two variables is reflected by the proportion of the total sum of squares (SST) that the between-group sum of squares (SSA) accounts for, denoted $R^2$:
$R^2 = \frac{SSA}{SST}$
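A minimal Python sketch of this ratio, assuming the same kind of grouped data as in the single-factor analysis. The function name `r_squared` and the complaint counts are hypothetical.

```python
import math


def r_squared(groups):
    """Return R^2 = SSA / SST for a list of samples, one list per factor level."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    sst = sum((x - grand_mean) ** 2 for x in values)
    ssa = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ssa / sst


# Hypothetical complaint counts for three industries (illustrative only).
complaints = [[5, 7, 6], [8, 10, 9], [2, 4, 3]]
r2 = r_squared(complaints)
r = math.sqrt(r2)
print(round(r2, 3), round(r, 3))
```

For this toy data SSA = 54 and SST = 60, so $R^2 = 0.9$.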
The square root of $R^2$, denoted $R$, can be used to measure the strength of the relationship between the two variables.

3 Multiple comparisons in analysis of variance

If the test above concludes that the level means (the mean numbers of complaints in the different industries) are not all equal, it still does not say which means differ, that is, between which samples the differences occur. Answering this requires further analysis with a multiple comparison method, which tests the differences between the means through pairwise comparisons of the population means.
There are several multiple comparison methods. The one described here is the least significant difference method, abbreviated LSD. The steps are:
(1) State the hypotheses: $H_0: \mu_i = \mu_j$, $H_1: \mu_i \neq \mu_j$
(2) Compute the test statistic: $\bar{x}_i - \bar{x}_j$
(3) Compute LSD using the formula
$LSD = t_{\alpha/2} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$
where $t_{\alpha/2}$ is the critical value of the t distribution with $n - k$ degrees of freedom, $n_i$ and $n_j$ are the sizes of the $i$-th and $j$-th samples, and $MSE$ is the within-group mean square (the within-group variance).
(4) Make the decision at significance level $\alpha$.
If $|\bar{x}_i - \bar{x}_j| > LSD$, reject $H_0$; if $|\bar{x}_i - \bar{x}_j| < LSD$, do not reject $H_0$.

3 Two-factor analysis of variance

When analysis of variance involves two categorical independent variables, it is called two-factor analysis of variance.
Example: analyze the effect of brand and sales region on TV sales, to determine whether only one of the factors has an effect, both do, or neither does.
In two-factor analysis of variance there are two influencing factors. If the two factors are independent of each other and each affects the dependent variable separately, the analysis is called two-factor analysis of variance without interaction. If, in addition to their individual effects, the combination of the two factors produces a new effect on the dependent variable (for example, a particular region having a special preference for a particular brand), it is called two-factor analysis of variance with interaction.

1 Two-factor analysis of variance without interaction

Take one of the two factors as the row factor and the other as the column factor. Suppose the row factor has $k$ levels and the column factor has $r$ levels; each observation is $x_{ij}$ ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, r$).
$\bar{x}_{i\cdot}$ denotes the mean of the observations at the $i$-th level of the row factor, $\bar{x}_{\cdot j}$ the mean of the observations at the $j$-th level of the column factor, and $\bar{x}$ the grand mean of all $kr$ observations.
1 Analysis steps
(1) State the hypotheses
Hypotheses for the row factor:
$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$ (the row factor has no significant effect on the dependent variable)
$H_1$: $\mu_1, \mu_2, \ldots, \mu_k$ are not all equal (the row factor has a significant effect on the dependent variable)
Hypotheses for the column factor:
$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_r$ (the column factor has no significant effect on the dependent variable)
$H_1$: $\mu_1, \mu_2, \ldots, \mu_r$ are not all equal (the column factor has a significant effect on the dependent variable)
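(2) Construct the test statistics
As in single-factor analysis, start from the decomposition of the total sum of squares:
$\sum_{i=1}^{k} \sum_{j=1}^{r} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{k} \sum_{j=1}^{r} (\bar{x}_{i\cdot} - \bar{x})^2 + \sum_{i=1}^{k} \sum_{j=1}^{r} (\bar{x}_{\cdot j} - \bar{x})^2 + \sum_{i=1}^{k} \sum_{j=1}^{r} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2$
Total sum of squares (SST) = row-factor sum of squares (SSR) + column-factor sum of squares (SSC) + random error sum of squares (SSE)
The degrees of freedom, from left to right, are $kr - 1$, $k - 1$, $r - 1$ and $(k - 1)(r - 1)$.
Compute the mean squares (each sum of squares divided by its degrees of freedom) and construct the statistics:
$F_R = \frac{MSR}{MSE} \sim F(k - 1, (k - 1)(r - 1))$
$F_C = \frac{MSC}{MSE} \sim F(r - 1, (k - 1)(r - 1))$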
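The decomposition and the two statistics above can be sketched in Python as follows. The function name `two_way_anova_no_interaction` and the sales figures are hypothetical, chosen only so that the arithmetic is easy to follow; the sketch assumes exactly one observation per row/column combination.

```python
def two_way_anova_no_interaction(table):
    """table[i][j] is the single observation at row level i and column level j."""
    k, r = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (k * r)
    row_means = [sum(row) / r for row in table]
    col_means = [sum(table[i][j] for i in range(k)) / k for j in range(r)]

    sst = sum((x - grand) ** 2 for row in table for x in row)
    # Each row mean appears r times in the double sum, each column mean k times.
    ssr = r * sum((m - grand) ** 2 for m in row_means)
    ssc = k * sum((m - grand) ** 2 for m in col_means)
    sse = sst - ssr - ssc

    # Mean squares: sums of squares divided by their degrees of freedom.
    msr = ssr / (k - 1)
    msc = ssc / (r - 1)
    mse = sse / ((k - 1) * (r - 1))
    return {"SST": sst, "SSR": ssr, "SSC": ssc, "SSE": sse,
            "F_R": msr / mse, "F_C": msc / mse}


# Hypothetical TV sales: rows are brands, columns are regions (illustrative only).
sales = [[4, 6, 8],
         [5, 9, 10],
         [3, 6, 12]]
print(two_way_anova_no_interaction(sales))
```

For this toy table SST = 70, SSR = 6, SSC = 54 and SSE = 10, which gives MSR = 3, MSC = 27, MSE = 2.5, and therefore $F_R = 1.2$ and $F_C = 10.8$.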
(3) Make the statistical decision
Look up the corresponding critical value $F_\alpha$ and compare it with $F_R$ and $F_C$.
If $F_R > F_\alpha$, reject the null hypothesis for the row factor: the row factor has a significant effect.
If $F_C > F_\alpha$, reject the null hypothesis for the column factor: the column factor has a significant effect.
2 Measuring the strength of the relationship
$R^2 = \frac{\text{combined effect}}{\text{total effect}} = \frac{SSR + SSC}{SST}$
The square root, $R$, reflects the strength of the relationship between the two independent variables taken together and the dependent variable.

2 Two-factor analysis of variance with interaction

$x_{ijl}$ denotes the $l$-th observation ($l = 1, 2, \ldots, m$) for the combination of the $i$-th level of the row factor and the $j$-th level of the column factor; $\bar{x}_{ij}$ denotes the sample mean for that combination; and $\bar{\bar{x}}$ denotes the grand mean of all observations.

The analysis is essentially the same as in the case without interaction, but the sums of squares are computed differently:
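Total sum of squares: $SST = \sum_{i=1}^{k} \sum_{j=1}^{r} \sum_{l=1}^{m} (x_{ijl} - \bar{\bar{x}})^2$
Row-factor sum of squares: $SSR = rm \sum_{i=1}^{k} (\bar{x}_{i\cdot} - \bar{\bar{x}})^2$
Column-factor sum of squares: $SSC = km \sum_{j=1}^{r} (\bar{x}_{\cdot j} - \bar{\bar{x}})^2$
Interaction sum of squares: $SSRC = m \sum_{i=1}^{k} \sum_{j=1}^{r} (\bar{x}_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{\bar{x}})^2$
Error sum of squares: $SSE = SST - SSR - SSC - SSRC$
Compared with the case without interaction, one more statistic is constructed:
$F_{RC} = \frac{MSRC}{MSE}$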
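The sums of squares and the extra interaction statistic can be sketched in Python as below. The function name `two_way_anova_with_interaction` and the data are hypothetical. The degrees of freedom used for the mean squares, $(k-1)(r-1)$ for the interaction and $kr(m-1)$ for the error, are the standard values for a balanced design with $m$ observations per cell; they are an assumption here, since the text above does not state them.

```python
def two_way_anova_with_interaction(cells):
    """cells[i][j] is the list of m observations at row level i, column level j."""
    k, r, m = len(cells), len(cells[0]), len(cells[0][0])
    grand = sum(x for row in cells for cell in row for x in cell) / (k * r * m)
    cell_means = [[sum(cell) / m for cell in row] for row in cells]
    # With equal cell sizes, row and column means are averages of cell means.
    row_means = [sum(cm) / r for cm in cell_means]
    col_means = [sum(cell_means[i][j] for i in range(k)) / k for j in range(r)]

    sst = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)
    ssr = r * m * sum((a - grand) ** 2 for a in row_means)
    ssc = k * m * sum((b - grand) ** 2 for b in col_means)
    ssrc = m * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(k) for j in range(r)
    )
    sse = sst - ssr - ssc - ssrc

    # Assumed degrees of freedom (balanced design, not given in the text).
    mse = sse / (k * r * (m - 1))
    return {"SST": sst, "SSR": ssr, "SSC": ssc, "SSRC": ssrc, "SSE": sse,
            "F_R": (ssr / (k - 1)) / mse,
            "F_C": (ssc / (r - 1)) / mse,
            "F_RC": (ssrc / ((k - 1) * (r - 1))) / mse}


# Hypothetical data: 2 row levels x 2 column levels, 2 observations per cell.
data = [[[1, 3], [4, 6]],
        [[5, 7], [2, 4]]]
print(two_way_anova_with_interaction(data))
```

For this toy data SST = 28, SSR = 2, SSC = 0, SSRC = 18 and SSE = 8, so MSE = 2 and $F_{RC} = 18 / 2 = 9$: almost all of the variation comes from the combination of the two factors rather than from either factor alone.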
