Linear models (1)

Source: Internet
Author: User
Tags: two-factor

In the analysis of variance we introduced the idea of the linear model. In fact, ANOVA is just one application of the linear model, and its statistical test is still an F test based on the principle of variance decomposition.

The linear model is a very important mathematical model. By purpose of analysis, it can be divided into the linear regression model and the analysis of variance (ANOVA) model; by form of expression, it can be divided into the general linear model, the generalized linear model, the general linear mixed model, and the generalized linear mixed model.

Below we introduce the linear model according to the purpose of analysis.

One: The analysis of variance (ANOVA) model

There are some basic concepts involved when using a linear model for variance analysis:

===============================================

(1) Factors and levels
A factor (also called a treatment factor) is a variable that affects the result; it is usually a categorical variable. In regression terms, the factor is the independent variable and the result is the dependent variable.

Each factor takes different values, called levels. In a categorical variable, the levels are the different categories or value ranges; for example, the factor gender has two levels, male and female. Sometimes the value ranges are divided artificially.

(2) Cells
A cell is a combination of factor levels; in a cross-tabulation it corresponds to one cell of the table. In some experimental designs, such as the Latin square design, some cells are empty.

(3) Elements
An element is the smallest unit on which the dependent variable is measured; it is, in effect, one measured value. Depending on the experimental design, a cell of the table may contain one element, several elements, or none.

(4) Balance
If every level of each factor appears the same number of times across all cells, and every cell contains the same number of elements, the design is balanced. Unbalanced designs are more complicated to analyze and require special settings in the analysis model.

(5) Covariates
Sometimes, when analyzing the effect of certain factors, we need to rule out the influence of another variable on the dependent variable; such a variable is called a covariate.

(6) Interaction
If the size of one factor's effect differs significantly across the levels of another factor, there is an interaction between the two factors. Testing for interactions is a necessary part of any multi-factor analysis; only then are the results complete.

(7) Fixed factors and random factors
These are two kinds of factors. A fixed factor is one whose levels all appear in the analysis, so the results describe the full range of its levels. A random factor, by contrast, is one whose levels are not all present in the analysis; if the analysis were repeated, an entirely different set of levels might be drawn. Such factors are called random factors.

There is no strict dividing line between fixed and random factors; the designation depends on the purpose of the analysis, and the same factor may be fixed in one study and random in another. If a factor is designated as fixed, the conclusions should not be generalized beyond the levels actually included; if such generalization is intended, the factor should be designated as random. Fixed and random factors are handled differently in the model, so treating a random factor as if it were fixed will obviously produce wrong results.
====================================================
Applicable conditions of the analysis of variance:

(1) Independence
The elements in the sample must be independent of one another, with no correlation between them; this comes from genuine random sampling and is required so that the variation can be decomposed. In repeated-measures designs, however, the measurements come from the same individuals and are therefore correlated, which requires a special repeated-measures ANOVA model.

(2) Normality
Because the random error terms of each group are assumed to follow a normal distribution with mean 0 and a fixed standard deviation, the model requires the residuals in each cell to be normally distributed.

(3) Homogeneity of variance
Because the random error terms of each group are assumed to follow the same normal distribution, the model requires the cells to have equal variances, that is, the same degree of variation, so that they are comparable.

(4) The relationship between the covariate and the dependent variable is linear within each group
This assumption is required in the analysis of covariance.

(5) The regression slopes of the groups are equal
This assumption is also required in the analysis of covariance.
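The normality and homogeneity conditions can be checked on sample data with standard tests. Below is a minimal sketch using SciPy's Shapiro-Wilk and Levene tests; the three groups are hypothetical, randomly generated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Hypothetical data: three groups of 20 observations each
groups = [rng.normal(mean, 1.0, 20) for mean in (0.0, 0.5, 1.0)]

# Normality: test the residuals (each value minus its group mean)
residuals = np.concatenate([g - g.mean() for g in groups])
w_stat, p_norm = stats.shapiro(residuals)

# Homogeneity of variance: Levene's test across the groups
l_stat, p_var = stats.levene(*groups)

# Large p-values mean there is no evidence against the assumption
print(f"Shapiro-Wilk p = {p_norm:.3f}, Levene p = {p_var:.3f}")
```

Testing residuals rather than the raw values matters here: the raw data mix the group means together, while the assumptions concern the error terms.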
==================================================
According to the number of treatment factors (also known as independent variables), the analysis of variance is divided into single-factor (one-way) ANOVA, two-factor ANOVA, multi-factor ANOVA, and so on.

According to the number of analysis indicators (also known as dependent variables), it is divided into univariate ANOVA and multivariate ANOVA (MANOVA).

An analysis with multiple factors and multiple dependent variables can also simply be called a multivariate analysis of variance, or more precisely an "X-factor Y-variate analysis of variance", such as a two-factor two-variate analysis of variance.

====================================================

1. Single-factor analysis of variance

Single-factor ANOVA covers the case where only one treatment factor affects the result, that is, only one independent variable influences the dependent variable.

Single-factor ANOVA is relatively simple, and we have already introduced it in detail in the analysis of variance. Here we just review it:

Any experimental result can be expressed in the following form:

Yi = μ + εi

where Yi is the actual result of the i-th experiment, μ is the best estimate of the result (in fact, the population mean), and εi is the deviation of the actual result from the mean, i.e., the random error. To simplify the derivation, we assume that εi follows a normal distribution with mean 0 and a fixed standard deviation; this is one of the applicable conditions of ANOVA mentioned earlier.

We generalize this form for the analysis of variance. Suppose we want to study the differences among several levels, drawing a sample from each level and collecting the data; the model can then be written as:

Yij = μi + εij

where Yij is the actual result of the j-th sample at level i, μi is the mean of group i, and εij is the deviation of the j-th observation in group i from that mean. Again we assume that εij follows a normal distribution with mean 0 and a fixed standard deviation. If there is no difference among the levels, then Yij equals the population mean plus the random error term. To make statistical inference easier, we rewrite the model in the following form:

Yij=μ+αi+εij

where μ is the overall mean when the grouping is ignored, and αi is the additional effect of group i, that is, the amount by which the mean of group i shifts from the overall mean. For example, αi = 10 means that the mean of group i is 10 higher than the overall mean. If the group means do not differ, then α1 = α2 = α3 = ... = αi = 0; otherwise at least one αi is nonzero. We can therefore set up the hypotheses:

H0: αi = 0 for every i
H1: αi ≠ 0 for at least one i

From the variance decomposition in ANOVA, we can see that αi is exactly the difference caused by the treatment factor.
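The variance decomposition behind this hypothesis test can be sketched in a few lines of Python. The three groups below are hypothetical data; the hand-computed F statistic agrees with SciPy's `f_oneway`:

```python
import numpy as np
from scipy import stats

# Hypothetical data: three levels of one factor, four observations each
g1 = np.array([5.1, 4.9, 5.3, 5.0])
g2 = np.array([6.2, 6.0, 6.4, 6.1])
g3 = np.array([5.0, 5.2, 4.8, 5.1])
groups = [g1, g2, g3]

grand_mean = np.concatenate(groups).mean()
# Between-group sum of squares: variation due to the alpha_i effects
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: variation due to the random error
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)
F = (ss_between / df_between) / (ss_within / df_within)

F_ref, p = stats.f_oneway(g1, g2, g3)  # same F, plus the p-value
```

A large F means the between-group variation (driven by the αi) dwarfs the random error, so H0 is rejected.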

2. Two-factor and multi-factor analysis of variance

When there is more than one treatment factor, we must consider not only the influence of each factor but also the interactions among the factors, so the model formula must be expanded. Taking two-factor ANOVA as an example, the model is:

Yijk = μ + αi + βj + γij + εijk

where μ is the overall mean when the grouping is ignored,
αi is the additional effect of level i of the first factor,
βj is the additional effect of level j of the second factor,
γij is the effect of the interaction between the two factors, and
εijk is the random error of the k-th observation in cell (i, j).

If we want to analyze the influence of the first factor on the mean, we set up hypotheses for αi, that is:

H0: αi = 0 for every i
H1: αi ≠ 0 for at least one i

If we want to analyze the influence of the second factor on the mean, we set up hypotheses for βj, that is:

H0: βj = 0 for every j
H1: βj ≠ 0 for at least one j
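For a balanced design, this decomposition can be computed directly. The sketch below uses hypothetical simulated data (effect sizes, noise level, and layout are all made up) and splits the total sum of squares into the αi, βj, interaction, and error parts:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 2, 3, 4                       # levels of A, levels of B, replicates per cell
alpha = np.array([1.0, -1.0])           # hypothetical additional effects of factor A
beta = np.array([0.5, 0.0, -0.5])       # hypothetical additional effects of factor B
# Y[i, j, k]: k-th observation in cell (i, j); no true interaction here
Y = 10 + alpha[:, None, None] + beta[None, :, None] + rng.normal(0, 0.3, (a, b, n))

gm = Y.mean()                 # grand mean
mi = Y.mean(axis=(1, 2))      # level means of factor A
mj = Y.mean(axis=(0, 2))      # level means of factor B
mij = Y.mean(axis=2)          # cell means

ss_a = b * n * ((mi - gm) ** 2).sum()                             # factor A
ss_b = a * n * ((mj - gm) ** 2).sum()                             # factor B
ss_ab = n * ((mij - mi[:, None] - mj[None, :] + gm) ** 2).sum()   # interaction
ss_e = ((Y - mij[:, :, None]) ** 2).sum()                         # error
ss_total = ((Y - gm) ** 2).sum()

# F statistic for factor A
F_a = (ss_a / (a - 1)) / (ss_e / (a * b * (n - 1)))
```

In a balanced design the four components add up to the total sum of squares exactly, which is why balance makes the analysis so much simpler.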

3. Analysis of covariance

Sometimes, when performing an analysis of variance, certain variables affect the experimental results but cannot be eliminated at the design stage and can only be controlled at the analysis stage. Such a variable that must be controlled is called a covariate, and an analysis of variance that includes covariates is called an analysis of covariance (ANCOVA).

The basic idea of ANCOVA is: before comparing the group means, use linear regression to find the quantitative relationship between each group's mean and the covariate, then compute each group's mean at a common value of the covariate, i.e., the adjusted mean. ANOVA is then used to compare the adjusted means, thereby excluding the influence of the covariate.

In addition to the basic requirements of independence, normality, and homogeneity of variance, ANCOVA adds two applicable conditions:

(1) The relationship between the covariate and the dependent variable is linear within each group
(2) The regression slopes of the groups are equal

As can be seen, ANCOVA uses linear regression as part of the analysis.
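The adjusted-mean idea can be sketched with ordinary least squares. Everything below is hypothetical (the data, the true effect sizes, and the equal-slopes design matrix); it is an illustration of the mechanics, not a full ANCOVA:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
group = np.repeat([0, 1], n)                     # two groups
x = rng.normal(50.0, 5.0, 2 * n)                 # covariate
# Hypothetical truth: slope 0.4 on the covariate, group effect 1.5
y = 2.0 + 0.4 * x + 1.5 * group + rng.normal(0.0, 0.5, 2 * n)

# Equal-slopes model: intercept + group dummy + covariate
X = np.column_stack([np.ones_like(x), group, x])
b0, b_grp, b_x = np.linalg.lstsq(X, y, rcond=None)[0]

# Adjusted means: predicted y for each group at the overall covariate mean
x_bar = x.mean()
adj_means = [b0 + b_grp * g + b_x * x_bar for g in (0, 1)]
```

The difference between the adjusted means estimates the group effect with the covariate's influence removed; fitting a single shared slope is exactly the equal-slopes assumption (2) above.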

4. Multivariate analysis of variance (MANOVA)

In MANOVA, "multivariate" refers to multiple dependent variables. An analysis with multiple dependent variables cannot simply be split into several univariate analyses; for this kind of multivariate data there are generally two methods: one is factor analysis, the other is multivariate analysis of variance.

Univariate ANOVA cannot capture the influence of the factors on the covariance among multiple dependent variables. MANOVA treats the multiple dependent variables as a whole (a joint distribution) and, by searching over linear combinations of the dependent variables, finds the combination that maximizes the difference between groups, i.e., the influence of the independent variables on the dependent variables as a whole.

MANOVA is also based on the idea of variance decomposition, but unlike univariate ANOVA, which compares the between-group mean square with the within-group mean square, MANOVA compares the between-group variance-covariance matrix with the within-group variance-covariance matrix. In other words, univariate ANOVA decomposes the variation (sums of squares), while MANOVA decomposes the variation-covariation (sums of squares and cross-products).

MANOVA also has applicable conditions, broadly similar to those of univariate ANOVA but with some differences:

(1) The joint distribution of the dependent variables is multivariate normal. This requirement is not strict; in practice it can be approximated by checking that each dependent variable is normally distributed. If the dependent variables are jointly multivariate normal, each of them is necessarily normal; conversely, if even one dependent variable is not normal, the joint distribution cannot be multivariate normal.
(2) The observations are mutually independent.
(3) The variance-covariance matrices of the groups are equal, i.e., the homogeneity-of-variance requirement.
(4) There is some correlation among the dependent variables, which can be judged from a professional or research-purpose perspective.

Of these four points, the homogeneity requirement in (3) also places demands on the sample size: not only should the total sample be large, but the sample in each cell should be large as well.

In MANOVA, if the number of independent variables is two or more, the interactions between them can be analyzed as well, just as in univariate ANOVA.

If you also want to know which individual dependent variables are affected by the treatment factors, you can run a separate single-factor ANOVA on each dependent variable. Likewise, when a treatment factor is statistically significant, pairwise comparisons can be made to find which of its levels differ significantly, just as in single-factor ANOVA.

MANOVA has some statistics of its own:

(1) SSCP: the sum-of-squares and cross-products matrix
(2) W = the sum of the within-group SSCP matrices (the within-group variation in MANOVA)
(3) T = the total SSCP matrix
(4) B = T − W = the between-group SSCP matrix
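These matrices are easy to compute directly. The sketch below uses hypothetical bivariate data in three groups and verifies the decomposition T = B + W:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 3 groups, 2 dependent variables, 10 observations per group
groups = [rng.normal(loc, 1.0, size=(10, 2))
          for loc in ([0.0, 0.0], [1.0, 0.5], [2.0, 1.0])]
all_y = np.vstack(groups)
gm = all_y.mean(axis=0)  # grand mean vector

def sscp(Y, center):
    """Sum-of-squares and cross-products matrix about a given center."""
    D = Y - center
    return D.T @ D

T = sscp(all_y, gm)                                # total SSCP
W = sum(sscp(g, g.mean(axis=0)) for g in groups)   # within-group SSCP
B = sum(len(g) * np.outer(g.mean(axis=0) - gm, g.mean(axis=0) - gm)
        for g in groups)                           # between-group SSCP
```

The diagonal entries of each matrix are the univariate sums of squares of the two dependent variables; the off-diagonal entries are the cross-products that univariate ANOVA ignores.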

There are several test statistics for MANOVA:

(1) Roy's test: based on the largest eigenvalue (characteristic root) of HE⁻¹.

(2) Lawley-Hotelling trace test: T = trace(BW⁻¹)

(3) Pillai's trace test: V = trace[B(B+W)⁻¹]

(4) Roy's second test: another statistic of Roy's, based on U = |B(B+W)⁻¹|

(5) Wilks' likelihood ratio test: the statistic derived by Wilks from Λ = |W| / |B+W|

In the above tests:
<1> When the test results disagree, you need to find out why.
<2> When the test results agree, the Wilks likelihood ratio test is recommended; it is usually the best choice.
<3> The Wilks likelihood ratio test, the Lawley-Hotelling trace test, and Pillai's trace test are approximately equivalent, while Roy's test is effective only when the group differences are very large and is otherwise less powerful than the other three.
<4> Pillai's trace test is the most robust when the preconditions of the model are not satisfied (such as slight departures from multivariate normality).
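Given B and W, these statistics are all functions of the eigenvalues λi of BW⁻¹: Λ = Π 1/(1+λi), T = Σ λi, V = Σ λi/(1+λi), and Roy's statistic is the largest λi. A sketch on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 3 groups of bivariate observations
groups = [rng.normal(loc, 1.0, size=(12, 2))
          for loc in ([0.0, 0.0], [1.0, 1.0], [2.0, 0.0])]
all_y = np.vstack(groups)
gm = all_y.mean(axis=0)

W = np.zeros((2, 2))
B = np.zeros((2, 2))
for g in groups:
    m = g.mean(axis=0)
    W += (g - m).T @ (g - m)                 # within-group SSCP
    B += len(g) * np.outer(m - gm, m - gm)   # between-group SSCP

# Eigenvalues of B W^-1, largest first
eig = np.sort(np.linalg.eigvals(B @ np.linalg.inv(W)).real)[::-1]

wilks = np.linalg.det(W) / np.linalg.det(B + W)   # Wilks Λ = |W| / |B+W|
hotelling = np.trace(B @ np.linalg.inv(W))        # Lawley-Hotelling T = tr(BW⁻¹)
pillai = np.trace(B @ np.linalg.inv(B + W))       # Pillai V = tr[B(B+W)⁻¹]
roy = eig[0]                                      # Roy's largest root
```

Small Λ (or large T, V, Roy) indicates that the between-group variation dominates; in practice each statistic is converted to an approximate F to obtain a p-value.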
