Anyone who has used SPSS knows that the analysis of variance (ANOVA) we run all the time sits under the General Linear Model (GLM) menu. But what exactly is this GLM? Let's open the almighty wiki and search for "general linear model"... What we find is a regression fit plot,
and the legendary multiple (linear) regression formula: $Y_{i}=\beta_{0} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + \dots + \beta_{p}x_{ip} + \epsilon_{i}$
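To make the formula concrete, here is a minimal sketch of fitting such a model by ordinary least squares; everything below (the data, the coefficients, the noise level) is simulated purely for illustration:

```python
import numpy as np

# Simulated data: n observations, p = 3 predictors plus an intercept column.
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 0.5, -2.0, 0.3])          # made-up coefficients
y = X @ beta_true + rng.normal(scale=0.5, size=n)    # Y = X @ beta + noise

# Ordinary least squares estimate of beta_0 .. beta_p
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to beta_true
```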
Isn't that a regression problem? What does it have to do with the analysis of variance we use for testing differences?
In fact, the parametric tests based on the normality assumption (t-test, ANOVA, MANOVA, ANCOVA, etc.) can all be expressed as regression problems.
Let's take the simplest case: the difference between two groups, Group A and Group B. We usually use a t-test to examine whether A and B differ, and an error-bar plot to visualize the difference (e.g., the left panel).
A and B are two levels of the independent variable x, which we can code as 0 (A) and 1 (B). We then get a function y = f(x) relating the dependent variable y to the independent variable x; assuming a linear relationship between the two, the model is $y = \alpha x + \beta$ (e.g., the right panel). The greater the difference between A and B, the steeper the slope $\alpha$ of the fitted line, i.e., the difference test can be expressed in regression form.
So are the two equivalent in significance? That is, is the t statistic for the difference between A and B the same as the t statistic for the slope $\alpha$ of the fitted line? The answer is yes.
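Before the algebra, a quick numerical sanity check of this claim; this is just a sketch on simulated data (the group sizes, means, and SDs are arbitrary):

```python
import numpy as np
from scipy import stats

# Simulated groups; the sizes, means, and SDs are arbitrary.
rng = np.random.default_rng(42)
ya = rng.normal(loc=0.0, scale=1.0, size=20)  # Group A
yb = rng.normal(loc=1.0, scale=1.0, size=30)  # Group B

# Classical two-sample t-test (equal variances assumed)
t_test = stats.ttest_ind(yb, ya, equal_var=True)

# The same data as a regression: x = 0 for A, 1 for B
x = np.concatenate([np.zeros(len(ya)), np.ones(len(yb))])
y = np.concatenate([ya, yb])
reg = stats.linregress(x, y)

print(t_test.statistic, reg.slope / reg.stderr)  # identical t values
print(t_test.pvalue, reg.pvalue)                 # identical p values
```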
We might as well assume that the variances of Group A and Group B are homogeneous, while the sample sizes $n_a$ and $n_b$ may differ. Then the t value between Group A and Group B is
$$t =\frac{\overline{y_{b}}-\overline{y_{a}}}{s_{y_{a}y_{b}} \sqrt{\frac{1}{n_{a}} + \frac{1}{n_{b}}}}$$
where $s_{y_{a}y_{b}}=\sqrt{\frac{(n_{a}-1) s_{y_{a}}^2 + (n_{b}-1) s_{y_{b}}^2}{n_{a}+n_{b}-2}}$ is the pooled standard deviation, and $s_{y_{a}}$, $s_{y_{b}}$ are the standard deviations of Group A and Group B, respectively.
Let $MEAN=\overline{y_{b}}-\overline{y_{a}}$ and $SE =s_{y_{a}y_{b}} \sqrt{\frac{1}{n_{a}} + \frac{1}{n_{b}}}$, so that $t=\frac{MEAN}{SE}$.
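As a sketch, here is the t value computed by hand from exactly this MEAN/SE definition, checked against `scipy.stats.ttest_ind` (the data are simulated):

```python
import numpy as np
from scipy import stats

def pooled_t(ya, yb):
    """t = MEAN / SE with the pooled standard deviation defined above."""
    na, nb = len(ya), len(yb)
    s_pooled = np.sqrt(((na - 1) * np.var(ya, ddof=1) +
                        (nb - 1) * np.var(yb, ddof=1)) / (na + nb - 2))
    mean = np.mean(yb) - np.mean(ya)
    se = s_pooled * np.sqrt(1 / na + 1 / nb)
    return mean / se

rng = np.random.default_rng(1)
ya, yb = rng.normal(0, 1, 15), rng.normal(0.8, 1, 25)
print(pooled_t(ya, yb))
print(stats.ttest_ind(yb, ya, equal_var=True).statistic)  # same value
```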
Let's handle the simpler MEAN part first:
Substituting the linear model $y = \alpha x + \beta$ into MEAN, and using $x_{a_{i}}=0$, $x_{b_{i}}=1$: $MEAN = (\hat{\alpha} \times 1 +\hat{\beta})-(\hat{\alpha} \times 0 +\hat{\beta}) =\hat{\alpha}-0$
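A quick numerical check of this step, on simulated data with arbitrary group sizes: the fitted slope really is the difference in group means.

```python
import numpy as np

rng = np.random.default_rng(2)
ya, yb = rng.normal(0, 1, 12), rng.normal(1.5, 1, 18)  # simulated groups
x = np.concatenate([np.zeros(len(ya)), np.ones(len(yb))])
y = np.concatenate([ya, yb])

alpha_hat, beta_hat = np.polyfit(x, y, deg=1)  # fitted slope and intercept
print(alpha_hat)                  # the slope ...
print(np.mean(yb) - np.mean(ya))  # ... equals the difference in group means
```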
Now look at the SE part, starting with the $s_{y_{a}y_{b}}$ piece:
By Bessel's correction, the sample variances are $s_{y_{a}}^2=\frac{\sum_{i=1}^{n_{a}}{(y_{a_{i}}-\overline{y_{a}})^{2}}}{n_{a}-1}$ and $s_{y_{b}}^2 =\frac{\sum_{i=1}^{n_{b}}{(y_{b_{i}}-\overline{y_{b}})^{2}}}{n_{b}-1}$; substituting both into $s_{y_{a}y_{b}}$:
$$s_{y_{a}y_{b}}=\sqrt{\frac{\sum_{i=1}^{n_{a}}{(y_{a_{i}}-\overline{y_{a}})^{2}} + \sum_{i=1}^{n_{b}}{(y_{b_{i}}-\overline{y_{b}})^{2}}}{n_{a}+n_{b}-2}}$$
The means of Group A and Group B are the least-squares estimates for the points within their respective groups, i.e. $\overline{y_{a}}=\hat{y}_{a_{i}},i\in A$ and $\overline{y_{b}}=\hat{y}_{b_{i}},i\in B$:
$$s_{y_{a}y_{b}}=\sqrt{\frac{\sum_{i=1}^{n_{a}}{(y_{a_{i}}-\hat{y}_{a_{i}})^{2}} + \sum_{i=1}^{n_{b}}{(y_{b_{i}}-\hat{y}_{b_{i}})^{2}}}{n_{a}+n_{b}-2}}=\sqrt{\frac{\sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})^{2}}}{n-2}}$$
where $n = n_{a} + n_{b}$.
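Again a small numerical check on simulated data: the pooled standard deviation equals the residual standard error of the regression, because with 0/1 coding the fitted values are simply the two group means.

```python
import numpy as np

rng = np.random.default_rng(3)
ya, yb = rng.normal(0, 1, 14), rng.normal(1, 1, 21)  # simulated groups
na, nb = len(ya), len(yb)
x = np.concatenate([np.zeros(na), np.ones(nb)])
y = np.concatenate([ya, yb])

# Pooled standard deviation from the two group variances
s_pooled = np.sqrt(((na - 1) * np.var(ya, ddof=1) +
                    (nb - 1) * np.var(yb, ddof=1)) / (na + nb - 2))

# Residual standard error of the regression; with 0/1 coding the fitted
# values are the group means
y_hat = np.where(x == 0, np.mean(ya), np.mean(yb))
s_resid = np.sqrt(np.sum((y - y_hat) ** 2) / (na + nb - 2))

print(s_pooled, s_resid)  # identical
```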
Finally, consider $\sqrt{\frac{1}{n_{a}} + \frac{1}{n_{b}}}$:
$$\frac{1}{n_{a}} + \frac{1}{n_{b}}=\frac{1}{\frac{n_{a}n_{b}}{n_{a}+n_{b}}}=\frac{1}{\frac{n_{b} (n-n_{b})}{n}}=\frac{1}{n_{b}-\frac{n_{b}^2}{n}}=\frac{1}{n_{b}-2 \frac{n_{b}^2}{n}+\frac{n_{b}^2}{n}}$$
Since $x_{a_{i}}=0$ and $x_{b_{i}}=1$, we have $n_{b}=\sum^{n}x_{i}=\sum^{n}x_{i}^2$, and also $\overline{x}=\frac{n_{b}}{n}$:
$$n_{b}-2 \frac{n_{b}^2}{n}+\frac{n_{b}^2}{n}=\sum^{n}x_{i}^2-2 \sum^{n}{\frac{n_{b}}{n} x_{i}} +\sum^{n}{\left(\frac{n_{b}}{n}\right)^2}=\sum^{n}{\left(x_{i}-\frac{n_{b}}{n}\right)^2}=\sum^{n}{(x_{i}-\overline{x})^2}$$
i.e. $\large \sqrt{\frac{1}{n_{a}} + \frac{1}{n_{b}}}=\frac{1}{\sqrt{\sum_{i=1}^{n} (x_{i}-\overline{x}) ^2}}$
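This identity is easy to verify numerically for any pair of group sizes (the sizes below are arbitrary):

```python
import numpy as np

na, nb = 15, 25                                  # arbitrary unequal sizes
x = np.concatenate([np.zeros(na), np.ones(nb)])  # 0/1 dummy coding

print(1 / na + 1 / nb)                  # left-hand side
print(1 / np.sum((x - x.mean()) ** 2))  # right-hand side: identical
```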
Putting it all together: $\large t=\frac{MEAN}{SE}=\frac{MEAN}{s_{y_{a}y_{b}} \sqrt{\frac{1}{n_{a}} + \frac{1}{n_{b}}}}=\frac{\hat{\alpha} -0}{\sqrt{\frac{\frac{1}{n-2} \sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})^{2}}}{\sum_{i=1}^{n} (x_{i}-\overline{x})^2}}}=\frac{\hat{\alpha} -0}{se_{\hat{\alpha}}}$. The final expression is exactly the t-test of whether the least-squares estimate $\hat{\alpha}$ of the slope of the linear model differs from 0.
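As a final sketch on simulated data, here is the whole chain assembled by hand: the slope, its standard error built from the residuals and $\sum(x_{i}-\overline{x})^2$, and the resulting t, compared against the classical t-test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
ya, yb = rng.normal(0, 1, 20), rng.normal(1, 1, 30)  # simulated groups
na, nb = len(ya), len(yb)
n = na + nb
x = np.concatenate([np.zeros(na), np.ones(nb)])
y = np.concatenate([ya, yb])

alpha_hat, beta_hat = np.polyfit(x, y, deg=1)
y_hat = alpha_hat * x + beta_hat
mse = np.sum((y - y_hat) ** 2) / (n - 2)               # residual variance
se_alpha = np.sqrt(mse / np.sum((x - x.mean()) ** 2))  # SE of the slope

print(alpha_hat / se_alpha)                               # t from the regression
print(stats.ttest_ind(yb, ya, equal_var=True).statistic)  # t from the t-test
```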
At this point, we have succeeded in proving that:
The t value for the difference between Group A and Group B (homogeneous variances, possibly unequal sample sizes) is equivalent to the t value for testing whether the slope $\alpha$ of the linear model $y = \alpha x + \beta$ (with $x_{a}=0$, $x_{b}=1$) differs significantly from 0.
As for unequal variances, paired-sample t-tests, and ANOVA for comparisons among more than two groups (dummy coding), stay tuned for the next installment ^ ^
[Statistics in Small Eyes] Difference Tests and the General Linear Model (1)