Hotelling T² test and multivariate analysis of variance


1.1 Hotelling T² test
The Hotelling T² test is a commonly used multivariate test. It is a natural generalization of the univariate t test and is most often used to compare the mean vectors of two groups.
Suppose two samples of sizes n and m are drawn from q-dimensional normal distributions N(μ1, Σ) and N(μ2, Σ) with a common covariance matrix Σ, and we wish to test
H0: μ1 = μ2 versus H1: μ1 ≠ μ2
Compute the mean vectors x̄ and ȳ of the two samples and the pooled within-group covariance matrix S. The statistic T² is then
$$T^2 = \frac{nm}{n+m}\,(\bar{x}-\bar{y})'\,S^{-1}\,(\bar{x}-\bar{y})$$
where S = (L_x + L_y)/(n + m - 2) is the pooled covariance matrix and L_x, L_y are the dispersion (sum of squares and cross-products) matrices of the two samples, namely:
$$L_x = \sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})', \qquad L_y = \sum_{j=1}^{m}(y_j-\bar{y})(y_j-\bar{y})'$$
Once T² has been computed, the P value can be read from a table of critical values. More commonly, T² is converted to an F statistic and the P value is obtained from the F distribution:
$$F = \frac{n+m-q-1}{(n+m-2)\,q}\,T^2 \sim F(q,\; n+m-q-1)$$
1.2 Multivariate analysis of variance
Multivariate analysis of variance (MANOVA) generalizes univariate analysis of variance and the Hotelling T² test, and is used to compare the mean vectors of several groups.
Suppose g samples of sizes n1, n2, ..., ng are drawn from q-variate normal distributions Nq(μ1, Σ), Nq(μ2, Σ), ..., Nq(μg, Σ). Several statistics, derived from the union-intersection principle or the likelihood ratio principle, can be used to test whether the mean vectors come from the same population. The commonly used ones are Wilks' Λ, Pillai's trace, the Hotelling-Lawley trace and Roy's largest root. All of them are functions of the within-group and between-group dispersion matrices, and they generally lead to the same conclusion, so only the most widely used, Wilks' Λ, is given here:
$$\Lambda = \frac{|E|}{|E+B|} = \frac{|E|}{|T|}$$
In this formula E is the pooled within-group dispersion matrix, B is the between-group dispersion matrix and T = E + B is the total dispersion matrix. Λ follows a Wilks distribution whose degrees of freedom are determined by q, N (= n1 + n2 + ... + ng) and g, namely Λ(q, N - g, g - 1). Its P value can be read from a table of critical values, but it is usually obtained through a conversion to the F distribution.

Http://blog.sina.com.cn/s/blog_647fe1580101i84d.html

Http://www.qnr.cn/med/data/lcyxzk/ylx/201003/376058.html

This paper introduces the application of the Hotelling T² test and multivariate analysis of variance to quality-of-life data from drug addicts and discusses related issues. Both methods are well suited to quality-of-life data: they give a conclusion about the overall comparison of quality of life, and follow-up univariate analyses then give the comparison in each domain.
Key words: Hotelling T² test; multivariate analysis of variance; quality of life; drug abuse

Quality of life (QOL) comprises multiple domains, each of which is divided into facets and items, so quality-of-life data are multi-indicator, multiple-endpoint data. As quality-of-life research has grown, methods for analyzing such data have received increasing attention. Given the problems with separate univariate tests [2,3], a natural idea is to treat quality of life as a single vector-valued variable and apply the Hotelling T² test or multivariate analysis of variance. Taking quality-of-life data from drug addicts as an example, this paper introduces the application of these two methods and discusses the problems that arise in practice.

 

1 Methods
1.1 Hotelling T² test

The Hotelling T² test is a commonly used multivariate test. It is a natural generalization of the univariate t test and is most often used to compare the mean vectors of two groups.
Suppose two samples of sizes n and m are drawn from q-dimensional normal distributions N(μ1, Σ) and N(μ2, Σ) with a common covariance matrix Σ, and we wish to test
H0: μ1 = μ2 versus H1: μ1 ≠ μ2
Compute the mean vectors x̄ and ȳ of the two samples and the pooled within-group covariance matrix S. The statistic T² is then
$$T^2 = \frac{nm}{n+m}\,(\bar{x}-\bar{y})'\,S^{-1}\,(\bar{x}-\bar{y})$$
where S = (L_x + L_y)/(n + m - 2) is the pooled covariance matrix and L_x, L_y are the dispersion (sum of squares and cross-products) matrices of the two samples, namely:
$$L_x = \sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})', \qquad L_y = \sum_{j=1}^{m}(y_j-\bar{y})(y_j-\bar{y})'$$
Once T² has been computed, the P value can be read from a table of critical values. More commonly, T² is converted to an F statistic and the P value is obtained from the F distribution:
$$F = \frac{n+m-q-1}{(n+m-2)\,q}\,T^2 \sim F(q,\; n+m-q-1)$$
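The calculation described in this section can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, assuming the standard two-sample form T² = nm/(n+m) · (x̄ - ȳ)' S⁻¹ (x̄ - ȳ) and the conversion F = (n+m-q-1)/((n+m-2)q) · T². The function names and the toy data are made up for the example and do not come from the study.

```python
# Two-sample Hotelling T^2 from raw data (illustrative sketch, toy data).

def mean_vector(rows):
    """Column means of a list of observation vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def dispersion(rows, center):
    """Dispersion matrix: sum over rows of (r - center)(r - center)'."""
    q = len(center)
    out = [[0.0] * q for _ in range(q)]
    for r in rows:
        d = [v - c for v, c in zip(r, center)]
        for i in range(q):
            for j in range(q):
                out[i][j] += d[i] * d[j]
    return out

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            factor = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= factor * M[c][j]
    w = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (M[i][k] - tail) / M[i][i]
    return w

def hotelling_t2(x, y):
    """Return T^2, the converted F statistic and its degrees of freedom."""
    n, m, q = len(x), len(y), len(x[0])
    xbar, ybar = mean_vector(x), mean_vector(y)
    Lx, Ly = dispersion(x, xbar), dispersion(y, ybar)
    # Pooled covariance matrix S = (Lx + Ly) / (n + m - 2)
    S = [[(a + b) / (n + m - 2) for a, b in zip(ra, rb)]
         for ra, rb in zip(Lx, Ly)]
    d = [a - b for a, b in zip(xbar, ybar)]
    w = solve(S, d)  # w = S^{-1} (xbar - ybar)
    t2 = n * m / (n + m) * sum(di * wi for di, wi in zip(d, w))
    f_stat = (n + m - q - 1) / ((n + m - 2) * q) * t2
    return t2, f_stat, (q, n + m - q - 1)

# Toy data: two groups of four observations on q = 2 variables
x = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [[3.0, 5.0], [4.0, 4.0], [5.0, 7.0], [6.0, 6.0]]
t2, f_stat, df = hotelling_t2(x, y)
print(f"T2 = {t2:.3f}, F = {f_stat:.3f}, df = {df}")
```

The P value would then be the upper tail of the F distribution with the returned degrees of freedom, which a statistics package can supply.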

1.2 Multivariate analysis of variance
Multivariate analysis of variance (MANOVA) generalizes univariate analysis of variance and the Hotelling T² test, and is used to compare the mean vectors of several groups.
Suppose g samples of sizes n1, n2, ..., ng are drawn from q-variate normal distributions Nq(μ1, Σ), Nq(μ2, Σ), ..., Nq(μg, Σ). Several statistics, derived from the union-intersection principle or the likelihood ratio principle, can be used to test whether the mean vectors come from the same population. The commonly used ones are Wilks' Λ, Pillai's trace, the Hotelling-Lawley trace and Roy's largest root. All of them are functions of the within-group and between-group dispersion matrices, and they generally lead to the same conclusion, so only the most widely used, Wilks' Λ, is given here:
$$\Lambda = \frac{|E|}{|E+B|} = \frac{|E|}{|T|}$$
In this formula E is the pooled within-group dispersion matrix, B is the between-group dispersion matrix and T = E + B is the total dispersion matrix. Λ follows a Wilks distribution whose degrees of freedom are determined by q, N (= n1 + n2 + ... + ng) and g, namely Λ(q, N - g, g - 1). Its P value can be read from a table of critical values, but it is usually obtained through a conversion to the F distribution.
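As an illustration of how E, T and Wilks' Λ relate, the following sketch computes Λ = |E| / |T| for any number of groups, taking E as the sum of the within-group dispersion matrices and T as the dispersion of all observations about the grand mean. It uses only the standard library; the helper names and the data are invented for the example.

```python
# Wilks' Lambda = |E| / |T| for g groups (illustrative sketch, toy data).

def mean_vector(rows):
    """Column means of a list of observation vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def dispersion(rows, center):
    """Dispersion matrix: sum over rows of (r - center)(r - center)'."""
    q = len(center)
    out = [[0.0] * q for _ in range(q)]
    for r in rows:
        d = [v - c for v, c in zip(r, center)]
        for i in range(q):
            for j in range(q):
                out[i][j] += d[i] * d[j]
    return out

def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [list(row) for row in A]
    k, sign, prod = len(M), 1.0, 1.0
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign
        prod *= M[c][c]
        for r in range(c + 1, k):
            factor = M[r][c] / M[c][c]
            for j in range(c, k):
                M[r][j] -= factor * M[c][j]
    return sign * prod

def wilks_lambda(groups):
    """Wilks' Lambda for a list of groups, each a list of observations."""
    everyone = [r for g in groups for r in g]
    q = len(everyone[0])
    # E: pooled within-group dispersion (sum of per-group dispersions)
    E = [[0.0] * q for _ in range(q)]
    for g in groups:
        Lg = dispersion(g, mean_vector(g))
        E = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(E, Lg)]
    # T: total dispersion about the grand mean, equal to E + B
    T = dispersion(everyone, mean_vector(everyone))
    return det(E) / det(T)

x = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [[3.0, 5.0], [4.0, 4.0], [5.0, 7.0], [6.0, 6.0]]
z = [[2.0, 6.0], [3.0, 5.0], [5.0, 8.0], [6.0, 7.0]]
lam2 = wilks_lambda([x, y])      # two groups
lam3 = wilks_lambda([x, y, z])   # three groups
print(f"Lambda (2 groups) = {lam2:.4f}, Lambda (3 groups) = {lam3:.4f}")
```

Values of Λ close to 0 indicate that between-group variation dominates; values close to 1 indicate little separation between the group mean vectors.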

2 Example Analysis
The quality-of-life scale for drug addicts, QOL-DA [4], covers four domains: physical function (PH), psychological function (PS), withdrawal symptoms and side effects (ST) and social function (SO). The scale was administered to 158 randomly selected patients in compulsory detoxification and to 54 patients in voluntary detoxification. The scores in the four domains were used as four analysis variables to compare quality of life between the two groups. Because the scores were normally distributed with homogeneous variances, the t test was applied directly; the results are shown in the attached table.

Attached table. Comparison of quality of life between the two groups of drug addicts at admission

Analysis variable                  Compulsory group     Voluntary group        t        P
                                   Mean      SD         Mean      SD
Physical function                  24.48     7.50       23.50     7.39        0.81     0.42
Psychological function             26.91     8.52       27.98     8.45       -0.80     0.43
Withdrawal symptoms/side effects   30.64    11.37       31.29    12.18       -0.36     0.72
Social function                    32.68     9.68       35.83    10.13       -2.04     0.042


The attached table shows that the two detoxification groups differ significantly only in the social function domain.
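The univariate comparisons can be recomputed from the published summary statistics. The sketch below assumes the equal-variance (pooled) two-sample t test and uses 32.68 for the compulsory group's social-function mean, the value given in that group's mean vector. `pooled_t` is an illustrative helper, not something from the paper. Because the published means and standard deviations are rounded, the recomputed t values agree with the published ones only approximately.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df

N_COMPULSORY, N_VOLUNTARY = 158, 54

# (compulsory mean, compulsory SD, voluntary mean, voluntary SD)
domains = {
    "PH": (24.48, 7.50, 23.50, 7.39),
    "PS": (26.91, 8.52, 27.98, 8.45),
    "ST": (30.64, 11.37, 31.29, 12.18),
    "SO": (32.68, 9.68, 35.83, 10.13),  # mean taken from the mean vector
}

t_values = {}
for name, (m1, s1, m2, s2) in domains.items():
    t, df = pooled_t(m1, s1, N_COMPULSORY, m2, s2, N_VOLUNTARY)
    t_values[name] = t
    print(f"{name}: t = {t:.2f} on {df} degrees of freedom")
```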
Clearly, univariate t tests can only analyze quality of life in each domain separately and give no overall assessment of quality of life. The multivariate method described above was therefore applied.
Because each variable is normally distributed, the data can be regarded as multivariate normal, and because two groups are being compared, the Hotelling T² test was used, giving:
Mean vector (PH, PS, ST, SO) of the compulsory group: x̄ = (24.48, 26.91, 30.64, 32.68)
Mean vector (PH, PS, ST, SO) of the voluntary group: ȳ = (23.50, 27.98, 31.29, 35.83)
Test of homogeneity of covariance matrices: F = 9.34, P = 0.499
Hotelling T² test: F = 2.48, P = 0.045
The P value of the covariance homogeneity test is large, so the covariance matrices can be regarded as homogeneous and the result of the Hotelling T² test can be accepted. Taking the four domains together, the difference in quality of life between the two detoxification groups is statistically significant.
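As a rough consistency check, and assuming the reported F was obtained from T² by the usual conversion (the paper does not report T² itself), the values q = 4, n = 158 and m = 54 give the following degrees of freedom and implied T²:

$$F = \frac{n+m-q-1}{(n+m-2)\,q}\,T^2 = \frac{207}{840}\,T^2 \sim F(4,\,207), \qquad T^2 = \frac{840}{207}\times 2.48 \approx 10.06$$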

3 Discussion
3.1 In theory, the method requires that the two or more groups come from multivariate normal populations with equal (homogeneous) covariance matrices. There is still no fully satisfactory test of multivariate normality; the usual practice is to regard the data as multivariate normal when each variable is normally distributed. In addition, the test is fairly robust to the form of the distribution: whatever the distribution, the results are generally unaffected when the sample size is large. In practice, therefore, multivariate normality is often judged on subject-matter grounds, and the data can usually be treated as multivariate normal. In quality-of-life assessment in particular, scores at every level tend to be concentrated in the middle and to thin out toward both ends, so they can be regarded as normally distributed. By contrast, heterogeneity of the covariance matrices has a considerable effect on the results and is often encountered, so the covariance matrices should be tested for homogeneity before the comparison is made. Tests of homogeneity for several covariance matrices are described by Fang Kai-tai [5].
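3.2 If the test shows that the covariance matrices are not homogeneous, multi-group comparisons can be made with a nonparametric method [6]. For two groups, if Σ1 and Σ2 do not differ greatly, the method proposed by Carter et al. [7] can be used; if they differ greatly, the approximate Hotelling T² test proposed by Yao [8] can be used, namely:
$$T^2 = (\bar{x}-\bar{y})'\left(\frac{S_x}{n}+\frac{S_y}{m}\right)^{-1}(\bar{x}-\bar{y})$$
where S_x and S_y are the covariance matrices of the two samples.
Let
$$V = \frac{S_x}{n}+\frac{S_y}{m}, \qquad \frac{1}{f} = \frac{1}{n-1}\left[\frac{(\bar{x}-\bar{y})'\,V^{-1}\,\frac{S_x}{n}\,V^{-1}\,(\bar{x}-\bar{y})}{T^2}\right]^2 + \frac{1}{m-1}\left[\frac{(\bar{x}-\bar{y})'\,V^{-1}\,\frac{S_y}{m}\,V^{-1}\,(\bar{x}-\bar{y})}{T^2}\right]^2$$
Then, approximately:
$$\frac{f-q+1}{f\,q}\,T^2 \sim F(q,\; f-q+1)$$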
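A minimal sketch of Yao's approximate test [8] follows, assuming the standard form of the procedure: T² = d' V⁻¹ d with V = S_x/n + S_y/m and d = x̄ - ȳ, approximate degrees of freedom f given by 1/f = [d' V⁻¹ (S_x/n) V⁻¹ d / T²]²/(n-1) + [d' V⁻¹ (S_y/m) V⁻¹ d / T²]²/(m-1), and (f-q+1)/(fq) · T² referred to F(q, f-q+1). The code uses only the standard library; the function names and the data are invented for the example.

```python
# Yao's approximate Hotelling T^2 test for unequal covariance matrices
# (illustrative sketch, toy data).

def mean_vector(rows):
    """Column means of a list of observation vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def covariance(rows):
    """Sample covariance matrix with divisor (number of rows - 1)."""
    n, mu = len(rows), mean_vector(rows)
    q = len(mu)
    out = [[0.0] * q for _ in range(q)]
    for r in rows:
        d = [v - c for v, c in zip(r, mu)]
        for i in range(q):
            for j in range(q):
                out[i][j] += d[i] * d[j] / (n - 1)
    return out

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            factor = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= factor * M[c][j]
    w = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (M[i][k] - tail) / M[i][i]
    return w

def quad_form(w, A):
    """Quadratic form w' A w."""
    k = len(w)
    return sum(w[i] * A[i][j] * w[j] for i in range(k) for j in range(k))

def yao_test(x, y):
    """Return T^2, approximate df f, the F statistic and its (df1, df2)."""
    n, m, q = len(x), len(y), len(x[0])
    Vx = [[s / n for s in row] for row in covariance(x)]  # S_x / n
    Vy = [[s / m for s in row] for row in covariance(y)]  # S_y / m
    V = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(Vx, Vy)]
    d = [a - b for a, b in zip(mean_vector(x), mean_vector(y))]
    w = solve(V, d)  # w = V^{-1} d
    t2 = sum(di * wi for di, wi in zip(d, w))
    inv_f = ((quad_form(w, Vx) / t2) ** 2 / (n - 1)
             + (quad_form(w, Vy) / t2) ** 2 / (m - 1))
    f = 1.0 / inv_f
    f_stat = (f - q + 1) / (f * q) * t2
    return t2, f, f_stat, (q, f - q + 1)

# Toy data: the second group is larger and far more spread out
x = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [[3.0, 5.0], [4.0, 4.0], [5.0, 7.0], [6.0, 6.0], [1.0, 9.0], [9.0, 2.0]]
t2, f, f_stat, df = yao_test(x, y)
print(f"T2 = {t2:.3f}, f = {f:.2f}, F = {f_stat:.3f}, df = {df}")
```

When the two samples have equal sizes and equal sample covariance matrices, f reduces to n + m - 2 and the statistic coincides with the pooled-covariance T² test, which is a convenient sanity check for an implementation.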
3.3 As with the relationship between the t test and analysis of variance in the univariate case, the Hotelling T² test applies only to the comparison of two groups, whereas multivariate analysis of variance can be used for two or more groups and is equivalent to the Hotelling T² test when there are two groups.
3.4 A statistically significant result means only that the variables differ when taken together; it does not necessarily mean that every variable differs. In that case each variable is usually compared separately as a follow-up, so that the overall conclusion is supplemented by the behaviour and contribution of each variable. Generally speaking, the Hotelling T² test and multivariate analysis of variance are rather sensitive: a significant difference in a single variable often makes the mean vectors as a whole differ. In some multi-indicator comprehensive evaluations this may be regarded as unreasonable, in which case other methods can be used instead, such as O'Brien's nonparametric method [6].
3.5 Some authors [6,9] hold that the Hotelling T² test and multivariate analysis of variance can show only that two or more groups differ, not whether the change is favourable or unfavourable. In particular, when the variables change in different directions it is hard to say which group is better, so these authors consider the methods unsuitable for clinical trial data and quality-of-life data. The authors of this paper believe that the value of the methods should not be dismissed: once a statistically significant difference is found, the groups can be judged to differ, and how they differ can be examined further through the univariate tests, which is of real practical value. In quality-of-life assessment, some groups may have better physical function and others better psychological function; it is enough to detect the difference without necessarily declaring one group superior. The "over-sensitivity" mentioned above may, however, have some effect on the application of the methods to quality-of-life analysis.
3.6 Another key issue in applying these methods is the computational burden, which can be handled by a number of software packages. In SAS and SPSS, for example, the ANOVA (or MANOVA) and GLM procedures all output the four multivariate test statistics, so the calculation is straightforward. Although no dedicated Hotelling T² statistic is reported, in a two-group comparison the Hotelling-Lawley trace is equivalent to T²; the two differ only by the constant factor (n + m - 2).
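The constant can be seen from a short derivation, sketched here under the usual definitions for two groups of sizes n and m: E is the within-group dispersion matrix, B is the between-group dispersion matrix, S is the pooled covariance matrix, and T² = nm/(n+m) · (x̄ - ȳ)' S⁻¹ (x̄ - ȳ).

$$E = L_x + L_y = (n+m-2)\,S, \qquad B = \frac{nm}{n+m}\,(\bar{x}-\bar{y})(\bar{x}-\bar{y})'$$

$$\operatorname{tr}(E^{-1}B) = \frac{nm}{n+m}\,(\bar{x}-\bar{y})'\,E^{-1}\,(\bar{x}-\bar{y}) = \frac{T^2}{n+m-2}$$

So T² can be recovered by multiplying the reported Hotelling-Lawley trace by (n + m - 2).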
3.7 As in univariate analysis of variance, if other quantitative variables affect quality of life, they can be treated as covariates and handled by multivariate analysis of covariance.

Author affiliation: Department of Health Statistics, Zhongshan Medical University, Guangzhou 510089, China
* Kunming Detoxification Institute

References

[1] Cox DR, Fitzpatrick R, Fletcher AE, et al. Quality-of-life assessment: can we keep it simple? J R Statist Soc A, 1992, 155: 353.
[2] Olschewski M, Schumacher M. Statistical analysis of quality of life data in cancer clinical trials. Statistics in Medicine, 1990, 9: 749.
[3] Wan Chonghua, Fang Ji. Statistical analysis method of life quality data. Chinese Preventive Medicine, 1996, 30 (3): 172.
[4] Wan Chonghua, Fang Ji, Chen Li, et al. Establishment and evaluation of the quality of life determination scale for heroin addicts. Journal of Chinese Behavioral Medicine Science, 1997, 6 (3): 169.
[5] Fang Kai-tai. Practical multivariate statistical analysis. Shanghai: East China Normal University Press, 1989, 132~136.
[6] O'Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics, 1985, 40: 1079.
[7] Carter EM, Khatri CG, Srivastava MS. The effect of inequality of variances on the t-test. Sankhya, Ser B, 1979, 41: 216~225.
[8] Yao Y. An approximate degrees of freedom solution to the multivariate Behrens-Fisher problem. Biometrika, 1965, 52: 139~147.
[9] Tom Danlin, Wang Cypress. Determination of quality of life and its application in clinical trials. Chinese Journal of Medicine, 1994, 74 (3): 175.

Received: 1998-08-10
