Describing the distributions of categorical and continuous variables
Bootstrap sampling and the rank-sum test to explore variable associations: Compare Means, independent samples
Variable transformation: P-P plot
Rank transformation analysis: Rank Cases
Cox regression model
In essence, analysis of variance is a general linear model whose independent variables include categorical variables.
Analysis of variance is a parametric test that rests on certain assumptions. When those assumptions are not met, the options are:
(1) Use a nonparametric test (bootstrap sampling, rank-sum test). In practice this serves as a preliminary check;
(2) or consider a variable transformation and run the analysis of variance on the transformed variable. This is widely used;
(3) or, if variable transformation does not solve the problem, consider rank transformation analysis, that is, an analysis of variance on the ranks of the original variable. This applies even more widely;
(4) rank transformation analysis discards some of the information in the data, so also consider the Cox regression model from survival analysis.
For the case in this chapter, the three methods give consistent results: hormone levels differ significantly between the groups, while sex and age have no significant effect on hormone level.
1. Case background
Controlling for other factors, the study asks whether hormone levels differ between the control group and the experimental group (patients with stomach cancer).
2. Data understanding
Univariate description:
To view the frequency distribution of categorical variables, the usual choices are the Descriptives procedure, the Frequencies procedure, or the Tables procedure. The Tables procedure is used below to produce a compact frequency distribution of the categorical variables.
To examine the distribution of continuous variables, use the Descriptives procedure and draw a histogram to see whether they are normally distributed.
Conclusion: The hormone level has a markedly right-skewed distribution. The next step is to consider whether this distribution meets the model's distributional requirements and, if it does not, how to handle it.
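As a rough numerical companion to the histogram, the direction of skew can be checked with the moment-based skewness coefficient. The sketch below is only an illustration outside SPSS: the function name and the hormone values are hypothetical and are not taken from the case data.

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical hormone levels, invented for illustration only.
hormone = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.9, 3.5, 6.8, 12.4]

# A clearly positive value indicates a long right tail (right skew).
print(f"skewness = {sample_skewness(hormone):.2f}")
```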
Exploring variable associations:
Objective: A nonparametric test identifies differences between groups without assuming that the data follow a normal distribution.
The analysis above covers the distribution of single variables. To see, for example, how hormone levels are distributed across groups, sexes, and ages, explore the associations between variables by comparing the average hormone level across the levels of each factor.
Because the hormone level has a skewed distribution, the between-group comparison of averages is performed with bootstrap sampling and the rank-sum test.
Bootstrap sampling first:
Bootstrap sampling has a parametric and a nonparametric form. The parametric form first assumes a distribution for the population; the nonparametric form makes no distributional assumption.
When the frequency distribution is approximately normal, the mean is used and its bootstrap confidence interval is estimated. When the frequency distribution is skewed, the median is used, and the 95% confidence interval is estimated from the 2.5th and 97.5th percentiles of the bootstrap distribution.
Here the median represents the average level of each group, and the confidence interval of the median is used to judge approximately whether hormone levels differ statistically across the levels of each factor.
"Analyze - Compare Means - Means"
Preliminary conclusion: The median hormone level differs between the two groups.
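The percentile bootstrap for the median described above can be sketched outside SPSS as follows. This is a minimal illustration under assumptions: the function name, the number of resamples, the seed, and both data sets are hypothetical, and SPSS may compute its intervals differently.

```python
import random
import statistics

def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and record each resample's median.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Take the lower and upper alpha/2 percentiles of the bootstrap medians.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return medians[lo_idx], medians[hi_idx]

# Hypothetical hormone levels, invented for illustration only.
control = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.6, 2.9]
case = [3.1, 3.5, 4.1, 5.2, 6.8, 7.7, 9.4, 12.4]

for name, group in (("control", control), ("case", case)):
    low, high = bootstrap_median_ci(group)
    print(f"{name}: median = {statistics.median(group):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

If the two intervals do not overlap, that suggests a difference between the group medians, but it is only an approximate judgment and not a formal test.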
Next, the rank-sum test:
It gives a more precise answer to whether the groups differ.
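For readers who want to see what the rank-sum test computes, here is a minimal sketch of the two-sample Wilcoxon rank-sum test using the large-sample normal approximation. It is an illustration under assumptions, not the SPSS implementation: it applies no tie or continuity correction, and the function names and data are hypothetical.

```python
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (rank sum of x, z statistic, two-sided p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                           # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2               # expected rank sum under H0
    sd_w = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5  # standard deviation under H0
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p

# Hypothetical hormone levels, invented for illustration only.
control = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.6, 2.9]
case = [3.1, 3.5, 4.1, 5.2, 6.8, 7.7, 9.4, 12.4]

w, z, p = rank_sum_test(control, case)
print(f"W = {w}, z = {z:.3f}, p = {p:.4f}")
```

With small samples an exact p-value is preferable to the normal approximation used here.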
SPSS provides two interfaces for the rank-sum test: a new one and a legacy one.
Study the relationship between age and hormone levels using scatter plots:
To explore the relationship between two continuous variables, a scatter plot is the first choice.
3. Modeling analysis after transformation of dependent variables
Common variable transformation methods:
Logarithmic, square root, and reciprocal transformations, among others.
Specific steps for this case:
Apply a logarithmic transformation to the dependent variable, hormone level, and check whether the assumptions of analysis of variance are then met.
Variable transformation means applying a mathematical transformation to the original data so that they meet, or approximately meet, the requirements of analysis of variance, and then running the analysis of variance on the transformed variable.
Verify that the log-transformed hormone level follows a normal distribution: P-P plot
The results show that the log-transformed hormone level is close to normal, so an analysis of variance model can be fitted.
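The idea behind the P-P plot check can be sketched as follows: each point pairs an empirical cumulative proportion with the cumulative probability under a normal distribution fitted to the data, and points near the diagonal indicate approximate normality. This is a sketch under assumptions. The plotting position (i + 0.5) / n is one common choice and may differ from the one SPSS uses; the function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def pp_points(values):
    """Return (empirical, theoretical) cumulative probabilities for a normal P-P plot."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # normal with the sample mean and SD
    # Assumed plotting position: (i + 0.5) / n for the i-th smallest value (0-based).
    return [((i + 0.5) / n, fitted.cdf(x)) for i, x in enumerate(xs)]

def max_pp_deviation(values):
    """Largest vertical distance of the P-P points from the diagonal."""
    return max(abs(emp - theo) for emp, theo in pp_points(values))

# Hypothetical hormone levels, invented for illustration only.
hormone = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.9, 3.5, 6.8, 12.4]
log_hormone = [math.log(v) for v in hormone]

# A smaller deviation means the points lie closer to the diagonal.
print(f"raw: {max_pp_deviation(hormone):.3f}")
print(f"log: {max_pp_deviation(log_hormone):.3f}")
```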
The lack-of-fit test checks whether the current model predicts differently from the saturated model (that is, the model that includes all main effects and interaction effects).
4. Rank Transformation Analysis
When variable transformation does not solve the problem, consider the rank transformation analysis method from nonparametric statistics.
Rank transformation analysis first computes the ranks of the original variable and then uses those ranks in place of the original variable in a parametric analysis. In other words, the analysis of variance is run on the ranks.
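The two steps, ranking and then running an analysis of variance on the ranks, can be sketched as below. This is a simplified illustration under assumptions: it uses a one-way layout with group as the only factor (the case model also involves sex and age), and the function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def one_way_f(groups):
    """F statistic of a one-way analysis of variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical hormone levels, invented for illustration only.
control = [1.2, 1.7, 2.0, 2.4, 3.5]
case = [1.5, 2.9, 4.1, 6.8, 12.4]

# Step 1: rank the pooled values. Step 2: run the ANOVA on the ranks.
ranks = average_ranks(control + case)
rank_control, rank_case = ranks[:len(control)], ranks[len(control):]
print(f"F on ranks = {one_way_f([rank_control, rank_case]):.3f}")
```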
5. Using Cox model for analysis
Rank transformation analysis sidesteps the non-normality of the data by working with ranks, but some information is lost. One remedy is to use the Cox regression model from survival analysis.
Event: the endpoint of survival time, as defined by the researcher.
The basic idea of the Cox regression model is to establish a relationship, similar to a generalized linear model, between the hazard function and the factors under study.
Hormone levels: understood as survival time;
Since every subject has a definite hormone level measurement, that is, a survival time, each subject's outcome is a failure event, so the failure event value is 1 for everyone.
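To make this setup concrete, here is a minimal sketch that fits a Cox model with a single covariate by maximizing the partial likelihood with Newton-Raphson. Hormone level plays the role of survival time and every subject has event = 1, as described above. Everything else is an assumption for illustration: one binary covariate (group), no tied times, no step-size control, and hypothetical names and data. It is not the SPSS procedure.

```python
import math

def cox_fit_one_covariate(times, x, n_iter=25):
    """Fit beta for a one-covariate Cox model where every subject has event = 1."""
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for t_i, x_i in zip(times, x):
            # Risk set: subjects whose "survival time" is at least t_i.
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            weights = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(weights)
            s1 = sum(x_j * w for x_j, w in zip(risk, weights))
            s2 = sum(x_j * x_j * w for x_j, w in zip(risk, weights))
            score += x_i - s1 / s0                # first derivative of log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2      # negative second derivative
        beta += score / info                      # Newton-Raphson update
    return beta, 1 / math.sqrt(info)              # estimate and approximate standard error

# Hypothetical hormone levels, invented for illustration only (0 = control, 1 = case).
hormone = [1.2, 1.7, 2.0, 2.4, 3.5, 1.5, 2.9, 4.1, 6.8, 12.4]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

beta, se = cox_fit_one_covariate(hormone, group)
print(f"beta = {beta:.3f}, se = {se:.3f}, hazard ratio = {math.exp(beta):.3f}")
```

In this framing a negative coefficient means a lower "hazard" of stopping at a small value, which corresponds to higher hormone levels in that group. If one group's values all lie above the other's, the estimate does not converge to a finite value, so the sketch needs overlapping groups.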
Ch. 7: Analysis of factors affecting hormone levels (advanced analysis of variance models)