Analysis of influential factors of delayed craniocerebral injury after first aid of ch9-brain trauma case-logistic Regression

Source: Internet
Author: User
Tags natural logarithm

Chi-Square test-investigate the correlation of categorical variables-"cross-table" or "set-table";

T-Test-to investigate the correlation between continuous variables and categorical variables-"Set table";

Linear logsitic Regression-study the relationship between categorical dependent variables and a set of independent variables (can be continuously classified);

Tree structure Model-study the interaction between independent variables

Generalized linear models-models are established in a broader context.

1. Case background

To collect samples of first-aid cases of traumatic brain injury and analyze the factors that lead to brain injury after first aid. Dependent variables: whether there are late-onset brain injury, two categorical variables; Independent variables: continuous variables, categorical variables.

Chi-Square Test: Study the relationship between categorical variables;

Since the dependent variable is a two categorical variable, it is not possible to use normal linear regression or variance analysis, so the logistic regression model is established.

Considering the interaction between the independent variables, the classification tree model is adopted.

2. Data understanding

Diagram description of the variable association:

Distribution of continuous variables

Analysis-Description Statistics-description, and then draw stacked histogram and block box diagram for data presentation. As follows:

Table description of the relationship between categorical variables

Analysis-table-Set table

Variables associated with univariate tests:

Study on the function of categorical independent variables

The relationship between classification dependent variables and categorical variables is studied, and chi-square test is used to assume that the dependent variables are independent from each other. There are two ways to use chi-square test: One is "cross-table", see the relationship of all two categorical variables respectively; one is the "set table" in the process of watchmaking, and the relationship between the categorical dependent variable and the individual categorical arguments is displayed in a table. The first method is a bit cumbersome, and the second method is common.

Note: From the experience, the general univariate analysis of variables with p value less than 0.2 can be considered in the subsequent multivariate modeling to continue to investigate, p-value is higher than 0.2 unless there is a clear meaning in the professional, otherwise do not investigate.

Method One:

Method Two:

Study on the function of continuous independent variables

The relationship between classification dependent variables and continuous independent variables is studied, and logistic regression model is established. However, pre-analysis before modeling to investigate whether the relationship between the dependent variable and the independent variable is statistically significant, one method is to exchange the continuous dependent variable with the categorical independent variable for T test. Also in the indicator process completed, the specific operation is as follows:

3. Build two classified logistic regression model

When the dependent variable is a categorical variable, the occurrence probability of a category of the dependent variable is actually obtained by using the multivariate linear regression model. Since the probability value range on the left side of the model equation is (0,1), the right side is (negative infinity, positive infinity), and the equation does not match around. And the relationship between the probability and the independent variable is often not linear. This requires a logit transformation of these two issues.

Logit (P) =ln (p/(1-p)). A linear regression model is established for logit (P), which is the logistic regression model.

Logistic regression model is a standard modeling method to study classification dependent variables.

Applicable conditions of logistic regression model

The occurrence rate of the categorical variable or an event is classified as two because the variable is subject to two distributions;

The relationship between the independent variable and logit (P) is linear;

The total residuals were 0 and were subject to two distributions;

Each observation object is independent of each other.

Because the residual of the logistic model is two distributions instead of normal distribution, the maximum likelihood estimation is used to solve the problem of estimating and testing the equation.

The logistic model is the probability prediction model of occurrence.

Initial attempt to model

For the specific interpretation of the results between P172. Some difficulty, good understanding.

Building the final model

Finally, 3 variables (diastolic pressure, hormones, ln platelets) were introduced to establish the final logistic regression model through the significance test of the variables in the initial model.

4. Using tree model to discover interaction items

There are two problems in the final model of the last section, the relationship between LOGITP and the independent variable must be linear? Is there a possibility of a curve? Are there interactions between the respective variables?

Solution: The tree structure model provides an effective tool to solve the problem of self-variable interaction and curve correlation, and to complement the shortcomings of classical modeling methods.

The basic idea of the tree model: The total study population through certain characteristics (self-variable value) into a number of relatively homogeneous sub-population, the value of the group internal variables are highly consistent, the corresponding variations/impurities as far as possible in different sub-population.

Depending on the type of the dependent variable (categorical or sequential), the tree structure model is divided into a classification tree and a regression tree.

Perform Tree model analysis

In this case, the CRT algorithm is selected, and the analysis of the importance of the candidate arguments is also output. Here's how:

The results show that the diastolic blood pressure interacts with the natural logarithm of platelets, and the importance sequencing of the candidate variables is analyzed.

5, the use of generalized linear process analysis

Interaction between diastolic blood pressure and natural logarithm of platelets = manually establish a new variable for the product of the above two variables, and add the model for analysis.

The logistic regression model belongs to the generalized linear model, so the generalized linear model process is used to accomplish it.

Generalized linear models extend the general linear model, with differences in:

The distribution of the dependent variable of generalized linear model extends from normal distribution to two-item distribution, Poisson distribution and other exponential distribution clusters, and the value of the dependent variable is connected with the linear part of the independent variable through the connection function.

Build a model that contains only the main effects (that is, the interaction items are not included):

To add an interaction entry in the model:

The product of diastolic blood pressure and the natural logarithm of platelets is obtained by "computational variables", and the interaction information is added by the above steps.

Analysis of influential factors of delayed craniocerebral injury after first aid of ch9-brain trauma case-logistic Regression

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.