Usage scenarios: The outcome variable is categorical (binary or multi-category), so it does not follow a normal distribution.
The outcome variable is a count, whose mean and variance are related.
Solution: Use a generalized linear model, which extends the linear model to dependent variables that are not normally distributed.
1. Logistic regression (the dependent variable is categorical)
Case: Fitting a model of extramarital affairs
1. View summary statistics for the data set
    library(AER)
    data(Affairs, package = "AER")
    summary(Affairs)
    table(Affairs$affairs)
Results: The data were collected from 601 participants and contain 9 variables: number of extramarital affairs, gender, age, years married, whether there are children, religiousness, education, occupation, and self-rating of the marriage.
The outcome variable is the number of extramarital affairs. 72% of participants reported no affairs, and the highest frequency category (at least monthly) accounts for 6%.
2. Convert the outcome into a binary factor
    # Code any non-zero count as 1 and zero as 0 (note == for comparison)
    Affairs$ynaffair[Affairs$affairs > 0] <- 1
    Affairs$ynaffair[Affairs$affairs == 0] <- 0
    Affairs$ynaffair <- factor(Affairs$ynaffair,
                               levels = c(0, 1),
                               labels = c("No", "Yes"))
    table(Affairs$ynaffair)
3. Use this factor as the outcome variable of a logistic regression
    fit.full <- glm(ynaffair ~ gender + age + yearsmarried + children +
                      religiousness + education + occupation + rating,
                    data = Affairs, family = binomial())
    summary(fit.full)
Results: Gender, children, education, and occupation are not significant in this model, so refit the model without them.
    fit.reduced <- glm(ynaffair ~ age + yearsmarried + religiousness + rating,
                       data = Affairs, family = binomial())
    summary(fit.reduced)
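3. Compare the two models with a chi-square test
    anova(fit.reduced, fit.full, test = "Chisq")
Result: p = 0.21, which is not significant, so the reduced model fits the data as well as the full model and the simpler model is preferred.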
4. Interpreting model parameters
    # Coefficients are on the log-odds scale
    coef(fit.reduced)
    # Exponentiate to get odds ratios
    exp(coef(fit.reduced))
Results: Holding the other variables constant, each additional year of marriage multiplies the odds of an extramarital affair by 1.106. Conversely, each additional year of age multiplies the odds by 0.9652.
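Because exponentiated coefficients act on the odds multiplicatively, a change of several units compounds rather than adds. The following is a minimal sketch of that arithmetic, using only the two odds ratios quoted above. The 10-year change and the variable names are illustrative assumptions, not part of the original analysis.

```r
# Odds ratios quoted in the result above (per one-unit increase)
or_yearsmarried <- 1.106
or_age          <- 0.9652

# Hypothetical change of 10 units (illustration only)
years <- 10

# A k-unit change multiplies the odds by OR^k, not by k * OR
or_yearsmarried_10 <- or_yearsmarried^years
or_age_10          <- or_age^years

# The same result on the log-odds (coefficient) scale
stopifnot(isTRUE(all.equal(or_age_10, exp(years * log(or_age)))))

print(round(c(yearsmarried = or_yearsmarried_10, age = or_age_10), 3))
```

Working on the log-odds scale and exponentiating at the end gives the same answer, which is why `exp(coef(...))` is applied after fitting rather than before.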
5. Assessing the impact of marital rating on extramarital affairs
    # 1. Build an artificial data set by hand: vary rating, hold the rest at their means
    testdata <- data.frame(rating = c(1, 2, 3, 4, 5),
                           age = mean(Affairs$age),
                           yearsmarried = mean(Affairs$yearsmarried),
                           religiousness = mean(Affairs$religiousness))
    testdata
    # 2. Use predict() to obtain predicted probabilities
    testdata$prob <- predict(fit.reduced, newdata = testdata, type = "response")
    testdata
Result: As the marital rating goes from 1 (very unhappy) to 5 (very happy), the probability of an extramarital affair decreases from 0.53 to 0.15.
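Predictions with `type = "response"` are probabilities, whereas the model coefficients act on the odds. A small sketch of the conversion between the two scales, using only the two probabilities quoted above (the helper name `to_odds` is a hypothetical name introduced for illustration):

```r
# Predicted probabilities quoted above for rating = 1 and rating = 5
p_rating1 <- 0.53
p_rating5 <- 0.15

# odds = p / (1 - p)
to_odds <- function(p) p / (1 - p)

odds_rating1 <- to_odds(p_rating1)
odds_rating5 <- to_odds(p_rating5)

# Ratio of the odds at rating 5 to the odds at rating 1
odds_ratio_5_vs_1 <- odds_rating5 / odds_rating1

# qlogis() is the logit (log-odds) and plogis() is its inverse,
# so the round trip recovers the original probability
stopifnot(isTRUE(all.equal(plogis(qlogis(p_rating1)), p_rating1)))

print(round(c(odds_rating1, odds_rating5, odds_ratio_5_vs_1), 3))
```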
6. Assessing the impact of age on extramarital affairs
    # Vary age from 17 to 57 in steps of 10, hold the rest at their means
    testdata <- data.frame(rating = mean(Affairs$rating),
                           age = seq(17, 57, 10),
                           yearsmarried = mean(Affairs$yearsmarried),
                           religiousness = mean(Affairs$religiousness))
    testdata$prob <- predict(fit.reduced, newdata = testdata, type = "response")
    testdata
Results: Holding the other variables constant, as age increases from 17 to 57 the probability of an extramarital affair decreases from 0.34 to 0.11.
7. Check for overdispersion
Overdispersion can lead to distorted standard errors and inaccurate significance tests. When it is present, you can still fit the logistic regression with glm(), but replace the binomial distribution with the quasibinomial distribution.
    # A ratio close to 1 suggests there is no overdispersion
    deviance(fit.reduced) / df.residual(fit.reduced)
Result: There is no evidence of overdispersion.
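The quasibinomial refit described above can also be turned into a formal check. The sketch below shows one common approach: fit the same model under both families and test whether the estimated dispersion exceeds 1. It runs on simulated toy data rather than the Affairs data, so the data frame `toy`, its columns, and the simulation settings are all assumptions made for illustration.

```r
set.seed(42)  # reproducible toy data, NOT the Affairs data

# Simulated binary outcome with one predictor (illustration only)
n   <- 300
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * toy$x))

# Same model under the binomial and the quasibinomial family
fit.bin   <- glm(y ~ x, data = toy, family = binomial())
fit.quasi <- glm(y ~ x, data = toy, family = quasibinomial())

# The quasibinomial fit estimates a dispersion parameter phi
# (it is fixed at 1 for the binomial family)
phi <- summary(fit.quasi)$dispersion

# Test H0: phi = 1 against H1: phi > 1
p_value <- pchisq(phi * df.residual(fit.bin),
                  df = df.residual(fit.bin),
                  lower.tail = FALSE)

print(c(dispersion = phi, p.value = p_value))
```

To apply the same idea to this section's model, the toy data and formula would be swapped for the Affairs data and the formula used for fit.reduced. A small p-value would point to overdispersion.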
2. Poisson regression (the dependent variable is a count)
Usage scenario: Poisson regression predicts a count outcome variable from a set of continuous and/or categorical predictor variables.
Case: Does drug therapy reduce the number of seizures?
1. View Data Set
    data(breslow.dat, package = "robust")
    names(breslow.dat)
    # Columns 6, 7, 8, 10: Base, Age, Trt, sumY
    summary(breslow.dat[c(6, 7, 8, 10)])
Results: The analysis looks at how age, treatment condition, and the baseline seizure count over the preceding eight weeks relate to the number of seizures in the eight weeks after randomization, so only these 4 variables are used.
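2. Plot the data
    opar <- par(no.readonly = TRUE)   # save the current graphics settings
    par(mfrow = c(1, 2))              # two plots side by side
    # Refer to the data frame explicitly instead of using attach()
    hist(breslow.dat$sumY, breaks = 20, xlab = "Seizure Count",
         main = "Distribution of Seizures")
    boxplot(sumY ~ Trt, data = breslow.dat, xlab = "Treatment",
            main = "Group Comparisons")
    par(opar)                         # restore the settings
Results: Comparing the groups, seizure counts appear lower in the drug group.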
3. Fitting Poisson regression
    fit <- glm(sumY ~ Base + Age + Trt, data = breslow.dat, family = poisson())
    summary(fit)
Results: The output reports the deviances, the regression parameters, their standard errors, and tests of whether each parameter is 0.
4. Interpreting model parameters
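    # Coefficients are on the log scale of the expected count
    coef(fit)
    # Exponentiate to get multiplicative effects on the expected count
    exp(coef(fit))
Results: Holding the other variables constant, each additional year of age multiplies the expected number of seizures by 1.023. Moving from the placebo group to the drug group reduces the expected number of seizures by 14%.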
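An exponentiated Poisson coefficient is a multiplier on the expected count, so it converts directly to a percent change. A minimal sketch using only the two figures quoted above. The helper `pct_change` and the baseline of 10 seizures are hypothetical values introduced for illustration.

```r
# Multiplier for age quoted above (per additional year)
rr_age <- 1.023
# A 14% reduction corresponds to a multiplier of 1 - 0.14
rr_trt <- 1 - 0.14

# Percent change in the expected count implied by a multiplier
pct_change <- function(rr) (rr - 1) * 100

# Hypothetical baseline of 10 expected seizures (not taken from the data)
baseline_count <- 10
expected_drug  <- baseline_count * rr_trt

print(pct_change(c(age = rr_age, trt = rr_trt)))
print(expected_drug)
```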
5. Check for overdispersion
    deviance(fit) / df.residual(fit)
Results: The ratio is greater than 1, which suggests overdispersion.
6. Adjusting the model
    fit.new <- glm(sumY ~ Base + Age + Trt, data = breslow.dat, family = quasipoisson())
    summary(fit.new)
Results: The standard errors are much larger than in the first model. With the larger standard error, the p-value for Trt is greater than 0.05, so there is insufficient evidence that drug therapy reduces the number of seizures compared with placebo.
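The reason the standard errors grow can be checked directly: a quasi-Poisson fit keeps the Poisson point estimates and inflates each standard error by the square root of the estimated dispersion. The sketch below demonstrates this on simulated overdispersed counts rather than on breslow.dat, so the data frame `toy` and all simulation settings are assumptions made for illustration.

```r
set.seed(1)  # reproducible toy data, NOT breslow.dat

# Simulated overdispersed counts (negative binomial), illustration only
n   <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- rnbinom(n, size = 1, mu = exp(1 + 0.3 * toy$x))

fit.pois  <- glm(y ~ x, data = toy, family = poisson())
fit.quasi <- glm(y ~ x, data = toy, family = quasipoisson())

# Dispersion estimated by the quasi-Poisson fit (Pearson chi-square / df)
phi <- summary(fit.quasi)$dispersion

se.pois  <- summary(fit.pois)$coefficients[, "Std. Error"]
se.quasi <- summary(fit.quasi)$coefficients[, "Std. Error"]

# Point estimates are identical; standard errors grow by sqrt(phi)
stopifnot(isTRUE(all.equal(coef(fit.pois), coef(fit.quasi))))
stopifnot(isTRUE(all.equal(se.quasi, se.pois * sqrt(phi))))

print(round(cbind(se.pois, se.quasi, ratio = se.quasi / se.pois), 4))
print(sqrt(phi))
```

Larger standard errors with unchanged estimates mean smaller test statistics, which is how a predictor that looked significant under the Poisson family can lose significance after the adjustment.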
R language-Generalized linear model