R Language: Generalized Linear Models


Usage scenarios: the response variable is categorical (binary or multi-category) and therefore not normally distributed.

Or the response variable is a count, whose variance depends on its mean.

Solution: use a generalized linear model, which extends regression to non-normal dependent variables.
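As a rough illustration of what "generalized" means here: a GLM pairs a distribution family with a link function that connects the mean of the response to the linear predictor. In base R these are family objects from the stats package. The sketch below only inspects the two defaults used later in this article, binomial with the logit link and poisson with the log link.

```r
# Family objects bundle a link function (linkfun) and its inverse (linkinv)
logit_fam <- binomial()   # default link: logit
pois_fam  <- poisson()    # default link: log

print(logit_fam$link)     # "logit"
print(pois_fam$link)      # "log"

# The logit link maps a probability of 0.5 to 0 on the linear-predictor scale
eta <- logit_fam$linkfun(0.5)

# linkinv undoes the log link: exp(log(3)) gives back a mean count of 3
mu <- pois_fam$linkinv(log(3))
```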

1. Logistic regression (dependent variable is categorical)

Case: Fitting a model of extramarital affairs

1. View summary statistics for the data set

library(AER)
data(Affairs, package = "AER")
summary(Affairs)
table(Affairs$affairs)

Result: the data were collected from 601 participants and contain 9 variables: the number of extramarital affairs in the past year, gender, age, years married, whether the couple has children, religiousness, education, occupation, and a self-rating of the marriage.

The response variable is the number of affairs. About 75% of respondents (451 of 601) reported no affairs, and the highest category, 12 (monthly or more often), accounts for about 6% of respondents.

2. Convert the response into a binary factor

Affairs$ynaffair[Affairs$affairs > 0] <- 1
Affairs$ynaffair[Affairs$affairs == 0] <- 0
Affairs$ynaffair <- factor(Affairs$ynaffair,
                           levels = c(0, 1),
                           labels = c("No", "Yes"))
table(Affairs$ynaffair)
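The same recoding can be written in one step with ifelse(). In the sketch below, the vector `counts` is a hypothetical toy stand-in for `Affairs$affairs`, used so the example runs without the AER package.

```r
# Hypothetical affair counts standing in for Affairs$affairs
counts <- c(0, 3, 0, 12, 1, 0, 7)

# Every positive count becomes "Yes" and every zero becomes "No";
# fixing the levels keeps "No" as the first (reference) level
ynaffair <- factor(ifelse(counts > 0, "Yes", "No"), levels = c("No", "Yes"))

print(table(ynaffair))
```

The level order matters. When glm() is given a factor response with the binomial family, the first level is treated as "failure" and the other levels as "success", so this model predicts the probability of "Yes".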

3. Fit a logistic regression with this factor as the binary response

fit.full <- glm(ynaffair ~ gender + age + yearsmarried + children +
                  religiousness + education + occupation + rating,
                data = Affairs, family = binomial())
summary(fit.full)

Result: gender, children, education, and occupation are not significant in this model, so we drop them and refit:

fit.reduced <- glm(ynaffair ~ age + yearsmarried + religiousness + rating,
                   data = Affairs, family = binomial())
summary(fit.reduced)

Then compare the two nested models with a chi-square test:

anova(fit.reduced, fit.full, test = "Chisq")

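Result: the chi-square test is not significant (p = 0.21). This means the reduced model with four predictors fits as well as the full model with eight, so we prefer the simpler model.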
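For a binomial model, the chi-square comparison is a likelihood-ratio test. The drop in residual deviance is compared with a chi-square distribution whose degrees of freedom equal the number of dropped parameters. The sketch below checks this by hand on simulated data. The variables `x1`, `x2`, and `y` are hypothetical and not part of the Affairs data.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)                                   # x2 has no real effect on y
y  <- rbinom(n, 1, plogis(-0.5 + 1.2 * x1))

fit_small <- glm(y ~ x1, family = binomial())
fit_big   <- glm(y ~ x1 + x2, family = binomial())

# anova() with test = "Chisq" on two nested GLMs
tab <- anova(fit_small, fit_big, test = "Chisq")

# The same p-value computed directly from the deviances
dev_drop <- deviance(fit_small) - deviance(fit_big)
df_drop  <- df.residual(fit_small) - df.residual(fit_big)
p_manual <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

print(c(anova_p = tab[2, "Pr(>Chi)"], manual_p = p_manual))
```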

4. Interpreting model parameters

coef(fit.reduced)
exp(coef(fit.reduced))

Result: holding the other predictors constant, each additional year of marriage multiplies the odds of an affair by 1.106. Conversely, each additional year of age multiplies the odds of an affair by 0.965.
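Why does exp(coef) give a multiplicative effect on the odds? The log-odds are linear in each predictor, so the ratio of the odds at x + 1 to the odds at x equals exp(slope). The sketch below confirms this on simulated data. The variable `years` and its true effect size are hypothetical.

```r
set.seed(42)
n     <- 1000
years <- runif(n, 0, 20)
y     <- rbinom(n, 1, plogis(-2 + 0.1 * years))

fit <- glm(y ~ years, family = binomial())
b   <- coef(fit)

# Odds implied by the fitted model at a given value of years
odds_at <- function(x) exp(b[1] + b[2] * x)

# Ratio of the odds for a one-unit increase (here from 5 to 6)
ratio <- odds_at(6) / odds_at(5)

print(c(odds_ratio = unname(ratio), exp_slope = unname(exp(b[2]))))
```

Because the effect is multiplicative, k extra units multiply the odds by exp(slope)^k. With the article's 1.106 per year of marriage, 10 extra years would multiply the odds by roughly 1.106^10 ≈ 2.74.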

5. Assessing the impact of marital rating on extramarital affairs

# 1. Build a test data set by hand
# 2. Use predict() to get predicted probabilities
testdata <- data.frame(rating = c(1, 2, 3, 4, 5),
                       age = mean(Affairs$age),
                       yearsmarried = mean(Affairs$yearsmarried),
                       religiousness = mean(Affairs$religiousness))
testdata
testdata$prob <- predict(fit.reduced, newdata = testdata, type = "response")
testdata

Result: as the marriage rating goes from 1 (very unhappy) to 5 (very happy), the probability of an affair decreases from 0.53 to 0.15.
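Setting type = "response" in predict() returns probabilities rather than log-odds. For a logistic model, these are just the inverse logit of the link-scale predictions. The sketch below shows the relationship on simulated data. The variable `rating` here is a hypothetical stand-in, not the Affairs variable.

```r
set.seed(7)
n      <- 400
rating <- sample(1:5, n, replace = TRUE)
y      <- rbinom(n, 1, plogis(1 - 0.5 * rating))
fit    <- glm(y ~ rating, family = binomial())

newdat <- data.frame(rating = 1:5)

eta  <- predict(fit, newdata = newdat, type = "link")      # log-odds
prob <- predict(fit, newdata = newdat, type = "response")  # probabilities

# type = "response" equals plogis() (the inverse logit) of the link scale
print(cbind(newdat, eta, prob, check = plogis(eta)))
```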

6. Assessing the impact of age on extramarital affairs

testdata <- data.frame(rating = mean(Affairs$rating),
                       age = seq(17, 57, 10),
                       yearsmarried = mean(Affairs$yearsmarried),
                       religiousness = mean(Affairs$religiousness))
testdata$prob <- predict(fit.reduced, newdata = testdata, type = "response")
testdata

Result: with the other variables held at their means, as age increases from 17 to 57 the probability of an affair decreases from 0.34 to 0.11.

7. Check for overdispersion

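Overdispersion occurs when the observed variance of the response is larger than the binomial distribution expects. It can distort standard errors and make significance tests inaccurate. If overdispersion is present, you can still fit the logistic regression with glm(), but you should use the quasibinomial family instead of the binomial family.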

# A ratio close to 1 suggests there is no overdispersion
deviance(fit.reduced) / df.residual(fit.reduced)

Result: the ratio is close to 1, so there is no evidence of overdispersion.
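A more formal check fits the same model with the quasibinomial family and tests whether its estimated dispersion differs from 1. The sketch below does this on hypothetical grouped binomial data (successes out of 20 trials per group), with extra group-level noise added on purpose. Note that overdispersion checks are generally more informative for grouped counts like these than for ungrouped 0/1 responses such as ynaffair.

```r
set.seed(123)
groups <- 60
trials <- rep(20, groups)
x      <- rnorm(groups)
# Extra group-level noise on the logit scale creates overdispersion
p      <- plogis(0.3 * x + rnorm(groups, sd = 1))
succ   <- rbinom(groups, trials, p)

fit_bin   <- glm(cbind(succ, trials - succ) ~ x, family = binomial())
fit_quasi <- glm(cbind(succ, trials - succ) ~ x, family = quasibinomial())

# Quick check used in the article: residual deviance / residual df
ratio <- deviance(fit_bin) / df.residual(fit_bin)

# Dispersion estimated by the quasibinomial fit (Pearson chi-square / df)
phi <- summary(fit_quasi)$dispersion

# Approximate test of phi = 1: compare phi * df with chi-square(df)
p_value <- pchisq(phi * df.residual(fit_bin), df = df.residual(fit_bin),
                  lower.tail = FALSE)

print(c(ratio = ratio, phi = phi, p_value = p_value))
```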

2. Poisson regression (dependent variable is a count)

Usage scenario: Poisson regression predicts a count response from a set of continuous or categorical predictors, assuming the response follows a Poisson distribution.

Case: Does drug therapy reduce the number of epileptic seizures?

1. View the data set

data(breslow.dat, package = "robust")
names(breslow.dat)
summary(breslow.dat[c(6, 7, 8, 10)])

Result: we focus on four variables. These are age (Age), treatment condition (Trt), the number of seizures in the eight-week baseline period before randomization (Base), and the number of seizures in the eight weeks after randomization (sumY).

2. Plot the data

opar <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
attach(breslow.dat)
hist(sumY, breaks = 20, xlab = "Seizure Count",
     main = "Distribution of Seizures")
boxplot(sumY ~ Trt, xlab = "Treatment", main = "Group Comparisons")
detach(breslow.dat)
par(opar)

Result: at first glance, the drug group appears to have fewer seizures than the placebo group.

3. Fit the Poisson regression

fit <- glm(sumY ~ Base + Age + Trt, data = breslow.dat, family = poisson())
summary(fit)

Result: the output shows the residual deviance, the regression coefficients with their standard errors, and a test of whether each coefficient equals 0.
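The residual deviance reported by summary() has a closed form for a Poisson model: 2 × Σ [y·log(y/μ) − (y − μ)], where the y·log(y/μ) term is taken as 0 when y = 0. The sketch below recomputes it by hand on simulated counts. The variable names are hypothetical and not the breslow.dat columns.

```r
set.seed(2024)
n   <- 200
age <- runif(n, 18, 60)
y   <- rpois(n, exp(0.5 + 0.02 * age))
fit <- glm(y ~ age, family = poisson())

mu <- fitted(fit)

# y * log(y / mu) is defined as 0 when y = 0
term       <- ifelse(y > 0, y * log(y / mu), 0)
dev_manual <- 2 * sum(term - (y - mu))

print(c(glm = deviance(fit), manual = dev_manual))
```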

4. Interpreting model parameters

coef(fit)
exp(coef(fit))

Result: holding the other predictors constant, each additional year of age multiplies the expected number of seizures by 1.023. Moving from the placebo group to the drug group multiplies it by about 0.86, which is a 14% reduction.
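In a Poisson model with a log link, exp(coef) is a rate ratio, and (rate ratio − 1) × 100 gives the percentage change in the expected count. The sketch below simulates a hypothetical two-group trial whose true ratio is set to 0.86. With a single factor predictor, the fitted rate ratio equals the ratio of the two group means.

```r
set.seed(99)
n   <- 300
trt <- factor(sample(c("placebo", "drug"), n, replace = TRUE),
              levels = c("placebo", "drug"))
y   <- rpois(n, ifelse(trt == "drug", 8 * 0.86, 8))
fit <- glm(y ~ trt, family = poisson())

rr         <- unname(exp(coef(fit)["trtdrug"]))  # multiplicative effect on the mean
pct_change <- (rr - 1) * 100                     # negative value = % reduction

# With one factor predictor, the rate ratio equals the ratio of group means
ratio_means <- mean(y[trt == "drug"]) / mean(y[trt == "placebo"])

print(c(rate_ratio = rr, ratio_of_means = ratio_means, pct = pct_change))
```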

5. Check for overdispersion

deviance(fit) / df.residual(fit)

Result: the ratio is well above 1, which suggests overdispersion.
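Besides the deviance ratio, a common check uses the Pearson chi-square statistic, which should be roughly equal to the residual degrees of freedom when the Poisson assumption holds. The sketch below uses hypothetical counts drawn from a negative binomial distribution, whose variance exceeds its mean, and then fits a Poisson model to them.

```r
set.seed(11)
n       <- 250
x       <- rnorm(n)
mu_true <- exp(1 + 0.3 * x)
y       <- rnbinom(n, mu = mu_true, size = 1.5)  # variance = mu + mu^2 / 1.5

fit <- glm(y ~ x, family = poisson())

pearson_chisq <- sum(residuals(fit, type = "pearson")^2)
dev_ratio     <- deviance(fit) / df.residual(fit)
pearson_ratio <- pearson_chisq / df.residual(fit)

# Approximate test: Pearson chi-square against chi-square(df.residual)
p_value <- pchisq(pearson_chisq, df = df.residual(fit), lower.tail = FALSE)

print(c(dev_ratio = dev_ratio, pearson_ratio = pearson_ratio, p_value = p_value))
```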

6. Refit the model with the quasi-Poisson family

fit.new <- glm(sumY ~ Base + Age + Trt, data = breslow.dat,
               family = quasipoisson())
summary(fit.new)

Result: the parameter estimates are identical to those of the Poisson model, but the standard errors are much larger. Because of the larger standard errors, the p-value for Trt is greater than 0.05. So there is insufficient evidence that drug therapy reduces the number of seizures compared with placebo.
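The change in standard errors follows a simple rule. The quasi-Poisson fit keeps the Poisson point estimates but multiplies each standard error by the square root of the estimated dispersion. The sketch below verifies this on hypothetical overdispersed counts.

```r
set.seed(5)
n <- 200
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1.2 + 0.4 * x), size = 2)

fit_pois  <- glm(y ~ x, family = poisson())
fit_quasi <- glm(y ~ x, family = quasipoisson())

phi      <- summary(fit_quasi)$dispersion
se_pois  <- coef(summary(fit_pois))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]

# Same coefficients; quasi-Poisson SEs = Poisson SEs * sqrt(phi)
print(cbind(coef_pois = coef(fit_pois), coef_quasi = coef(fit_quasi),
            se_pois, se_quasi, se_pois_scaled = se_pois * sqrt(phi)))
```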
