R Language Practical reading notes (13) Generalized linear model

Source: Internet
Author: User

# Extramarital Affairs DataSet Data (affairs, package="AER") Summary (Affairs) Table (Affairs$affairs) # with two value variable, yes or no affairs$ynaffair[affairs$affairs>0] <-1Affairs$ynaffair[affairs$affairs==0] <-0Affairs$ynaffair<-factor (Affairs$ynaffair, levels = C (0,1), labels = c ("No","Yes") table (Affairs$ynaffair) #用glmfit. full<-GLM (Ynaffair ~ Gender + age + yearsmarried + children + religiousness + Education + occupation + rating, data = Affa IRS, family =binomial ()) Summary (fit.full)

Deviance residuals:
Min 1Q Median 3Q Max
-1.5713-0.7499-0.5690-0.2539 2.5191

Coefficients:
Estimate Std. Error z value Pr (>|z|)
(Intercept) 1.37726 0.88776 1.551 0.120807
Gendermale 0.28029 0.23909 1.172 0.241083
age-0.04426 0.01825-2.425 0.015301 *
Yearsmarried 0.09477 0.03221 2.942 0.003262 * *
Childrenyes 0.39767 0.29151 1.364 0.172508
religiousness-0.32472 0.08975-3.618 0.000297 * * *
Education 0.02105 0.05051 0.417 0.676851
Occupation 0.03092 0.07178 0.431 0.666630
rating-0.46845 0.09091-5.153 2.56e-07 * * *
---
Signif. codes:0 ' * * * ' 0.001 ' * * ' 0.01 ' * ' 0.05 '. ' 0.1 "1

(Dispersion parameter for binomial family taken to be 1)

Null deviance:675.38 on degrees of freedom
Residual deviance:609.51 on 592 degrees of freedom
aic:627.51

Number of Fisher scoring iterations:4

#结果是有些变量不显著把它们都去掉

fit.reduced <-GLM (ynaffair ~ age + yearsmarried ++ rating, data = affairs, family = binomial ()) Summary (fi t.reduced)

Call:
GLM (Formula = Ynaffair ~ Age + yearsmarried + religiousness +
Rating, family = binomial (), data = affairs)

Deviance residuals:
Min 1Q Median 3Q Max
-1.6278-0.7550-0.5701-0.2624 2.3998

Coefficients:
Estimate Std. Error z value Pr (>|z|)
(Intercept) 1.93083 0.61032 3.164 0.001558 * *
age-0.03527 0.01736-2.032 0.042127 *
Yearsmarried 0.10062 0.02921 3.445 0.000571 * * *
religiousness-0.32902 0.08945-3.678 0.000235 * * *
rating-0.46136 0.08884-5.193 2.06e-07 * * *
---
Signif. codes:0 ' * * * ' 0.001 ' * * ' 0.01 ' * ' 0.05 '. ' 0.1 "1

(Dispersion parameter for binomial family taken to be 1)

Null deviance:675.38 on degrees of freedom
Residual deviance:615.36 on 596 degrees of freedom
aic:625.36

Number of Fisher scoring iterations:4

#如上, each factor was found to be very significant. This is a two-model nesting, which can be compared using ANOVA, and for generalized linear regression, a chi-square test is available.

" CHISQ ")

Analysis of Deviance Table

Model 1:ynaffair ~ Age + yearsmarried + religiousness + Rating
Model 2:ynaffair ~ Gender + age + yearsmarried + children + religiousness +
Education + occupation + Rating
Resid. Df Resid. Dev Df deviance Pr (>chi)
1 596 615.36
2 592 609.51 4 5.8474 0.2108

The #卡方值不显著 (0.21) shows that the new model fits as well as the model that contains all the variables. This confirms that adding 4 removed variables does not significantly improve the predictive accuracy of the equation, and can be interpreted in terms of simpler models.

13.2.1 Interpreting model parameters

Look at the regression coefficients: when the other predictor variables are constant, the variation of a unit predictor variable can cause a change in the logarithmic advantage ratio of the response variable.

Coef (fit.reduced)

(Intercept) Age yearsmarried religiousness Rating
1.93083017-0.03527112 0.10062274-0.32902386-0.46136144

The results can be indexed because the logarithmic advantage is worse than the explanatory nature

Exp (COEF (fit.reduced))

(Intercept) Age yearsmarried religiousness Rating
6.8952321 0.9653437 1.1058594) 0.7196258 0.6304248

You can see the age of marriage increased by one year, the advantage of extramarital affairs will be multiplied by 1.106 (other unchanged), an increase of one year old, cheating advantage ratio multiplied by 0.965.

As a result, the advantage of extramarital affairs will rise with the increase of marriage age, the decrease of religion and marriage score.

Because the Predictor variable cannot be equal to 0, the Intercept entry has no specific meaning here.

You can also use the Confint function to get confidence intervals.

Exp (Confint (fit.reduced))

Waiting for profiling-be-done ...
2.5% 97.5
(Intercept) 2.1255764 23.3506030
Age 0.9323342 0.9981470
Yearsmarried 1.0448584 1.1718250
Religiousness 0.6026782 0.8562807
Rating 0.5286586 0.7493370

13.2.2 evaluation of the effect of predictive variables on the probability of results

TestData <-Data.frame (rating = C (12345),     = mean ( Affairs$age), yearsmarried = mean (affairs$yearsmarried),     =<-predict (fit.reduced, NewData = testdata,      " Response " ) TestData

Rating Age yearsmarried religiousness prob
1 1 32.48752 8.177696 3.116473 0.5302296
2 2 32.48752 8.177696 3.116473 0.4157377
3 3 32.48752 8.177696 3.116473 0.3096712
4 4 32.48752 8.177696 3.116473 0.2204547
5 5 32.48752 8.177696 3.116473 0.1513079

From the results, when the marriage score from 1 (very unhappy) into 5 (very happy), the probability of extramarital affairs reduced from 0.53 to 0.15 (assuming that age and other factors do not change), look at the age impact

 testdata <-data.frame (rating = Mean (affairs$rating), Age  = seq (17 , 57 , 10 ), yearsmarried = mean (affairs$yearsmarried), religiousness  = mean (affairs$religiousness)) Testdata$prob  <-Predict (fit.reduced, newdata = TestData, type  =  " response  "  ) testdata  

Rating Age yearsmarried religiousness prob
1 3.93178 17 8.177696 3.116473 0.3350834
2 3.93178 27 8.177696 3.116473 0.2615373
3 3.93178 37 8.177696 3.116473 0.1992953
4 3.93178 47 8.177696 3.116473 0.1488796
5 3.93178 57 8.177696 3.116473 0.1094738

As seen from the results, when the other variables are unchanged, the age increases from 17 to 57 o'clock, the extramarital affair probability will be reduced from 0.34 to 0.11, using this method, we can explore the effect of each predictor variable on the probability of the outcome.

13.2.3 over-release potential

Over-trend: The observed response variable variance is greater than the expected two-item distribution. This results in a singular standard false test and an imprecise significance test.

If there is excessive dissociation, you can also use GLM to fit the logistic regression, but you need to change the distribution of two items to class two.

Check method: Compare two distribution models and residual deviation and residual degrees of freedom, if the ratio is much larger than 1, it can be considered that there is excessive potential.

In this case, the ratio is =615.36/596=1.02, and there is no excessive dissociation.

It is also possible to test over-departure potential. To do this, it is necessary to fit the model two times, use family=binomial for the first time, and use family=quasibinomial for the second time, assuming that the first GLM return object is recorded as fit, The second return object is recorded as Fit.od, and with PCHISQ, the P-value provided can be tested on the 0 hypothesis (the ratio is 1) and the alternative hypothesis (the ratio is not equal to 1).

Fit <-GLM (ynaffair ~ age + yearsmarried + religiousness +     = binomial (), data =<-glm (ynaffair ~ Age + yearsmarried + religiousness +     = quasibinomial (), data =* fit$df.residual,     = F )

The result value of 0.34 is not significant, so there is no excessive potential.

13.3 Poisson regression

" Robust " ) names (Breslow.dat) Summary (breslow.dat[c (678)])
 opar <-par (no.readonly  = TRUE) par ( Mfrow  = C (1 , 2  )) Attach (Breslow.dat) hist (SumY, breaks  = 20 , Xlab =  " seizure Count  " , main =  " distribution of seizures  "  Span style= "color: #000000;" >) BoxPlot (SumY  ~ Trt, Xlab =  " treatment  , main = "  group comparisons   ) par (OPAR)  

The bias characteristics of the dependent variable and possible outliers are seen from the graph. The number of cases became smaller and the variance was smaller in the drug treatment. Unlike the standard least-squares regression, poisson regression does not focus on the quality of square differences.

Fit <-GLM (sumY ~ Base + age + Trt, data = Breslow.dat,     = poisson ()) Summary (FIT)

The results are very significant.

13.3.1 Interpreting model parameters

Coef (FIT) exp (Coef (FIT))

(Intercept) Base Age Trtprogabide
1.94882593 0.02265174 0.02274013-0.15270095

The age regression parameter is 0.0227, which indicates that other predictors remain unchanged, age increases by one year, and the logarithmic mean of epileptic seizures increases by 0.03.

In the exponential factor, the age increases by one year and the number of onset times is 1.023. TRT changes, number of onset *0.86. In other words, the number of seizures and the age of the drug group decreased by 20% compared with that of the placebo group.

13.3.2 over-release potential

The ratio of =559.44/55=10.17, greater than 1, indicates the presence of excessive dissociation potential.

The excess potential can also be tested in the following ways.

Library (QCC)
Qcc.overdispersion.test (breslow.dat$sumy, type = "Poisson")

The P-value is 0, and there is excessive dissociation.

# Fit Model with Quasipoisson
Fit.od <-GLM (sumY ~ Base + age + Trt, data = Breslow.dat,
Family = Quasipoisson ())
Summary (FIT.OD)

Reasons for excessive departures:

1. Missing an important predictor variable

2. May be related to the event. In a Poisson distribution, each event is considered to be an independent occurrence. Is that for any patient, each onset probability is considered to be independent of each other, but this is not true. A patient has been ill for 40 times, and the first probability is certainly not the same as the 40th chance.

3. In longitudinal data analysis, repeated measurements of data due to memory clustering characteristics can lead to excessive departures. Not discussed here.

R Language Practical reading notes (13) Generalized linear model

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.