[Reading notes] R in Action (13): Generalized linear models

Source: Internet
Author: User

Generalized linear models extend the linear model framework to cover the analysis of non-normal dependent variables.

A generalized linear model is fitted in the form:

$$g(\mu_Y) = \beta_0 + \sum_{j=1}^m \beta_j x_j$$

Here $g(\mu_Y)$ is the link function, applied to the conditional mean $\mu_Y$ of the response variable. The response variable is assumed to follow a distribution in the exponential family (not just the normal distribution), which greatly extends the standard linear model. The model parameters are estimated by maximum likelihood rather than by least squares.

In other words, the assumption that Y is normally distributed is relaxed: Y only needs to follow some distribution in the exponential family.

The glm() function: glm(formula, family = family(link = function), data = )
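As a minimal sketch of this call pattern (not taken from the original notes), the example below fits the same model with lm() and with glm() using the gaussian family and the identity link. The built-in mtcars data and the predictors wt and hp are assumptions chosen only for illustration. With this family and link, the generalized linear model reduces to the standard linear model, so the two sets of coefficients should agree.

```r
# Fit the same model two ways on the built-in mtcars data (illustrative choice)
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = mtcars)

# The coefficient estimates agree up to numerical tolerance
print(coef(fit_lm))
print(coef(fit_glm))
same_coefs <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))
print(same_coefs)
```

Swapping in another family, such as binomial() or poisson(), changes the assumed distribution and the default link while the rest of the call stays the same.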

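Logistic regression: the response variable is binary (0, 1). The model assumes that Y follows a binomial distribution, and the linear model is fitted in the form:

$$\log_e\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \sum_{j=1}^m \beta_j x_j$$

where $\pi$ is the conditional probability that Y = 1, and $\pi/(1-\pi)$ is the odds that Y = 1.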

The following code can be used to fit a logistic regression: glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = mydata)
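To make the role of the logit link concrete, here is a small sketch on simulated data. The data frame mydata, the two predictors, and the true coefficients are all hypothetical and exist only for this illustration. It fits the model and checks that applying the inverse logit to the linear predictor reproduces the fitted probabilities.

```r
set.seed(42)
n <- 500
mydata <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Simulate a 0/1 response whose log odds are linear in x1 and x2 (made-up coefficients)
true_logit <- -0.5 + 1.2 * mydata$x1 - 0.8 * mydata$x2
mydata$y <- rbinom(n, size = 1, prob = plogis(true_logit))

fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = mydata)

# Linear predictor (log odds) and fitted probabilities
eta  <- predict(fit, type = "link")
prob <- predict(fit, type = "response")

# The inverse logit of the linear predictor reproduces the fitted probabilities
link_ok <- isTRUE(all.equal(unname(plogis(eta)), unname(prob)))
print(coef(fit))
print(link_ok)
```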

    # Use the Affairs data to predict whether a participant had an extramarital affair
    # Each participant has 9 variables, including gender, age, years married, whether they have children,
    # religiousness, education, occupation, and self-rating of the marriage
    library(AER)
    data(Affairs, package = "AER")
    # View descriptive statistics
    summary(Affairs)
    # Convert affairs into a binary factor, ynaffair
    Affairs$ynaffair[Affairs$affairs > 0] <- 1
    Affairs$ynaffair[Affairs$affairs == 0] <- 0
    Affairs$ynaffair <- factor(Affairs$ynaffair, levels = c(0, 1), labels = c("No", "Yes"))
    table(Affairs$ynaffair)
    # The factor can now be used as the outcome variable in logistic regression
    fit.full <- glm(ynaffair ~ gender + age + yearsmarried + children +
                      religiousness + education + occupation + rating,
                    data = Affairs, family = binomial())
    # Describe the model
    summary(fit.full)
    # Judging by the p-values, gender, children, education, and occupation do not
    # contribute significantly to the equation; drop them and refit
    fit.reduced <- glm(ynaffair ~ age + yearsmarried + religiousness + rating,
                       data = Affairs, family = binomial())
    summary(fit.reduced)
    # In this fit every regression coefficient is significant
    # Because the two models are nested, they can be compared with anova()
    # The chi-square test is not significant (p = 0.21): the reduced model with
    # four predictors fits as well as the full model
    anova(fit.reduced, fit.full, test = "Chisq")
    # Interpret the model coefficients: log odds
    coef(fit.reduced)
    # Exponentiate to put the coefficients on the odds scale
    exp(coef(fit.reduced))
    # Assess the effect of a predictor on the probability of the outcome
    # Effect of the marital rating on the probability of an affair
    # Create an artificial data set: age, years married, and religiousness are
    # held at their means, and the marital rating ranges from 1 to 5
    testdata <- data.frame(rating = c(1, 2, 3, 4, 5),
                           age = mean(Affairs$age),
                           yearsmarried = mean(Affairs$yearsmarried),
                           religiousness = mean(Affairs$religiousness))
    testdata$prob <- predict(fit.reduced, newdata = testdata, type = "response")
    testdata

Logistic regression variants:

    • Robust logistic regression: the glmRob() function in the robust package can fit a robust generalized linear model. Robust logistic regression is useful when the data used to fit the model contain outliers and influential observations.
    • Multinomial logistic regression: if the response variable has more than two unordered categories (for example married, widowed, divorced), you can use the mlogit() function in the mlogit package to fit a multinomial logistic regression.
    • Ordinal logistic regression: if the response variable is a set of ordered categories (for example good, medium, poor), you can use the lrm() function in the rms package to fit an ordinal logistic regression.

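Poisson regression: the response variable is a count. The model assumes that Y follows a Poisson distribution, and the linear model is fitted in the form:

$$\log_e(\lambda) = \beta_0 + \sum_{j=1}^m \beta_j x_j$$

where $\lambda$ is the mean (and also the variance) of Y.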

Many of the functions used to analyze standard linear models fitted with lm() have corresponding versions for glm() objects, such as summary(), coef(), anova(), and predict(). The Poisson example:

    # Use the Breslow epilepsy data from the robust package to examine how treatment affects the number of seizures
    library(robust)
    data(breslow.dat, package = "robust")
    names(breslow.dat)
    # We only consider the treatment condition (Trt), age (Age), the baseline number of seizures (Base),
    # and the response variable: the number of seizures in the eight weeks after randomization (sumY)
    summary(breslow.dat[c(6, 7, 8, 10)])
    opar <- par(no.readonly = TRUE)
    par(mfrow = c(1, 2))
    attach(breslow.dat)
    # The plots show the skewed distribution of the response variable and possible outliers
    hist(sumY, breaks = 20, xlab = "Seizure Count", main = "Distribution of Seizures")
    boxplot(sumY ~ Trt, xlab = "Treatment", main = "Group Comparisons")
    par(opar)
    detach(breslow.dat)
    # Fit the Poisson regression
    fit <- glm(sumY ~ Base + Age + Trt, data = breslow.dat, family = poisson())
    summary(fit)
    # Get the model coefficients
    coef(fit)
    exp(coef(fit))
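The exponentiated coefficients at the end of the code above are easier to read with a small worked case. The sketch below uses simulated data (the variable x, the sample size, and the true coefficients are assumptions for illustration, not part of the Breslow analysis) to show that, under the log link, a one-unit increase in a predictor multiplies the expected count by exp(coefficient).

```r
set.seed(1)
n <- 200
sim <- data.frame(x = runif(n, 0, 2))

# Counts whose log mean is linear in x (made-up coefficients)
sim$y <- rpois(n, lambda = exp(0.5 + 0.7 * sim$x))

fit_pois <- glm(y ~ x, family = poisson(), data = sim)

# Predicted mean counts at x = 1 and x = 2
mu <- predict(fit_pois, newdata = data.frame(x = c(1, 2)), type = "response")

# A one-unit increase in x multiplies the expected count by exp(beta_x)
ratio_ok <- isTRUE(all.equal(unname(mu[2] / mu[1]),
                             unname(exp(coef(fit_pois)["x"]))))
print(exp(coef(fit_pois)))
print(ratio_ok)
```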

  

Variants of Poisson regression:

    • Poisson regression with varying time periods
    • Zero-inflated Poisson regression: a combination of logistic regression and Poisson regression
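For the first variant, one common approach in base R is to add an offset to the glm() formula so that the model describes a rate per unit of time instead of a raw count. The sketch below is only an illustration on simulated data; the variables x and time and the coefficients are hypothetical. The zero-inflated variant needs an add-on package and is not sketched here.

```r
set.seed(2)
n <- 300
sim <- data.frame(x = rnorm(n), time = runif(n, 1, 10))

# Counts observed over different lengths of time; the rate per unit time depends on x
sim$y <- rpois(n, lambda = sim$time * exp(0.2 + 0.5 * sim$x))

# offset(log(time)) fixes the coefficient of log(time) at 1, so the model describes rates
fit_rate <- glm(y ~ x + offset(log(time)), family = poisson(), data = sim)

# Predicted counts for the same x over 1 and over 5 time units
nd <- data.frame(x = c(0, 0), time = c(1, 5))
pred <- predict(fit_rate, newdata = nd, type = "response")

# With an offset, the expected count scales in proportion to the observation time
scale_ok <- isTRUE(all.equal(unname(pred[2] / pred[1]), 5))
print(coef(fit_rate))
print(scale_ok)
```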

Model fit and regression diagnostics:

1. Plot the predicted values, on the scale of the original response variable, against the residuals.

2. Test whether the model is overdispersed.
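Both checks can be sketched with base R functions. The example below is a rough illustration on simulated data, not the procedure from the original notes: the counts are drawn from a negative binomial distribution (an assumption made so that overdispersion is present), and the ratio of residual deviance to residual degrees of freedom is used as an informal indicator, with values well above 1 suggesting overdispersion.

```r
set.seed(3)
n <- 400
sim <- data.frame(x = rnorm(n))

# Negative binomial counts: the variance exceeds the mean, so a Poisson model is overdispersed
sim$y <- rnbinom(n, mu = exp(1 + 0.5 * sim$x), size = 1)

fit_pois <- glm(y ~ x, family = poisson(), data = sim)

# 1. Predicted values on the response scale and the residuals to plot against them
fitted_vals <- predict(fit_pois, type = "response")
dev_resid   <- residuals(fit_pois, type = "deviance")
# plot(fitted_vals, dev_resid)  # uncomment to draw the diagnostic plot

# 2. Residual deviance divided by residual degrees of freedom
disp_ratio <- deviance(fit_pois) / df.residual(fit_pois)
print(disp_ratio)

# A quasi-Poisson fit keeps the same coefficients but rescales the standard errors
fit_quasi <- glm(y ~ x, family = quasipoisson(), data = sim)
print(summary(fit_quasi)$dispersion)
```

If the ratio is close to 1, the plain Poisson fit is usually adequate; otherwise family = quasipoisson() is one option for getting more realistic standard errors.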
