Logistic regression analysis in R

Source: Internet
Author: User

First, probit regression model
In R, a probit model can be fitted with the glm function (generalized linear model) by setting the link of the binomial family to probit. The summary function gives the details of the fit, but unlike for lm, the summary of a generalized linear model does not report a coefficient of determination (R-squared). Pseudo R-squared values can instead be obtained with the pR2 function from the pscl package.
> library(RSADBE)
> data(sat)
> # column names are written as in the source; check names(sat) for their exact case
> pass_probit <- glm(pass ~ sat, data = sat, family = binomial(link = "probit"))
> summary(pass_probit)
> library(pscl)
> pR2(pass_probit)
> predict(pass_probit, newdata = list(sat = 400), type = "response")
> predict(pass_probit, newdata = list(sat = 700), type = "response")
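
To make clear what predict returns with type = "response", here is a minimal sketch on simulated data. The data frame toy and its columns passed and score are hypothetical stand-ins, not the RSADBE sat data. Under a probit link the predicted probability is the standard normal CDF applied to the linear predictor, so the two columns printed at the end should agree.

```r
# Simulated stand-in data (hypothetical; not the RSADBE sat data set)
set.seed(1)
score  <- round(runif(200, 400, 700))
passed <- rbinom(200, 1, pnorm(-8 + 0.015 * score))
toy    <- data.frame(passed = passed, score = score)

# Probit fit, same call pattern as in the text
fit <- glm(passed ~ score, data = toy, family = binomial(link = "probit"))

# Probit model: P(passed = 1) = pnorm(b0 + b1 * score)
b          <- coef(fit)
by_hand    <- pnorm(b[1] + b[2] * c(400, 700))
by_predict <- predict(fit, newdata = data.frame(score = c(400, 700)),
                      type = "response")

print(cbind(by_hand, by_predict))
```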

Second, logistic regression model

The glm function with the option family = binomial (whose default link is the logit) fits the logistic regression model.

> library(RSADBE)
> library(pscl)
> data(sat)
> pass_logistic <- glm(pass ~ sat, data = sat, family = "binomial")
> summary.glm(pass_logistic)
> pR2(pass_logistic)
> with(pass_logistic, pchisq(null.deviance - deviance, df.null
+ - df.residual, lower.tail = FALSE))
> confint(pass_logistic)
> predict.glm(pass_logistic, newdata = list(sat = 400), type = "response")
> predict.glm(pass_logistic, newdata = list(sat = 700), type = "response")
> sat_x <- seq(400, 700, 10)
> pred_l <- predict(pass_logistic, newdata = list(sat = sat_x), type = "response")
> plot(sat_x, pred_l, type = "l", ylab = "Probability", xlab = "Sat_m")
The above code explains:
glm fits the logistic model and summary.glm prints the details of the fit. The null deviance and residual deviance it reports play a role similar to the residual sum of squares in a linear regression model when assessing goodness of fit. The null deviance is the deviance of the intercept-only model, which uses no information from the independent variable. If the independent variable has an effect on the dependent variable, the residual deviance should be markedly smaller than the null deviance.
pR2 returns the pseudo R-squared values. The overall significance of the model is obtained through the with function, which gives pchisq direct access to the null.deviance, deviance, df.null and df.residual components of the fitted object. The test statistic is null.deviance - deviance, its degrees of freedom are df.null - df.residual, and pchisq with lower.tail = FALSE returns the p-value.

The confint function gives confidence intervals for the regression coefficients, and predict.glm gives the predicted probabilities when the independent variable equals 400 and 700.

The plot function draws the fitted probability curve over the range 400 to 700.
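
The overall significance test and one pseudo R-squared can be reproduced from quantities stored in the fitted object. The sketch below uses simulated data; the variables passed and score are hypothetical, not the RSADBE sat data. It computes McFadden's pseudo R-squared, which is only one of several measures of this kind; it is included here as an illustration, not as a replacement for pR2.

```r
# Simulated stand-in data (hypothetical; not the RSADBE sat data set)
set.seed(2)
score  <- round(runif(200, 400, 700))
passed <- rbinom(200, 1, plogis(-11 + 0.02 * score))

fit      <- glm(passed ~ score, family = binomial)
null_fit <- glm(passed ~ 1, family = binomial)   # intercept-only model

# Likelihood-ratio test of the whole model: drop in deviance vs. chi-square
lr_stat <- fit$null.deviance - fit$deviance
lr_df   <- fit$df.null - fit$df.residual
p_value <- pchisq(lr_stat, lr_df, lower.tail = FALSE)

# McFadden pseudo R-squared: 1 - logLik(model) / logLik(null model)
mcfadden <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null_fit))

print(c(lr_stat = lr_stat, df = lr_df, p_value = p_value, mcfadden = mcfadden))
```

For a 0/1 response the deviance equals minus twice the log-likelihood, so the same McFadden value can also be written as 1 - deviance / null.deviance.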


Third, the Hosmer-Lemeshow goodness-of-fit test
The steps to construct the statistic are
1. Obtain the fitted values with the fitted function and order them
2. Divide the ordered values into g groups; g is usually chosen between 6 and 10
3. Find the observed and expected counts in each group
4. Perform a chi-square goodness-of-fit test on these groups.

The implementation code is
> pass_hat <- fitted(pass_logistic)
> hosmerlem <- function(y, yhat, g = 10) {
+   # group the fitted values at their quantiles
+   cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)), include.lowest = TRUE)
+   # observed and expected counts of 0s and 1s in each group
+   obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
+   expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
+   chisq <- sum((obs - expect)^2 / expect)
+   P <- 1 - pchisq(chisq, g - 2)
+   return(list(chisq = chisq, p.value = P))
+ }
> hosmerlem(pass_logistic$y, pass_hat)
First, the fitted function extracts the fitted values; the custom hosmerlem function then computes the chi-square statistic and its p-value on g - 2 degrees of freedom.


Fourth, residual plots of the generalized linear model
The residuals of a generalized linear model are defined differently from those of an ordinary linear model, but they serve a similar purpose
1. Response residuals
The difference between the observed value and the fitted value
2. Deviance residuals
For the i-th observation, the deviance residual is the signed square root of that observation's contribution to the model deviance.
3. Pearson residuals
4. Partial residuals
5. Working residuals
All of the above residuals can be obtained with the residuals function

> library(RSADBE)
> data(sat)
> pass_logistic <- glm(pass ~ sat, data = sat, family = "binomial")
> # pass_probit is the probit fit from the first section
> par(mfrow = c(1, 3), oma = c(0, 0, 3, 0))
> plot(fitted(pass_logistic), residuals(pass_logistic, "response"), col = "red",
+ xlab = "Fitted Values", ylab = "Response Residuals")
> points(fitted(pass_probit), residuals(pass_probit, "response"), col = "green")
> abline(h = 0)
> plot(fitted(pass_logistic), residuals(pass_logistic, "deviance"), col = "red",
+ xlab = "Fitted Values", ylab = "Deviance Residuals")
> points(fitted(pass_probit), residuals(pass_probit, "deviance"), col = "green")
> abline(h = 0)
> plot(fitted(pass_logistic), residuals(pass_logistic, "pearson"), col = "red",
+ xlab = "Fitted Values", ylab = "Pearson Residuals")
> points(fitted(pass_probit), residuals(pass_probit, "pearson"), col = "green")
> abline(h = 0)
> title(main = "Response, Deviance, and Pearson Residuals Comparison for the Logistic and Probit Models", outer = TRUE)

The above code computes the response, deviance, and Pearson residuals for both models and plots each against the fitted values (logistic in red, probit in green).
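
The three residual types plotted above can also be checked by hand. The following is a minimal sketch on simulated data; the variables passed and score are hypothetical, not the RSADBE sat data. The formulas are the usual ones for a binomial model with a 0/1 response, and each hand-computed vector is compared with what residuals returns.

```r
# Simulated stand-in data (hypothetical; not the RSADBE sat data set)
set.seed(3)
score  <- round(runif(200, 400, 700))
passed <- rbinom(200, 1, plogis(-11 + 0.02 * score))
fit    <- glm(passed ~ score, family = binomial)

y  <- fit$y          # observed 0/1 responses
mu <- fitted(fit)    # fitted probabilities

# Response residual: observed minus fitted
response_res <- y - mu
# Pearson residual: response residual scaled by the binomial standard deviation
pearson_res  <- (y - mu) / sqrt(mu * (1 - mu))
# Deviance residual: signed square root of each observation's deviance contribution
deviance_res <- sign(y - mu) * sqrt(-2 * (y * log(mu) + (1 - y) * log(1 - mu)))

print(all.equal(unname(response_res), unname(residuals(fit, "response"))))
print(all.equal(unname(pearson_res),  unname(residuals(fit, "pearson"))))
print(all.equal(unname(deviance_res), unname(residuals(fit, "deviance"))))
# The squared deviance residuals add up to the residual deviance
print(all.equal(sum(deviance_res^2), deviance(fit)))
```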


Fifth, influence points and leverage points of the generalized linear model
As with ordinary linear models, generalized linear models use hatvalues, cooks.distance, dfbetas and dffits to identify influence points and leverage points, but the criteria for judging them change
1. A hat value greater than 2(p+1)/n, where p is the number of covariates and n the number of observations, indicates that the observation has a leverage effect
2. A Cook's distance larger than the 10% quantile of the F distribution indicates that the observation influences the parameter estimates; if it exceeds the 50% quantile, the observation is considered a strong influence point.
3. The rule of thumb for dfbetas and dffits is that an absolute value greater than 1 marks an observation as influential, on the corresponding coefficient estimate for dfbetas and on the fitted value for dffits

> hatvalues(pass_logistic)
> cooks.distance(pass_logistic)
> dfbetas(pass_logistic)
> dffits(pass_logistic)
> cbind(hatvalues(pass_logistic), cooks.distance(pass_logistic),
+ dfbetas(pass_logistic), dffits(pass_logistic))
> # leverage rule 2(p+1)/n: length(coefficients) is p+1, length(y) is n
> hatvalues(pass_logistic) > 2 * length(pass_logistic$coefficients) /
+ length(pass_logistic$y)
> # Cook's distance against the 10% and 50% quantiles of the F distribution
> cooks.distance(pass_logistic) > qf(0.1, length(pass_logistic$coefficients),
+ length(pass_logistic$y) - length(pass_logistic$coefficients))
> cooks.distance(pass_logistic) > qf(0.5, length(pass_logistic$coefficients),
+ length(pass_logistic$y) - length(pass_logistic$coefficients))
> par(mfrow = c(1, 3))
> plot(dfbetas(pass_logistic)[, 1], ylab = "dfbetas-intercept")
> plot(dfbetas(pass_logistic)[, 2], ylab = "dfbetas-sat")
> plot(dffits(pass_logistic), ylab = "dffits")
