Logistic regression analysis in R

Last Update:2017-03-09 Source: Internet

Author: User


First, the probit regression model
In R, a probit model can be fitted with the glm function (generalized linear model) by setting the family to binomial with a probit link, that is, binomial(probit). The summary function reports the details of the fit. Unlike for lm, however, summary on a generalized linear model does not report a coefficient of determination (R-squared). Pseudo R-squared values can be obtained with the pR2 function in the pscl package.
> library(RSADBE)
> data(sat)
> # Column names are case-sensitive; run names(sat) to confirm them.
> pass_probit <- glm(pass ~ sat, data = sat, family = binomial(probit))
> summary(pass_probit)
> library(pscl)
> pR2(pass_probit)
> predict(pass_probit, newdata = list(sat = 400), type = "response")
> predict(pass_probit, newdata = list(sat = 700), type = "response")
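
To make the probit prediction concrete, here is a minimal sketch on simulated data. The data frame `sim`, its columns `score` and `passed`, and the simulation parameters are hypothetical stand-ins, not the RSADBE sat data. It checks that predict with type = "response" is the standard normal CDF applied to the fitted linear predictor.

```r
# Simulated stand-in data; all names and parameter values are illustrative assumptions
set.seed(1)
n <- 200
score <- round(runif(n, 400, 700))
passed <- rbinom(n, 1, pnorm(-5.5 + 0.01 * score))
sim <- data.frame(score = score, passed = passed)

fit_probit <- glm(passed ~ score, data = sim, family = binomial(link = "probit"))

# Predicted probability from predict()
p_pred <- predict(fit_probit, newdata = data.frame(score = 550), type = "response")

# The same probability by hand: Phi(b0 + b1 * x)
b <- coef(fit_probit)
p_hand <- pnorm(b[1] + b[2] * 550)

all.equal(unname(p_pred), unname(p_hand))
```

Replacing pnorm with plogis, and the probit link with the default logit link, gives the corresponding check for a logistic model.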

Second, the logistic regression model

The glm function with family = binomial, whose default link is the logit, fits a logistic regression model.

> library(RSADBE)
> library(pscl)
> data(sat)
> pass_logistic <- glm(pass ~ sat, data = sat, family = "binomial")
> summary.glm(pass_logistic)
> pR2(pass_logistic)
> with(pass_logistic, pchisq(null.deviance - deviance,
+ df.null - df.residual, lower.tail = FALSE))
> confint(pass_logistic)
> predict.glm(pass_logistic, newdata = list(sat = 400), type = "response")
> predict.glm(pass_logistic, newdata = list(sat = 700), type = "response")
> sat_x <- seq(400, 700, 10)
> pred_l <- predict(pass_logistic, newdata = list(sat = sat_x), type = "response")
> plot(sat_x, pred_l, type = "l", ylab = "Probability", xlab = "Sat_M")
Explanation of the code above:
A logistic model is fitted with the glm function, and summary.glm reports the details of the fit. The null deviance and residual deviance play a role similar to the residual sum of squares in a linear regression model and are used to assess goodness of fit. The null deviance is the deviance of the intercept-only model, which uses no information from the independent variable. If the independent variable affects the dependent variable, the residual deviance should be markedly smaller than the null deviance.
The pR2 function returns the pseudo R-squared values. The overall significance of the model is obtained through the with function: it takes null.deviance, deviance, df.null, and df.residual from the fitted glm object and passes them to pchisq, which returns the p-value for the deviance difference null.deviance - deviance on df.null - df.residual degrees of freedom.

The confint function gives confidence intervals for the regression coefficients, and predict.glm gives the predicted probabilities when the independent variable is 400 and 700.

The plot function draws the fitted probability curve.
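
The deviance-based quantities described above can be reproduced by hand. The following is a minimal sketch on simulated data (the data frame `sim` and its columns are hypothetical, not the sat data). It computes the likelihood-ratio statistic from the two deviances, cross-checks it against the log-likelihoods of the fitted and intercept-only models, and computes McFadden's pseudo R-squared, which for 0/1 responses equals 1 - deviance / null.deviance.

```r
# Simulated stand-in data; all names and parameter values are illustrative assumptions
set.seed(2)
n <- 200
score <- round(runif(n, 400, 700))
passed <- rbinom(n, 1, plogis(-11 + 0.02 * score))
sim <- data.frame(score = score, passed = passed)

fit <- glm(passed ~ score, data = sim, family = binomial)
fit0 <- glm(passed ~ 1, data = sim, family = binomial)  # intercept-only model

# Likelihood-ratio statistic: drop in deviance relative to the null model
lr_stat <- fit$null.deviance - fit$deviance
lr_df <- fit$df.null - fit$df.residual
p_value <- pchisq(lr_stat, lr_df, lower.tail = FALSE)

# Cross-check: the same statistic from the two log-likelihoods
lr_check <- 2 * (as.numeric(logLik(fit)) - as.numeric(logLik(fit0)))

# McFadden's pseudo R-squared (for 0/1 data, deviance = -2 * log-likelihood)
mcfadden <- 1 - fit$deviance / fit$null.deviance

c(lr_stat = lr_stat, lr_check = lr_check, df = lr_df,
  p_value = p_value, mcfadden = mcfadden)
```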

Third, the Hosmer-Lemeshow goodness-of-fit test
The steps to construct the statistic are:
1. Sort the fitted values.
2. Divide the sorted values into g groups; g is usually chosen between 6 and 10.
3. Find the observed and expected counts for each group.
4. Perform a chi-square goodness-of-fit test on these groups.

The implementation code is:
> pass_hat <- fitted(pass_logistic)
> hosmerlem <- function(y, yhat, g = 10) {
+ # Group the fitted probabilities by their quantiles
+ cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)), include.lowest = TRUE)
+ # Observed and expected counts of failures and successes per group
+ obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
+ expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
+ chisq <- sum((obs - expect)^2 / expect)
+ p <- 1 - pchisq(chisq, g - 2)
+ return(list(chisq = chisq, p.value = p))
+ }
> hosmerlem(pass_logistic$y, pass_hat)
First, the fitted function extracts the fitted values; the custom function then computes the test statistic and its p-value.
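
The four steps can also be traced one at a time. The sketch below uses simulated data (the variables `x` and `y` and the simulation parameters are hypothetical) and tapply instead of xtabs, so each intermediate quantity can be inspected. It is an illustration of the procedure under those assumptions, not a replacement for the function above.

```r
# Simulated stand-in data; all names and parameter values are illustrative assumptions
set.seed(3)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)
yhat <- fitted(fit)

g <- 10
# Steps 1-2: group the fitted probabilities by their quantiles (deciles for g = 10)
grp <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1 / g)),
           include.lowest = TRUE)

# Step 3: observed and expected counts of successes and failures per group
obs1 <- tapply(y, grp, sum)
exp1 <- tapply(yhat, grp, sum)
obs0 <- tapply(1 - y, grp, sum)
exp0 <- tapply(1 - yhat, grp, sum)

# Step 4: chi-square statistic, referred to a chi-square with g - 2 degrees of freedom
hl_stat <- sum((obs1 - exp1)^2 / exp1 + (obs0 - exp0)^2 / exp0)
hl_p <- pchisq(hl_stat, df = g - 2, lower.tail = FALSE)

c(statistic = hl_stat, p.value = hl_p)
```

A large p-value gives no evidence of lack of fit. Because this data is simulated from a logistic model, a small p-value would arise only by chance.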

Residual plots for the generalized linear model
The residuals of a generalized linear model are defined differently from those of an ordinary linear model, but they serve a similar purpose:
1. Response residuals
The difference between the observed value and the fitted value.
2. Deviance residuals
For the i-th observation, the deviance residual is the signed square root of that observation's contribution to the model deviance.
3. Pearson residuals
4. Partial residuals
5. Working residuals
All of these residuals can be obtained with the residuals function.
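
As a minimal sketch of what these residual types compute, the following code derives the response, Pearson, deviance, and working residuals by hand for a logistic model and compares them with the output of residuals. The data is simulated, and the variable names and parameter values are hypothetical. Partial residuals are omitted for brevity.

```r
# Simulated stand-in data; all names and parameter values are illustrative assumptions
set.seed(4)
n <- 100
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + x))
fit <- glm(y ~ x, family = binomial)
mu <- unname(fitted(fit))

# Response residuals: observed minus fitted
res_response <- y - mu
# Pearson residuals: response residuals scaled by the binomial standard deviation
res_pearson <- (y - mu) / sqrt(mu * (1 - mu))
# Deviance residuals for 0/1 data: signed square root of each deviance contribution
res_deviance <- sign(y - mu) * sqrt(-2 * (y * log(mu) + (1 - y) * log(1 - mu)))
# Working residuals for the logit link: (y - mu) / (d mu / d eta)
res_working <- (y - mu) / (mu * (1 - mu))

checks <- c(
  response = isTRUE(all.equal(res_response, unname(residuals(fit, "response")))),
  pearson  = isTRUE(all.equal(res_pearson,  unname(residuals(fit, "pearson")))),
  deviance = isTRUE(all.equal(res_deviance, unname(residuals(fit, "deviance")))),
  working  = isTRUE(all.equal(res_working,  unname(residuals(fit, "working"))))
)
checks
```

The squared deviance residuals sum to the residual deviance of the model, which is why they are the natural analogue of ordinary residuals in least squares.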

> library(RSADBE)
> data(sat)
> pass_logistic <- glm(pass ~ sat, data = sat, family = "binomial")
> pass_probit <- glm(pass ~ sat, data = sat, family = binomial(probit))
> par(mfrow = c(1, 3), oma = c(0, 0, 3, 0))
> # Red: logistic model; green: probit model
> plot(fitted(pass_logistic), residuals(pass_logistic, "response"), col = "red",
+ xlab = "Fitted Values", ylab = "Response Residuals")
> points(fitted(pass_probit), residuals(pass_probit, "response"), col = "green")
> abline(h = 0)
> plot(fitted(pass_logistic), residuals(pass_logistic, "deviance"), col = "red",
+ xlab = "Fitted Values", ylab = "Deviance Residuals")
> points(fitted(pass_probit), residuals(pass_probit, "deviance"), col = "green")
> abline(h = 0)
> plot(fitted(pass_logistic), residuals(pass_logistic, "pearson"), col = "red",
+ xlab = "Fitted Values", ylab = "Pearson Residuals")
> points(fitted(pass_probit), residuals(pass_probit, "pearson"), col = "green")
> abline(h = 0)
> title(main = "Response, Deviance, and Pearson Residuals Comparison for the Logistic and Probit Models", outer = TRUE)

The code above computes the response, deviance, and Pearson residuals for both models and plots them against the fitted values.

Influential points and leverage points in the generalized linear model
As with ordinary linear models, generalized linear models use hatvalues, cooks.distance, dfbetas, and dffits to identify influential points and leverage points, but the criteria change:
1. An observation whose hat value is greater than 2(p+1)/n, where p is the number of predictors and n is the number of observations, can be considered a leverage point.
2. A Cook's distance larger than the 10% quantile of the F distribution indicates that the observation affects the parameter estimates; if it exceeds the 50% quantile, the observation is considered a strongly influential point.
3. The rule of thumb for dfbetas and dffits is that an observation whose absolute value is greater than 1 is considered to have an influence on the corresponding covariate.

> hatvalues(pass_logistic)
> cooks.distance(pass_logistic)
> dfbetas(pass_logistic)
> dffits(pass_logistic)
> cbind(hatvalues(pass_logistic), cooks.distance(pass_logistic),
+ dfbetas(pass_logistic), dffits(pass_logistic))
> # Leverage threshold 2(p+1)/n; p+1 is the number of coefficients
> hatvalues(pass_logistic) > 2 * length(pass_logistic$coefficients) /
+ length(pass_logistic$y)
> # Cook's distance against the 10% and 50% quantiles of F(p+1, n-p-1)
> cooks.distance(pass_logistic) > qf(0.1, length(pass_logistic$coefficients),
+ length(pass_logistic$y) - length(pass_logistic$coefficients))
> cooks.distance(pass_logistic) > qf(0.5, length(pass_logistic$coefficients),
+ length(pass_logistic$y) - length(pass_logistic$coefficients))
> par(mfrow = c(1, 3))
> plot(dfbetas(pass_logistic)[, 1], ylab = "DFBETAS - Intercept")
> plot(dfbetas(pass_logistic)[, 2], ylab = "DFBETAS - Sat")
> plot(dffits(pass_logistic), ylab = "DFFITS")
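
One way to read the leverage threshold: the hat values of a fitted model sum to the number of estimated coefficients, so their mean is (p+1)/n and the rule flags observations with more than twice the average leverage. The sketch below illustrates this on simulated data (variable names and parameter values are hypothetical) and applies the Cook's distance cut-offs in the same way.

```r
# Simulated stand-in data; all names and parameter values are illustrative assumptions
set.seed(5)
n <- 100
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + x))
fit <- glm(y ~ x, family = binomial)

h <- hatvalues(fit)
k <- length(coef(fit))  # number of coefficients, p + 1

# Leverages sum to the number of coefficients, so the mean leverage is k / n
sum(h)
flag_leverage <- h > 2 * k / n

# Cook's distance against the 10% and 50% quantiles of F(k, n - k)
cd <- cooks.distance(fit)
flag_cook_10 <- cd > qf(0.1, k, n - k)
flag_cook_50 <- cd > qf(0.5, k, n - k)

# Indices of the flagged observations
which(flag_leverage)
which(flag_cook_10)
which(flag_cook_50)
```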


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
