The most detailed logistic regression source code in R, including goodness of fit, recall, and precision calculation

Source: Internet
Author: User

The title of this post is admittedly a bit of clickbait. I am not familiar with R, but an experiment required it, so I learned it on the spot. Whether in the countless tutorials on the Internet or in the examples in books, the treatment of logistic regression is always the same: a simple function call and a description of its output. Several things were never made clear to me:

1. How do you train the model on training data and then validate it on test data (the test data and training data may overlap)?

2. How do you evaluate the predictions, that is, calculate the recall, precision, and F-measure values?

3. How do you calculate goodness-of-fit measures such as the Nagelkerke R2?
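For reference, the quantities named in questions 2 and 3 have the standard definitions below. The symbols are my own notation, not the author's: TP, FP, and FN are the counts of true positives, false positives, and false negatives on the test data; D is the residual deviance and D_0 the null deviance of the fitted model; n is the number of training samples. The two pseudo-R2 formulas are written in terms of deviance, which assumes an ungrouped binary response, where the deviance equals minus twice the log-likelihood.

```latex
\begin{align*}
\text{precision} &= \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \\[4pt]
R^2_{\text{Cox-Snell}} &= 1 - \exp\!\left(\frac{D - D_0}{n}\right), \qquad
R^2_{\text{Nagelkerke}} = \frac{R^2_{\text{Cox-Snell}}}{1 - \exp\!\left(-D_0 / n\right)}
\end{align*}
```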

It seems to me that these books, and some of the people who write blogs, are muddle-headed about this. We do not read a tutorial just to see how a simple function is called, or to hear the theory explained; we want to be able to use the method quickly and correctly. In my experience, the existing online tutorials are very poor.

I will not describe the process in detail here. I believe you will understand it at a glance:

    # Import the training and test data
    training = read.csv("training.csv", header = FALSE)
    testing = read.csv("testing.csv", header = FALSE)

    # Fit the model on the training data.
    # Here column 7 (V7) is used to predict column 16 (V16).
    glm.fit = glm(V16 ~ V7, data = training, family = binomial(link = "logit"))

    n = nrow(training)   # number of training rows, that is, the number of samples
    R2 <- 1 - exp((glm.fit$deviance - glm.fit$null.deviance) / n)   # Cox-Snell goodness of fit
    cat("Cox-Snell R2 =", R2, "\n")
    R2 <- R2 / (1 - exp(-glm.fit$null.deviance / n))   # Nagelkerke goodness of fit, printed at the end

    p = predict(glm.fit, testing)   # predict on the test data (log-odds scale)
    p = exp(p) / (1 + exp(p))       # convert log-odds to the probability of V16 = 1

    # Add a column to the test data holding the predicted V16:
    # the prediction is 1 when p > 0.5
    testing$V16_predicted = 1 * (p > 0.5)

    # Columns 16 and 17 are the true and predicted values
    # (this assumes testing.csv has exactly 16 columns)
    true_value = testing[, 16]
    predict_value = testing[, 17]

    # Recall, precision, and F-measure
    retrieved = sum(predict_value)
    precision = sum(true_value & predict_value) / retrieved
    recall = sum(predict_value & true_value) / sum(true_value)
    f_measure = 2 * precision * recall / (precision + recall)

    summary(glm.fit)
    cat("Nagelkerke R2 =", R2, "\n")
    print(precision)
    print(recall)
    print(f_measure)
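Since training.csv and testing.csv are not provided, here is a minimal self-contained sketch of the same workflow on simulated data. Everything about the data is an assumption made for illustration: the column names x and y, the seed, the simulated coefficients, and the 210/90 split. It also uses predict(..., type = "response"), which returns probabilities directly and is equivalent to applying exp(p) / (1 + exp(p)) by hand, and it guards against division by zero when no positives are predicted.

```r
# Simulated data (hypothetical; stands in for training.csv / testing.csv)
set.seed(1)
n_total <- 300
x <- rnorm(n_total, mean = 100, sd = 50)
y <- rbinom(n_total, size = 1, prob = plogis(-0.7 + 0.006 * x))
dat <- data.frame(x = x, y = y)

# Split into non-overlapping training and test sets
train_idx <- sample(n_total, size = 210)
training <- dat[train_idx, ]
testing <- dat[-train_idx, ]

# Fit on the training data only
fit <- glm(y ~ x, data = training, family = binomial(link = "logit"))

# Probabilities on the test data, then a 0/1 prediction at the 0.5 threshold
p <- predict(fit, newdata = testing, type = "response")
predicted <- as.integer(p > 0.5)
actual <- testing$y

# Confusion counts
tp <- sum(predicted == 1 & actual == 1)
fp <- sum(predicted == 1 & actual == 0)
fn <- sum(predicted == 0 & actual == 1)

# Precision, recall, F-measure; NA when a denominator is zero
precision <- if (tp + fp > 0) tp / (tp + fp) else NA_real_
recall <- if (tp + fn > 0) tp / (tp + fn) else NA_real_
f_measure <- if (isTRUE(precision + recall > 0)) {
  2 * precision * recall / (precision + recall)
} else {
  NA_real_
}

# Pseudo-R2 from the deviances (valid for an ungrouped 0/1 response)
n <- nrow(training)
r2_cox_snell <- 1 - exp((fit$deviance - fit$null.deviance) / n)
r2_nagelkerke <- r2_cox_snell / (1 - exp(-fit$null.deviance / n))

cat("Cox-Snell R2 =", r2_cox_snell, "\n")
cat("Nagelkerke R2 =", r2_nagelkerke, "\n")
cat("precision =", precision, "recall =", recall, "F =", f_measure, "\n")
```

Counting TP, FP, and FN explicitly makes it easy to check the metrics against table(predicted, actual) when something looks off.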

I don't know why many people are confused about such a simple thing.

Here is a brief explanation of the output of summary:

    Call:
    glm(formula = V16 ~ V7, family = binomial(link = "logit"), data = training)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -2.5212  -0.9990  -0.4249   1.1352   1.4978

    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
    (Intercept) -0.744804   0.207488  -3.590 0.000331 ***
    V7           0.005757   0.001362   4.226 2.38e-05 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 307.76  on 221  degrees of freedom
    Residual deviance: 277.85  on 220  degrees of freedom
    AIC: 281.85

    Number of Fisher Scoring iterations: 5

The part to read is the Coefficients table. Estimate is the coefficient of each term in the final prediction equation (here the intercept and V7), and Pr(>|z|) is the p-value. Both p-values are far below 0.05, so judging by these two columns the fitted model is acceptable.
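As a worked illustration, the numbers in the summary output can be plugged into the prediction equation and the goodness-of-fit formulas. This is my own arithmetic, not part of the original post, and it rests on two assumptions: the sample size n = 222 is inferred from the 221 null degrees of freedom, and the response is ungrouped 0/1 data so that the deviance equals minus twice the log-likelihood.

```latex
\begin{align*}
\hat{p}(V16 = 1) &= \frac{1}{1 + \exp\!\bigl(-(-0.744804 + 0.005757 \cdot V7)\bigr)} \\[4pt]
\hat{p} > 0.5 \;&\Longleftrightarrow\; V7 > \frac{0.744804}{0.005757} \approx 129.4 \\[4pt]
R^2_{\text{Cox-Snell}} &= 1 - \exp\!\left(\frac{277.85 - 307.76}{222}\right) \approx 0.126 \\[4pt]
R^2_{\text{Nagelkerke}} &= \frac{R^2_{\text{Cox-Snell}}}{1 - \exp\!\left(-307.76 / 222\right)} \approx 0.168
\end{align*}
```

Under these assumptions, the model predicts V16 = 1 only when V7 exceeds roughly 129.4, and each unit increase in V7 multiplies the odds by exp(0.005757), or about 1.0058.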
