The most detailed logistic regression source code in R, including goodness-of-fit, recall, and precision computation

Last Update: 2014-09-03 Source: Internet

Author: User


This post was written out of necessity. I am not familiar with R, but my experiment required it, so I learned it. Countless online tutorials and textbook examples cover logistic regression the same way: they show a simple function call and describe its output. That left several things unclear to me:

1. How do you train the model on training data and then validate it on test data (the test data and training data may overlap)?

2. How do you evaluate the predictions, that is, compute recall, precision, and F-measure?

3. How do you compute goodness-of-fit measures such as Nagelkerke R²?
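The author's script reads training and test sets that already exist as `training.csv` and `testing.csv`. If you start from a single data set, a random hold-out split is one common way to get the two parts for question 1. The sketch below makes several assumptions for illustration: the data frame `all_data` is hypothetical and simulated here, and the 70/30 ratio is arbitrary. Unlike the author's setting, this split makes the two sets disjoint.

```r
# Hypothetical data frame standing in for the full data set (simulated).
set.seed(42)  # make the random split reproducible
all_data <- data.frame(V7 = runif(300, 0, 300),
                       V16 = rbinom(300, 1, 0.5))

# Draw 70% of the row indices, without replacement, for training.
n_train   <- round(0.7 * nrow(all_data))
train_idx <- sample(seq_len(nrow(all_data)), size = n_train)

training <- all_data[train_idx, ]   # rows used to fit the model
testing  <- all_data[-train_idx, ]  # remaining rows, disjoint from training
```

If overlap between the sets is acceptable, as the author allows, `sample(..., replace = TRUE)` would draw a bootstrap-style training set instead.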

The authors of these books and blog posts did not seem to have thought this through. Readers of a tutorial want more than a simple function call or an explanation of the principles: they want to apply the method quickly and correctly. In my experience, the existing online tutorials fall well short of that.

I will not describe the process in detail here; I believe you will understand it at a glance:

```r
# Import the training and test data (no header row, so columns are named V1, V2, ...)
training <- read.csv("training.csv", header = FALSE)
testing  <- read.csv("testing.csv", header = FALSE)

# Fit the model on the training data; here column 7 (V7) is used to predict column 16 (V16)
glm.fit <- glm(V16 ~ V7, data = training, family = binomial(link = "logit"))

n <- nrow(training)  # number of training rows, i.e. the sample size
R2 <- 1 - exp((glm.fit$deviance - glm.fit$null.deviance) / n)  # Cox-Snell goodness of fit
cat("Cox-Snell R2 =", R2, "\n")
R2 <- R2 / (1 - exp(-glm.fit$null.deviance / n))  # Nagelkerke goodness of fit, printed at the end

p <- predict(glm.fit, testing)           # predict the test data; by default this returns log-odds
p <- exp(p) / (1 + exp(p))               # convert log-odds to probabilities
testing$V16_predicted <- 1 * (p > 0.5)   # add a column with the predicted V16: 1 when p > 0.5

true_value    <- testing$V16             # actual labels (column 16)
predict_value <- testing$V16_predicted   # predicted labels (the new column)
retrieved <- sum(predict_value)          # number of samples predicted as 1
precision <- sum(true_value & predict_value) / retrieved
recall    <- sum(predict_value & true_value) / sum(true_value)
F_measure <- 2 * precision * recall / (precision + recall)  # recall, precision and F-measure

print(summary(glm.fit))
cat("Nagelkerke R2 =", R2, "\n")
print(precision)
print(recall)
print(F_measure)
```
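The script above needs the two CSV files to run. As a self-contained sketch, the same pipeline can be exercised on simulated data. Several pieces are assumptions made for illustration and are not part of the original: the data-generating model, the sample sizes, and the helper name `classification_metrics`. The sketch also uses `predict(..., type = "response")`, which returns probabilities directly and is equivalent to the manual `exp(p) / (1 + exp(p))` step.

```r
# Self-contained sketch on simulated data (assumption: only V7 and V16 are used,
# and the data-generating model below is invented for illustration).
set.seed(1)
simulate_data <- function(n) {
  V7 <- runif(n, 0, 300)
  # Assumed true model for the simulation: logit(P(V16 = 1)) = -0.75 + 0.006 * V7
  V16 <- rbinom(n, 1, plogis(-0.75 + 0.006 * V7))
  data.frame(V7 = V7, V16 = V16)
}
training <- simulate_data(222)
testing  <- simulate_data(100)

# Same model as in the script: V16 predicted from V7 with a logit link
model <- glm(V16 ~ V7, data = training, family = binomial(link = "logit"))

# Cox-Snell and Nagelkerke goodness of fit from the deviances
n <- nrow(training)
cox_snell  <- 1 - exp((model$deviance - model$null.deviance) / n)
nagelkerke <- cox_snell / (1 - exp(-model$null.deviance / n))

# Hypothetical helper: precision, recall and F-measure for 0/1 label vectors
classification_metrics <- function(truth, predicted) {
  tp        <- sum(truth == 1 & predicted == 1)  # true positives
  precision <- tp / sum(predicted == 1)          # NaN if nothing is predicted as 1
  recall    <- tp / sum(truth == 1)
  f_measure <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, f_measure = f_measure)
}

# type = "response" returns probabilities, same as exp(p) / (1 + exp(p)) on the log-odds
prob <- predict(model, newdata = testing, type = "response")
testing$V16_predicted <- as.integer(prob > 0.5)

metrics <- classification_metrics(testing$V16, testing$V16_predicted)
cat("Cox-Snell R2 =", cox_snell, "\n")
cat("Nagelkerke R2 =", nagelkerke, "\n")
print(metrics)
```

Wrapping the metrics in a function with explicit `== 1` comparisons keeps them independent of where the prediction column sits in the data frame. The same function can then be reused for other models or thresholds.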

I don't know why many people are confused about such a simple thing.
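For reference, the two goodness-of-fit formulas used in the script can be written in terms of likelihoods and deviances. The notation here is ours, not the original author's:

- $L_0$ and $L_M$ are the likelihoods of the intercept-only model and the fitted model.
- $D_0$ and $D_M$ are the corresponding deviances (`glm.fit$null.deviance` and `glm.fit$deviance`).
- $n$ is the number of training rows.

For ungrouped 0/1 data the saturated model has log-likelihood 0, so $D = -2\log L$ and:

$$
\begin{aligned}
R^2_{\mathrm{CS}} &= 1 - \left(\frac{L_0}{L_M}\right)^{2/n} = 1 - \exp\!\left(\frac{D_M - D_0}{n}\right),\\
R^2_{\max} &= 1 - L_0^{2/n} = 1 - \exp\!\left(-\frac{D_0}{n}\right),\\
R^2_{\mathrm{N}} &= \frac{R^2_{\mathrm{CS}}}{R^2_{\max}}.
\end{aligned}
$$

For a binary outcome, Cox-Snell can never reach 1 because $L_M \le 1$. Dividing by $R^2_{\max}$ rescales it so that a perfect fit gives a Nagelkerke value of 1.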

Here is a brief explanation of the `summary()` output:

```
Call:
glm(formula = V16 ~ V7, family = binomial(link = "logit"), data = training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.5212  -0.9990  -0.4249   1.1352   1.4978

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.744804   0.207488  -3.590 0.000331 ***
V7           0.005757   0.001362   4.226 2.38e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 307.76  on 221  degrees of freedom
Residual deviance: 277.85  on 220  degrees of freedom
AIC: 281.85

Number of Fisher Scoring iterations: 5
```
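As a worked check of the goodness-of-fit formulas, the Cox-Snell and Nagelkerke values for this model can be recomputed from the printed deviances alone. This sketch assumes the training set has n = 222 rows, inferred from the 221 degrees of freedom of the null model. It also checks that AIC equals the residual deviance plus twice the number of estimated coefficients (two here).

```r
# Deviances copied from the summary() output above
null_deviance     <- 307.76
residual_deviance <- 277.85
n <- 221 + 1  # the null model has n - 1 degrees of freedom

cox_snell  <- 1 - exp((residual_deviance - null_deviance) / n)
nagelkerke <- cox_snell / (1 - exp(-null_deviance / n))

# AIC = residual deviance + 2 * (number of estimated coefficients: intercept and V7)
aic <- residual_deviance + 2 * 2

round(c(cox_snell = cox_snell, nagelkerke = nagelkerke, aic = aic), 4)
```

This gives a Cox-Snell R² of about 0.126 and a Nagelkerke R² of about 0.168, and reproduces the printed AIC of 281.85.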

The most important part is the Coefficients table:

- Estimate is the coefficient of V7 (and of the intercept) in the fitted prediction equation.
- Pr(>|z|) is the p-value.

Judging from these two columns, the result is acceptable: both coefficients are significant at the 0.001 level (***).
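To make the Estimate column concrete: with a logit link, the fitted model says log(p / (1 - p)) = -0.744804 + 0.005757 × V7. The sketch below plugs the printed estimates into the inverse logit (`plogis`); the function name `prob_v16` is ours, for illustration. It then finds the V7 value at which the predicted probability crosses the 0.5 cut-off used in the script. Finally, it computes the odds ratio for a one-unit increase in V7.

```r
# Coefficient estimates copied from the summary() output above
b0 <- -0.744804  # (Intercept)
b1 <-  0.005757  # V7

# Hypothetical helper: predicted P(V16 = 1) as the inverse logit of b0 + b1 * V7
prob_v16 <- function(v7) plogis(b0 + b1 * v7)

# V7 value where b0 + b1 * V7 = 0, i.e. where the probability crosses 0.5
v7_threshold <- -b0 / b1

# Multiplicative change in the odds of V16 = 1 per one-unit increase in V7
odds_ratio <- exp(b1)

c(threshold = v7_threshold, odds_ratio = odds_ratio)
```

With these estimates the threshold is about V7 = 129.4, so test rows with larger V7 are predicted as 1. Each additional unit of V7 multiplies the odds of V16 = 1 by about 1.0058.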


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
