R Language Data Analysis Series Nine: Logistic Regression


R Language Data Analysis Series Nine

--by Comaple.zhang

This installment covers logistic regression and its implementation in R. Logistic regression (LR) is a generalized regression model: depending on the type and distribution of the dependent variable, the family includes the familiar multivariate linear regression model as well as logistic regression. In logistic regression the dependent variable is discrete and takes values in the two classes {0, 1}; if the discrete variable takes more than two values, the problem becomes multi-class classification. The LR model itself is therefore a binary classifier and can be used for tasks such as CTR (click-through rate) prediction. Let us look at how logistic regression handles a binary classification problem.

Problem Introduction

In multivariate linear regression our model formula looks like this (refer to the first two installments):

f(x, w) = w0 + w1*x1 + w2*x2 + ... + wn*xn

Here f(x, w) is a continuous quantity. What do we do when the dependent variable is discrete? Suppose, for example, we have data like the following.

library(ggplot2)

x <- seq(-3, 3, by = 0.01)
y <- 1 / (1 + exp(-x))
gdf <- data.frame(x = x, y = y)

# Plot the straight line y = x + 0.5
ggplot(gdf, aes(x = x, y = x + 0.5)) + geom_line(col = 'green')

A straight line like this obviously cannot fit a {0,1} output. To be able to fit the discrete {0,1} output we introduce the sigmoid function:

sigmoid(z) = 1 / (1 + exp(-z))

ggplot(gdf, aes(x = x, y = y)) + geom_line(col = 'blue') + geom_vline(xintercept = c(0), col = 'red') + geom_hline(yintercept = c(0, 1), lty = 2)

Plotted in R, the function gives the S-shaped curve shown below:


In this way we can easily convert a linear relationship into a {0,1}-style output; overlaying the green line from before on the sigmoid curve makes the contrast clear:

ggplot(gdf, aes(x = x, y = y)) + geom_line(col = 'blue') + geom_vline(xintercept = c(0), col = 'red') + geom_hline(yintercept = c(0, 1), lty = 2) + geom_line(aes(x = x, y = x + 0.5), col = 'green')



So our class probabilities can be expressed as:

P(y = 1 | x, w) = sigmoid(w^T x) = 1 / (1 + exp(-w^T x))
P(y = 0 | x, w) = 1 - P(y = 1 | x, w)

Thus our transformation is complete, and the model can finally be written in the unified form:

P(y | x, w) = h_w(x)^y * (1 - h_w(x))^(1 - y),  where h_w(x) = 1 / (1 + exp(-w^T x))
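To make this concrete, here is a minimal sketch of the hypothesis in R (the function names and the weight values below are illustrative assumptions, not part of the original article or of any fitted model):

sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothesis of LR: probability of class 1 given a design matrix X and weights w
lr_predict_prob <- function(X, w) sigmoid(as.matrix(X) %*% w)

# Tiny example with an intercept column and two features; weights are arbitrary
X <- cbind(1, c(0.2, -1.5, 3.0), c(1.1, 0.4, -0.7))
w <- c(0.5, 1.0, -2.0)
lr_predict_prob(X, w)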

The loss function of LR (cost function)

The discussion above introduced the sigmoid function and plugged it into our model. How should we define the loss function? Simply taking differences, as in least squares, is not the natural choice here. For a model with a discrete dependent variable, what we want is for as many observations as possible to be classified correctly, i.e. for the joint probability (the likelihood) of the observed labels under the model to be as large as possible:

L(w) = prod_i P(y_i | x_i, w) = prod_i h_w(x_i)^(y_i) * (1 - h_w(x_i))^(1 - y_i)


That is, we want to maximize L(w). To optimize it conveniently we take the negative log-likelihood of L(w), thereby converting the maximization problem into a minimization problem:

NLL(w) = -log L(w) = -sum_i [ y_i * log(h_w(x_i)) + (1 - y_i) * log(1 - h_w(x_i)) ]
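Written out in R, this loss looks roughly as follows (a sketch; X, y and w are assumed to be a design matrix, a 0/1 label vector and a weight vector, with names chosen here purely for illustration):

# Negative log-likelihood of logistic regression (sketch)
# X: n x d design matrix, y: vector of 0/1 labels, w: weight vector of length d
lr_loss <- function(w, X, y) {
  p <- 1 / (1 + exp(-as.matrix(X) %*% w))
  -sum(y * log(p) + (1 - y) * log(1 - p))
}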


Next we optimize the loss function, i.e. find the set of weights w that makes L(w) (strictly, its negative log-likelihood) smallest.

Possible optimization methods include Newton's method, gradient descent, and L-BFGS; they are not detailed here and will be covered later in this series.
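For intuition only, plain batch gradient descent on the loss above could be sketched as below (the learning rate and iteration count are arbitrary choices; note that the glm function used later relies on its own fitting routine, iteratively reweighted least squares, so treat this purely as an illustration):

# Batch gradient descent for the logistic regression loss (illustrative sketch)
lr_gradient_descent <- function(X, y, lr = 0.1, iters = 1000) {
  X <- as.matrix(X)
  w <- rep(0, ncol(X))               # start from zero weights
  for (i in seq_len(iters)) {
    p <- 1 / (1 + exp(-X %*% w))     # current class-1 probabilities
    grad <- t(X) %*% (p - y)         # gradient of the negative log-likelihood
    w <- w - lr * as.vector(grad)    # take a step downhill
  }
  w
}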

The implementation of LR in R

We use the iris dataset to run a binary logistic regression test. iris is a dataset built into R containing measurements of iris flowers belonging to three species, i.e. three classes. We implement logistic regression with the glm function, which supports many kinds of regression through its family argument: the normal (gaussian), Gamma (which covers the exponential as a special case), inverse Gaussian, Poisson and binomial distributions, among others. For logistic regression we use the binomial family.

set.seed(1)  # arbitrary seed, only so that the random split is reproducible

# Keep only the versicolor and virginica rows so the problem is binary
index <- which(iris$Species == 'setosa')
ir <- iris[-index, ]

# Drop the now-empty 'setosa' level so the response has exactly two levels
ir$Species <- droplevels(ir$Species)

# Split the 100 remaining rows roughly 2:1 into training and test sets
split <- sample(100, floor(100 * (2/3)))

# Generate the training set
ir_train <- ir[split, ]

# Generate the test set
ir_test <- ir[-split, ]

fit <- glm(Species ~ ., family = binomial(link = 'logit'), data = ir_train)
summary(fit)

# Predicted probabilities of 'virginica' on the test set
pred <- predict(fit, type = 'response', newdata = ir_test)
real <- ir_test$Species
res <- data.frame(real, predict = factor(ifelse(pred > 0.5, 'virginica', 'versicolor')))

# View model performance
plot(res)
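Besides plotting, a cross-tabulation of predicted against true labels gives a quick numeric check (a small sketch that reuses the res data frame built above):

# Confusion matrix and overall accuracy on the test set
conf <- table(actual = res$real, predicted = res$predict)
conf
sum(diag(conf)) / sum(conf)   # overall accuracy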

