R Language Data Analysis Series Nine: Logistic Regression
--by Comaple.zhang
This installment covers logistic regression and its implementation in R. Logistic regression (LR) is in fact a generalized regression model: depending on the type and distribution of the dependent variable, generalized regression includes the familiar multivariate linear regression model as well as logistic regression. In logistic regression the dependent variable is discrete and takes values in the two-class set {0, 1}; if the discrete variable takes more than two values, the problem becomes multi-class classification. The LR model is therefore a binary classifier and can be used for tasks such as CTR (click-through rate) prediction. Let us now return to how logistic regression solves the two-class problem.
Problem Introduction
In multivariate linear regression, our model formula looks like this (see the first two installments of this series):
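(The original formula image did not survive; a reconstruction consistent with the surrounding text, where w is the weight vector and b the intercept:)

f(x, w) = w^T x + b = w_0 + w_1 x_1 + \dots + w_n x_n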
Here f(x, w) is a continuous variable. What if our dependent variable is discrete? Suppose, for example, we have data like this:
library(ggplot2)
x <- seq(-3, 3, by = 0.01)
y <- 1 / (1 + exp(-x))
gdf <- data.frame(x = x, y = y)
# a straight line y = x + 0.5 as a stand-in linear model
ggplot(gdf, aes(x = x, y = x + 0.5)) + geom_line(col = 'green')
This straight line obviously cannot fit a {0, 1} output. To be able to fit the discrete {0, 1} output, we introduce the sigmoid function as follows:
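(The formula image is missing here; from the code above, the sigmoid is:)

\sigma(x) = \frac{1}{1 + e^{-x}}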
Using R, we can plot this function as follows:

ggplot(gdf, aes(x = x, y = y)) + geom_line(col = 'blue') + geom_vline(xintercept = c(0), col = 'red') + geom_hline(yintercept = c(0, 1), lty = 2)
With this function we can easily convert the linear output into something that fits a discrete {0, 1} output; overlaying the two curves makes this visible:
ggplot(gdf, aes(x = x, y = y)) + geom_line(col = 'blue') + geom_vline(xintercept = c(0), col = 'red') + geom_hline(yintercept = c(0, 1), lty = 2) + geom_line(aes(x = x, y = x + 0.5), col = 'green')
So our class probabilities can be expressed as:
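(Reconstructing the missing formulas; substituting the linear model into the sigmoid gives:)

P(y = 1 \mid x; w) = \frac{1}{1 + e^{-f(x, w)}}
P(y = 0 \mid x; w) = 1 - P(y = 1 \mid x; w)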
With this transformation complete, the model finally reduces to the following form:
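(A reconstruction consistent with the text; the two class probabilities combine into one compact expression, with h(x) = 1 / (1 + e^{-w^T x}):)

P(y \mid x; w) = h(x)^y \, (1 - h(x))^{1 - y}, \quad y \in \{0, 1\}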
Loss function of LR (cost function)
The above introduced the sigmoid function and put it to use in our model. How, then, should we define the loss function? Simply taking differences (a squared-error loss) is not the only option. For a model over discrete variables, what we really want is that as many examples as possible are classified correctly, i.e. that the joint probability (the likelihood) of the observed labels is maximal:
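(Reconstructing the missing likelihood formula for m training examples, writing p_i = P(y_i = 1 \mid x_i; w):)

L(w) = \prod_{i=1}^{m} p_i^{y_i} (1 - p_i)^{1 - y_i}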
That is, we want to maximize L(w). To optimize it, we take the negative log-likelihood of L(w), thereby converting the maximization problem into a minimization problem:
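(Reconstructed negative log-likelihood:)

J(w) = -\log L(w) = -\sum_{i=1}^{m} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]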
Next, we optimize this loss function to find the w that makes L(w) smallest.
Available optimization methods include Newton's method, gradient descent, and L-BFGS. We will not go into their details here; they will come up later in this series.
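As a quick taste of the simplest of these, here is a minimal batch gradient-descent sketch in R for the loss J(w) above. This is an illustration only, not the method used below; the function name lr_gd, the step size alpha, and the iteration count are arbitrary choices:

sigmoid <- function(z) 1 / (1 + exp(-z))

lr_gd <- function(X, y, alpha = 0.1, iters = 1000) {
  # X: numeric matrix whose first column is all 1s (intercept); y: 0/1 vector
  w <- rep(0, ncol(X))
  for (i in seq_len(iters)) {
    p <- sigmoid(X %*% w)                # predicted probabilities p_i
    grad <- t(X) %*% (p - y) / nrow(X)   # gradient of the averaged loss J(w)
    w <- as.vector(w - alpha * grad)     # one descent step
  }
  w
}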
The implementation of LR in R
We use the iris dataset, which ships with R, for a binary logistic-regression test. It contains four flower measurements per sample and a species label with three classes. We implement logistic regression with the glm function, which provides regression for various distribution families of the exponential family, such as normal (gaussian), Gamma, inverse Gaussian, Poisson, and binomial. The logistic regression we need uses the binomial family.
# drop the setosa rows so that only two classes remain
index <- which(iris$Species == 'setosa')
ir <- iris[-index, ]
# drop the now-unused 'setosa' factor level
ir$Species <- droplevels(ir$Species)
# sample two thirds of the 100 remaining rows
split <- sample(100, 100 * (2/3))
# generate the training set
ir_train <- ir[split, ]
# generate the test set
ir_test <- ir[-split, ]
fit <- glm(Species ~ ., family = binomial(link = 'logit'), data = ir_train)
summary(fit)
real <- ir_test$Species
pred <- predict(fit, type = 'response', newdata = ir_test)
res <- data.frame(real, predict = factor(ifelse(pred > 0.5, 'virginica', 'versicolor')))
# inspect model performance
plot(res)
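Beyond the plot, a confusion matrix gives a quick numeric summary of the classifier (a small addition to the code above):

# cross-tabulate true vs. predicted labels
table(res$real, res$predict)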