In theory, regression analysis models a continuous target variable and cannot directly handle a categorical target variable.
The idea of logistic regression is to convert the categorical variable ("opens VIP") into a continuous variable ("probability of opening VIP"), and then use regression methods to study the classification problem indirectly.
I. Principle
Suppose VIP is a categorical variable that takes only the values 0 and 1. A variable of this type cannot be modeled directly by regression analysis.
However, the probability that VIP equals 1 is a continuous variable (prob.vip), so regression analysis can be used to model prob.vip:
prob.vip = k1*x1 + k2*x2 + k3*x3 + k4*x4 + b
Since k1*x1 + k2*x2 + k3*x3 + k4*x4 + b ranges over (-∞, +∞) while prob.vip must lie in [0, 1], the linear expression is transformed with the function y = 1/(1 + exp(-x)):
prob.vip = 1/(1 + exp(-(k1*x1 + k2*x2 + k3*x3 + k4*x4 + b)))
When prob.vip > 0.5, predict vip.predict = 1; otherwise predict 0.
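The transformation and the 0.5 cutoff can be sketched in a few lines of R. The coefficient values, the intercept, and the single observation below are made up purely for illustration; they do not come from any fitted model.

```r
# Logistic (sigmoid) function: maps any real number into (0, 1).
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical coefficients k1..k4, intercept b, and one observation x1..x4.
k <- c(0.8, -0.5, 0.3, 1.2)
b <- -1.0
x <- c(1.0, 2.0, 0.5, 1.5)

z <- sum(k * x) + b                           # linear part, any real number (here 0.75)
prob.vip <- sigmoid(z)                        # probability that VIP equals 1
vip.predict <- ifelse(prob.vip > 0.5, 1, 0)   # class label from the 0.5 cutoff

print(c(z = z, prob.vip = prob.vip, vip.predict = vip.predict))
```

R's built-in plogis() computes the same function, so sigmoid(z) and plogis(z) should agree.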
Note: Linear regression fits model parameters by least squares, whereas logistic regression estimates them by maximum likelihood.
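To make the maximum likelihood idea concrete, the sketch below minimizes the negative log-likelihood of a one-predictor logistic model directly with optim() and compares the result with glm(). The data are simulated, and the names x1, y, neg.loglik, fit.optim and fit.glm are assumptions introduced only for this example.

```r
set.seed(1)
n <- 200
x1 <- rnorm(n)
# Simulated 0/1 response with assumed true intercept -0.5 and slope 1.5.
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x1))

# Negative log-likelihood for parameters par = c(b, k1).
# With eta = b + k1*x1, the log-likelihood is sum(y*eta - log(1 + exp(eta))).
neg.loglik <- function(par) {
  eta <- par[1] + par[2] * x1
  -sum(y * eta - log(1 + exp(eta)))
}

fit.optim <- optim(c(0, 0), neg.loglik, method = "BFGS")  # generic numerical optimizer
fit.glm <- glm(y ~ x1, family = binomial("logit"))        # R's built-in fit

print(fit.optim$par)   # estimates from direct likelihood maximization
print(coef(fit.glm))   # estimates from glm(), expected to be very close
```

Both approaches maximize the same likelihood, so the two sets of estimates should differ only by numerical tolerance.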
II. Implementation in R
glm() is the core function for logistic regression in R.
Parameters:
formula: the form of the linear model to fit
family: the distribution family used by glm. For logistic regression, set family to binomial("logit")
data: the sample data
Code:
(1) Building a logistic regression model
data.glm <- glm(VIP ~ ., data = vip.data, family = binomial("logit"))
summary(data.glm)
The model can be refined using the step function:
data.glm <- step(data.glm)
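Since vip.data itself is not shown, the following is a minimal runnable sketch of the same steps on simulated data. The columns x1 to x4, the assumed true coefficients, and the name data.glm.step are hypothetical and exist only to make the example self-contained.

```r
set.seed(42)
n <- 300
# Simulated stand-in for vip.data: four numeric predictors and a 0/1 target.
vip.data <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
# Assumed true model: only x1 and x2 influence VIP.
vip.data$VIP <- rbinom(n, 1, plogis(-0.3 + 1.2 * vip.data$x1 - 0.8 * vip.data$x2))

# Fit the full model: VIP against all other columns.
data.glm <- glm(VIP ~ ., data = vip.data, family = binomial("logit"))
print(summary(data.glm))

# Stepwise selection by AIC; trace = 0 suppresses the step-by-step log.
data.glm.step <- step(data.glm, trace = 0)
print(formula(data.glm.step))
```

Storing the stepwise result under a separate name keeps the full model available for comparison, for example with AIC(data.glm) and AIC(data.glm.step).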
(2) Output items of the glm model
Model parameters: data.glm$coefficients
Predicted values of the linear part (before the logistic transformation): data.glm$linear.predictors
Probability prob.vip that VIP equals 1: data.glm$fitted.values
Working residuals from the final iteration of the fit: data.glm$residuals
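The relationships among these output items can be checked on a small simulated data set. The data below and the helper names eta, p and X are assumptions made for illustration.

```r
set.seed(7)
n <- 100
vip.data <- data.frame(x1 = rnorm(n), x2 = rnorm(n))   # simulated stand-in data
vip.data$VIP <- rbinom(n, 1, plogis(0.5 * vip.data$x1 - vip.data$x2))
data.glm <- glm(VIP ~ ., data = vip.data, family = binomial("logit"))

eta <- data.glm$linear.predictors   # values of k1*x1 + k2*x2 + b
p <- data.glm$fitted.values         # probabilities that VIP equals 1

# fitted.values are the logistic transform of linear.predictors.
print(all.equal(unname(p), unname(1 / (1 + exp(-eta)))))

# linear.predictors equal the design matrix times the coefficients.
X <- model.matrix(data.glm)
print(all.equal(unname(drop(X %*% data.glm$coefficients)), unname(eta)))

# $residuals holds working residuals; response residuals (y - p) come from residuals().
print(head(data.glm$residuals))
print(head(residuals(data.glm, type = "response")))
```

For everyday diagnostics, residuals(data.glm, type = ...) is usually a clearer way to request a specific residual type than reading the $residuals component directly.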
(3) Model prediction
Predictions for the data used to fit the model:
predict.vip <- ifelse(data.glm$fitted.values >= 0.5, 1, 0)
predict.vip <- as.factor(predict.vip)
Predictions for new data:
new.predict.vip <- predict(data.glm, newdata = test.vip.data)  # predicted values of the linear part
new.predict.vip <- 1 / (1 + exp(-new.predict.vip))  # probability
new.predict.vip <- as.factor(ifelse(new.predict.vip >= 0.5, 1, 0))  # final predicted class
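As an alternative to applying the logistic transformation by hand, predict() for glm objects accepts type = "response", which returns probabilities directly. The sketch below uses simulated data; test.vip.data here is a hypothetical three-row data frame, and link.pred and prob.pred are names introduced for the example.

```r
set.seed(3)
n <- 200
vip.data <- data.frame(x1 = rnorm(n), x2 = rnorm(n))   # simulated stand-in data
vip.data$VIP <- rbinom(n, 1, plogis(1 + vip.data$x1 - vip.data$x2))
data.glm <- glm(VIP ~ ., data = vip.data, family = binomial("logit"))

# Hypothetical new observations to score.
test.vip.data <- data.frame(x1 = c(-2, 0, 2), x2 = c(1, 0, -1))

# The default type = "link" gives values of the linear part.
link.pred <- predict(data.glm, newdata = test.vip.data)
# type = "response" gives probabilities on the 0 to 1 scale.
prob.pred <- predict(data.glm, newdata = test.vip.data, type = "response")

print(all.equal(prob.pred, 1 / (1 + exp(-link.pred))))   # the two routes agree

# Fixing the levels keeps both classes even if only one is predicted.
new.predict.vip <- factor(ifelse(prob.pred >= 0.5, 1, 0), levels = c(0, 1))
print(new.predict.vip)
```

Setting the factor levels explicitly is a design choice that avoids a one-level factor when every new observation falls on the same side of the cutoff.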
(4) Model performance measurement
performance <- length(which((predict.vip == vip.data$VIP) == TRUE)) / nrow(vip.data)  # accuracy
where length(which((predict.vip == vip.data$VIP) == TRUE)) is the number of samples whose predicted value equals the actual value.
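The same accuracy can be computed more compactly with mean(), and a confusion table shows where the errors fall. The two label vectors below are invented for illustration.

```r
# Hypothetical actual and predicted labels for eight samples.
actual <- c(1, 0, 1, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0)

# Accuracy written as in the text: count of matches divided by sample size.
accuracy.original <- length(which((predicted == actual) == TRUE)) / length(actual)

# Equivalent shorter form: the mean of a logical vector is the share of TRUE values.
accuracy <- mean(predicted == actual)
print(accuracy)   # 0.75

# Confusion table: rows are actual classes, columns are predicted classes.
confusion <- table(actual = actual, predicted = predicted)
print(confusion)
```

The diagonal of the confusion table holds the correct predictions, so sum(diag(confusion)) / sum(confusion) gives the same accuracy.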
Logistic regression analysis in R