Logistic regression model (Logistic Regression) and Python implementation
http://www.cnblogs.com/sumai
1. Model
In classification problems, such as deciding whether an email is spam or whether a tumor is malignant, the target variable is discrete and takes only two values, usually encoded as 0 and 1. Suppose we have a single feature x and plot a scatter plot of the data. We could fit a line with linear regression, h_θ(x) = θ0 + θ1x, and predict 1 when h_θ(x) ≥ 0.5 and 0 otherwise. Such a model can classify, but it has serious shortcomings, such as poor robustness and low accuracy. Logistic regression is much better suited to this kind of problem.
The logistic regression hypothesis function applies a function g to θ^T x, mapping it into the range 0 to 1; g is called the sigmoid function or logistic function:

h_θ(x) = g(θ^T x) = 1 / (1 + e^(−θ^T x))

Its graph is an S-shaped curve. When we feed in the features of a sample, the resulting h_θ(x) is the probability that this sample belongs to class 1. In other words, logistic regression gives the probability that a sample belongs to a class.
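A minimal sketch of the sigmoid function in NumPy (the name sigmoid and the sample inputs are illustrative, not from the original post); it shows how any real-valued score θ^T x is squashed into (0, 1):

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)): maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))
# [~4.5e-05, 0.5, ~0.99995]: large negative scores -> near 0, large positive -> near 1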
2. Evaluation
Recall the squared-error loss function used in the earlier post on linear regression:

J(θ) = (1/2m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²
If this loss function is reused directly in logistic regression, the resulting J is a non-convex function with many local minima and is hard to optimize, so we need a different cost function. Redefine the per-sample cost as follows:

Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1;  −log(1 − h_θ(x)) if y = 0

which can be combined into a single expression and averaged over the m samples:

J(θ) = −(1/m) · Σ_{i=1..m} [ y^(i)·log(h_θ(x^(i))) + (1 − y^(i))·log(1 − h_θ(x^(i))) ]
When a sample actually belongs to class 1 and the predicted probability is also 1, the loss is 0 and the prediction is correct. Conversely, if the predicted probability is 0, the loss goes to infinity. A loss function constructed this way is reasonable, and it is convex, so it is straightforward to find the parameters θ that minimize J.
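A minimal sketch of this cross-entropy loss in NumPy (the name cross_entropy and the example probabilities are illustrative); it shows that confident correct predictions give a loss near 0, while confident wrong ones give a very large loss:

import numpy as np

def cross_entropy(y, h):
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

y = np.array([1.0, 1.0, 0.0])
print(cross_entropy(y, np.array([0.99, 0.99, 0.01])))  # ~0.01: confident and correct
print(cross_entropy(y, np.array([0.01, 0.01, 0.99])))  # ~4.6:  confident and wrong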
3. Optimization
We have defined the loss function J(θ); the next task is to solve for the parameters. The goal is clear: find the θ that minimizes J(θ). The two most commonly used methods are batch gradient descent and Newton's method. Both obtain a numerical solution by iterating, but Newton's method converges faster.
Batch gradient descent updates the parameters using the gradient computed over the whole training set:

θ := θ − α · X^T (h_θ(X) − y)

where α is the learning rate (the 1/m factor can be absorbed into α).
Newton's method (H is the Hessian matrix) also uses second-order information:

θ := θ − H^(−1) ∇J(θ),  with H = X^T A X and A = diag( h_θ(x^(i)) · (1 − h_θ(x^(i))) )
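A minimal sketch of one update step of each method on toy data (the toy X, y and the learning rate are illustrative, not from the original post); the full fit on the iris data follows in section 4:

import numpy as np
from numpy.linalg import inv

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: one feature plus an intercept column, four samples
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])
theta = np.zeros((2, 1))

h = sigmoid(X.dot(theta))            # predicted probabilities
grad = X.T.dot(h - y)                # gradient (1/m factor absorbed into alpha)

theta_gd = theta - 0.01 * grad       # one batch gradient descent step, alpha = 0.01

A = np.diag((h * (1 - h)).ravel())   # A = diag(h(1-h))
H = X.T.dot(A).dot(X)                # Hessian, H = X'AX
theta_nt = theta - inv(H).dot(grad)  # one Newton step

print(theta_gd.ravel(), theta_nt.ravel())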
4. Python Code Implementation
# -*- coding: utf-8 -*-
"""
Created on Wed Feb 11:04:11

@author: sumaiwong
"""
import numpy as np
import pandas as pd
from numpy import dot
from numpy.linalg import inv

iris = pd.read_csv(r'D:\iris.csv')
dummy = pd.get_dummies(iris['Species'])  # create dummy variables for Species
iris = pd.concat([iris, dummy], axis=1)
iris = iris.iloc[0:100, :]  # keep the first 100 rows of samples

# Build a logistic regression to classify whether the species is setosa: setosa ~ Sepal.Length
# Y = g(BX) = 1 / (1 + exp(-BX))
def logit(x):
    return 1. / (1 + np.exp(-x))

temp = pd.DataFrame(iris.iloc[:, 0])
temp['x0'] = 1.
X = temp.iloc[:, [1, 0]]
Y = iris['setosa'].values.reshape(len(iris), 1)  # assemble the X matrix and Y vector

# Batch gradient descent
m, n = X.shape                                    # matrix size
alpha = 0.0065                                    # learning rate
theta_g = np.zeros((n, 1))                        # initialize parameters
maxCycles = 3000                                  # number of iterations
J = pd.Series(np.arange(maxCycles, dtype=float))  # loss function values

for i in range(maxCycles):
    h = logit(dot(X, theta_g))                    # estimated probabilities
    J[i] = -(1 / 100.) * np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h))  # loss
    error = h - Y                                 # error
    grad = dot(X.T, error)                        # gradient
    theta_g -= alpha * grad
print(theta_g)
J.plot()  # plot the loss curve (requires matplotlib)

# Newton's method
theta_n = np.zeros((n, 1))                        # initialize parameters
maxCycles = 10                                    # number of iterations
C = pd.Series(np.arange(maxCycles, dtype=float))  # loss function values

for i in range(maxCycles):
    h = logit(dot(X, theta_n))                    # estimated probabilities
    C[i] = -(1 / 100.) * np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h))  # loss
    error = h - Y                                 # error
    grad = dot(X.T, error)                        # gradient
    A = h * (1 - h) * np.eye(len(X))              # A = diag(h(1-h)) via broadcasting
    H = dot(dot(X.T, A), X)                       # Hessian matrix, H = X'AX
    theta_n -= dot(inv(H), grad)
print(theta_n)
C.plot()  # plot the loss curve (requires matplotlib)
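As an optional sanity check (not part of the original post), the same fit can be reproduced with scikit-learn, assuming it is installed; this sketch uses the iris dataset bundled with scikit-learn instead of the downloaded CSV, and a very large C to approximate the unregularized fit above:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
X_sk = data.data[:100, [0]]                  # sepal length of the first 100 samples
y_sk = (data.target[:100] == 0).astype(int)  # 1 = setosa, 0 = versicolor

clf = LogisticRegression(C=1e6, max_iter=1000)  # large C ~= no regularization
clf.fit(X_sk, y_sk)
print(clf.coef_, clf.intercept_)  # should be close to theta_g / theta_n above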
Data used by the code: http://files.cnblogs.com/files/sumai/iris.rar