Understanding the principle of the logistic regression algorithm, with a Python implementation

Source: Internet
Author: User

In general, a machine learning implementation follows these steps:
1. Prepare the data, including collection, cleaning, and so on.
2. Define a learning model (the learning function), which is the model ultimately used to make predictions on new data.
3. Define a loss function, the function to be optimized in order to determine the parameters of the model.
4. Select an optimization strategy (an optimizer) that continually improves the model's parameters according to the loss function.
5. Train the model (the learning function) on the training data.
6. Measure the model's prediction accuracy on the test data.
A minimal code sketch of these six steps appears right after this list.
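
As a concrete illustration (not part of the original article), here is a minimal sketch of the six steps for logistic regression using plain NumPy. The file name data.txt, the 80-row split, the learning rate, and the iteration count are all illustrative assumptions.

from numpy import *

# 1. Prepare the data: hypothetical file with features in the first columns, 0/1 label last
data = loadtxt('data.txt')
X, y = data[:, :-1], data[:, -1]
X_train, y_train = X[:80], y[:80]    # illustrative train/test split
X_test,  y_test  = X[80:], y[80:]

# 2. Learning model: h(x) = sigmoid(w . x)
def sigmoid(z):
    return 1.0 / (1 + exp(-z))

# 3. Loss function: negative log-likelihood (defined later in this article)
def loss(w, X, y):
    h = sigmoid(dot(X, w))
    return -sum(y * log(h) + (1 - y) * log(1 - h))

# 4 and 5. Optimization strategy: batch gradient ascent on the training data
w = ones(X_train.shape[1])
alpha = 0.001                        # illustrative learning rate
for _ in range(500):                 # illustrative iteration count
    h = sigmoid(dot(X_train, w))
    w = w + alpha * dot(X_train.T, y_train - h)

# 6. Accuracy of the trained model on the test data
pred = (sigmoid(dot(X_test, w)) > 0.5).astype(float)
accuracy = mean(pred == y_test)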

Logistic regression follows these same steps. Steps 1, 5, and 6 need no special discussion; what distinguishes logistic regression from other machine learning algorithms is mainly step 2, the choice of learning model. So the explanation below focuses on what kind of model logistic regression settles on, and then briefly covers the loss function and optimization strategy.
First, a brief introduction: logistic regression simply takes a weighted sum of the features, feeds it through the sigmoid function, and uses the sigmoid output to decide between the two classes. Its advantages are that the computational cost is low and it is easy to understand and implement. Its disadvantages are that it is prone to underfitting, so its classification accuracy may not be high. One more important point: a single neuron in a neural network can in fact be understood as a logistic regression model.

Sigmoid function

First, the sigmoid function, since it plays a central role in logistic regression. The sigmoid is a common S-shaped function in biology, also known as the S-shaped growth curve. It is defined by the following formula:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
(Figure: the sigmoid function plotted at two horizontal scales.)

As you can see, when the horizontal axis spans a large enough range, the sigmoid function looks like a step function.
In addition, the sigmoid function has the following properties:
it is a good threshold function (its output approaches 1 at one extreme and 0 at the other), and it is continuous, smooth, strictly monotonic, and centrally symmetric about the point (0, 0.5).
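
These properties are easy to check numerically; a quick sketch (the sample points are arbitrary):

from numpy import array, exp

def sigmoid(z):
    return 1.0 / (1 + exp(-z))

z = array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))                # [0.0000454, 0.269, 0.5, 0.731, 0.9999546]: monotonic, from 0 toward 1
print(sigmoid(z) + sigmoid(-z))  # all ones: symmetry about the point (0, 0.5)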

Logistic regression model

To solve a binary classification problem, we need a function whose input ranges from negative infinity to positive infinity and whose output is 0 or 1. Such a function easily brings to mind the unit step function:

$$\epsilon(z) = \begin{cases} 0, & z < 0 \\ 0.5, & z = 0 \\ 1, & z > 0 \end{cases}$$

That is exactly the shape we want, but the unit step function jumps from 0 to 1 at the jump point, so it is not continuous, which makes it a poor choice. Logistic regression therefore settles on the sigmoid function described above. Since the sigmoid is centrally symmetric about (0, 0.5), binary classification is achieved simply by treating data whose output is greater than 0.5 as class 1 and data whose output is less than 0.5 as class 0.
With the interpretation of the sigmoid's output settled, the next question is what the input to the sigmoid function should be.
In fact, each feature is multiplied by a regression coefficient, all of the products are summed, and the sum is fed to the sigmoid function. Calling this sum z:

$$z = w_0 x_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

Here the $x_i$ are the features and the $w_i$ are the parameters that need to be trained. In vector form:

$$z = \mathbf{w}^T \mathbf{x}$$

So the logistic regression model can be written as:

$$h_{\mathbf{w}}(\mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}}$$

Thus, the logistic regression model is determined.
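
Concretely, a prediction is one dot product followed by the sigmoid. A tiny sketch with made-up weights and features (the numbers are arbitrary):

from numpy import array, exp

def sigmoid(z):
    return 1.0 / (1 + exp(-z))

w = array([1.0, -0.5, 2.0])   # hypothetical trained parameters, w[0] acting as the bias
x = array([1.0, 3.0, 0.8])    # x[0] = 1.0 is the constant feature, as in the code below
z = w.dot(x)                  # z = w^T x = 1.0 - 1.5 + 1.6 = 1.1
p = sigmoid(z)                # about 0.75
label = 1 if p > 0.5 else 0   # classified as class 1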
Loss function and optimization strategy

In the logistic regression model described above, the output of the sigmoid activation is used to train the model, and it is interpreted here as a probability (since every output is compressed into the range 0 to 1), so it is sometimes written as P. For a given model, the loss function is not unique, and different loss functions can be paired with different optimization strategies.

The loss function measures the distance between the actual output and the desired output. In general it could be constructed as a sum of squared errors, but that approach is rarely used in logistic regression: combined with the sigmoid, the resulting optimization objective is not convex, so the gradient descent algorithm may end up in one of many local optima. Therefore, logistic regression generally uses the loss function defined below.

Assume that the probability that y = 1 is $h_{\mathbf{w}}(\mathbf{x})$; since this is a binary classification problem, the probability that y = 0 is $1 - h_{\mathbf{w}}(\mathbf{x})$. Taking logarithms, weighting each term by the label y, and summing over all m samples gives the log-likelihood:

$$L(\mathbf{w}) = \sum_{i=1}^{m} \Big[\, y_i \log h_{\mathbf{w}}(\mathbf{x}_i) + (1 - y_i) \log\big(1 - h_{\mathbf{w}}(\mathbf{x}_i)\big) \Big]$$
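
As a numerical sketch of this formula (the labels and sigmoid outputs below are made up):

from numpy import array, log

y = array([1.0, 0.0, 1.0])   # hypothetical true labels
h = array([0.9, 0.2, 0.6])   # hypothetical sigmoid outputs h_w(x_i)
L = sum(y * log(h) + (1 - y) * log(1 - h))
# log(0.9) + log(0.8) + log(0.6) = -0.105 - 0.223 - 0.511, so L is about -0.839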

We hope logistic regression learns parameters $\mathbf{w}$ that make $\mathbf{w}^T \mathbf{x}$ far greater than 0 for positive examples and far less than 0 for negative examples. The optimization problem therefore becomes:

$$\max_{\mathbf{w}} \; L(\mathbf{w})$$

Or, equivalently:

$$\min_{\mathbf{w}} \; J(\mathbf{w}) = -\frac{1}{m} L(\mathbf{w})$$
These formulas are also referred to as the loss function of logistic regression.
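
To connect this loss function to the code below: differentiating the log-likelihood (a standard derivation, not spelled out in the original article) gives

$$\nabla_{\mathbf{w}} L = \sum_{i=1}^{m} \big( y_i - h_{\mathbf{w}}(\mathbf{x}_i) \big) \, \mathbf{x}_i = X^T (\mathbf{y} - \mathbf{h}),$$

so one gradient ascent step is $\mathbf{w} \leftarrow \mathbf{w} + \alpha X^T (\mathbf{y} - \mathbf{h})$. This is exactly the update weights = weights + alpha * dataMatrix.transpose() * error performed in gradAscent() below.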

For the first (maximization) form, (stochastic) gradient ascent can be used as the optimization strategy; for the second (minimization) form, (stochastic) gradient descent is the better fit.

Python Implementation

This example comes from Machine Learning in Action, which is available in electronic form online.
The example uses logistic regression with the stochastic gradient ascent algorithm to predict whether a horse suffering from colic will live or die. The source code is given below with a brief description; if you want the data used by the routine, download the book's full example code and data.

# logRegres.py -- logistic regression example from Machine Learning in Action, Chapter 5 (Python 2)
from numpy import *

def loadDataSet():
    dataMat = []; labelMat = []
    fr = open('testSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))
    return dataMat, labelMat

def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

def gradAscent(dataMatIn, classLabels):
    dataMatrix = mat(dataMatIn)               # convert to NumPy matrix
    labelMat = mat(classLabels).transpose()   # convert to NumPy matrix
    m, n = shape(dataMatrix)
    alpha = 0.001
    maxCycles = 500
    weights = ones((n, 1))
    for k in range(maxCycles):                # heavy on matrix operations
        h = sigmoid(dataMatrix * weights)     # matrix mult
        error = (labelMat - h)                # vector subtraction
        weights = weights + alpha * dataMatrix.transpose() * error  # matrix mult
    return weights

def plotBestFit(weights):
    import matplotlib.pyplot as plt
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1]); ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1]); ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()

def stocGradAscent0(dataMatrix, classLabels):
    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)                         # initialize to all ones
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights

def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    m, n = shape(dataMatrix)
    weights = ones(n)                         # initialize to all ones
    for j in range(numIter):
        dataIndex = range(m)
        for i in range(m):
            alpha = 4 / (1.0 + j + i) + 0.0001  # alpha decreases with iteration but never reaches 0
            randIndex = int(random.uniform(0, len(dataIndex)))  # update with a randomly chosen sample
            h = sigmoid(sum(dataMatrix[randIndex] * weights))
            error = classLabels[randIndex] - h
            weights = weights + alpha * error * dataMatrix[randIndex]
            del(dataIndex[randIndex])
    return weights

def classifyVector(inX, weights):
    prob = sigmoid(sum(inX * weights))
    if prob > 0.5: return 1.0
    else: return 0.0

def colicTest():
    frTrain = open('horseColicTraining.txt'); frTest = open('horseColicTest.txt')
    trainingSet = []; trainingLabels = []
    for line in frTrain.readlines():
        currLine = line.strip().split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        trainingSet.append(lineArr)
        trainingLabels.append(float(currLine[21]))
    trainWeights = stocGradAscent1(array(trainingSet), trainingLabels, 1000)
    errorCount = 0; numTestVec = 0.0
    for line in frTest.readlines():
        numTestVec += 1.0
        currLine = line.strip().split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        if int(classifyVector(array(lineArr), trainWeights)) != int(currLine[21]):
            errorCount += 1
    errorRate = (float(errorCount) / numTestVec)
    print "the error rate of this test is: %f" % errorRate
    return errorRate

def multiTest():
    numTests = 10; errorSum = 0.0
    for k in range(numTests):
        errorSum += colicTest()
    print "after %d iterations the average error rate is: %f" % (numTests, errorSum / float(numTests))

The functions defined in the file are as follows:
loadDataSet(): data preparation;
sigmoid(): defines the sigmoid function;
gradAscent(): the gradient ascent algorithm;
plotBestFit(): draws the decision boundary;
stocGradAscent0(): the stochastic gradient ascent algorithm;
stocGradAscent1(): an improved stochastic gradient ascent algorithm;
classifyVector(): takes regression coefficients and a feature vector as input and computes the corresponding sigmoid value, classifying by the 0.5 threshold;
colicTest(): opens the test set and training set and formats the data;
multiTest(): calls colicTest() 10 times and reports the average of the results.
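
Before running the horse-colic test, the small 2-D data set can be used to visualize the decision boundary. A usage sketch, assuming testSet.txt from the book's Chapter 5 code is in the working directory (getA() converts the NumPy matrix returned by gradAscent() into an array for plotting):

>>> import logRegres
>>> dataArr, labelMat = logRegres.loadDataSet()
>>> weights = logRegres.gradAscent(dataArr, labelMat)
>>> logRegres.plotBestFit(weights.getA())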

So after running the above code, enter the following at the Python prompt:

>>> import logRegres
>>> reload(logRegres)
<module 'logRegres' from 'F:\...\machinelearninginaction\Ch05\logRegres.pyc'>
>>> logRegres.multiTest()

Final program output:

the error rate of this test is: 0.358209
the error rate of this test is: 0.283582
the error rate of this test is: 0.298507
the error rate of this test is: 0.417910
the error rate of this test is: 0.388060
the error rate of this test is: 0.298507
the error rate of this test is: 0.328358
the error rate of this test is: 0.313433
the error rate of this test is: 0.402985
the error rate of this test is: 0.432836
after 10 iterations the average error rate is: 0.352239

0.352239 is the final average error rate over the 10 runs.
