I was excited when I got to this part, because it finally ties in with the theory I studied earlier: it is the code implementation of the logistic regression theory from before. So if something here is not quite clear, you can go back to the theory part to understand it. Now on to the topic: logistic regression.

**I. Sigmoid function**

In the theory part we saw that for binary classification we want the output of the function to lie between 0 and 1, so we introduce the sigmoid function. It has the form $g(z) = \frac{1}{1 + e^{-z}}$.

[Figure: graph of the sigmoid function]

From the function expression, we can write it in code:

    from numpy import exp

    def sigmoid(inX):
        return 1.0 / (1 + exp(-inX))
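
As a quick check of the claim that the output stays between 0 and 1, here is a minimal sketch that evaluates the function at a few points and applies the 0.5 threshold. The name `sigmoid_demo` and the sample inputs are assumptions made for this illustration only.

```python
import numpy as np

def sigmoid_demo(z):
    """Logistic function 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# A few inputs spread around zero (illustrative values)
z = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
p = sigmoid_demo(z)

# Outputs above 0.5 are treated as class 1, everything else as class 0
predicted_class = (p > 0.5).astype(int)
print(p)
print(predicted_class)
```

Note that an input of exactly 0 gives 0.5, which the strict `>` comparison assigns to class 0.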

**II. Gradient ascent**

The method used here is gradient ascent. Gradient ascent is really the same method as gradient descent, just written in a different form. Gradient ascent updates the parameters as $\theta := \theta + \alpha \nabla_\theta f(x, \theta)$, while gradient descent uses $\theta := \theta - \alpha \nabla_\theta f(x, \theta)$. It looks as if only one operator differs, but the f(x,θ) is also slightly different between the two, which I will explain with the code later.
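
To make the equivalence concrete, here is a small sketch on a toy function. The function f(x) = -(x - 3)^2, the step size and the iteration count are all assumptions chosen for illustration; they do not come from the dataset used in this post.

```python
def f(x):
    # Toy concave function (an assumption for illustration), maximum at x = 3
    return -(x - 3.0) ** 2

def grad_f(x):
    # Derivative of f with respect to x
    return -2.0 * (x - 3.0)

alpha = 0.1
x_ascent = 0.0   # maximise f by stepping along +grad f
x_descent = 0.0  # minimise -f by stepping along -grad(-f)
for _ in range(100):
    x_ascent = x_ascent + alpha * grad_f(x_ascent)
    x_descent = x_descent - alpha * (-grad_f(x_descent))

print(x_ascent, x_descent)
```

Maximising f and minimising -f follow the same path step by step, which is all that "a different form of expression" means here.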

The first step is to read the feature dataset and the labels from the text file. This has been explained several times before, so this section gives the code directly. With the features and labels, we can find the best parameters using gradient ascent and gradient descent.

    from numpy import asmatrix, ones, shape, zeros

    def loadDataSet(filename):
        dataMat = []
        labelsVec = []
        with open(filename) as f:
            for line in f:
                lineArr = line.strip().split()
                # Prepend x0 = 1.0 so that sigma[0] acts as the intercept
                dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
                labelsVec.append(int(lineArr[-1]))
        return dataMat, labelsVec

    def gradAscent(dataSetIn, classLabels):  # gradient ascent
        # asmatrix replaces the mat alias, which NumPy 2.0 removed
        dataMat = asmatrix(dataSetIn)
        labelsVec = asmatrix(classLabels).transpose()
        m, n = shape(dataMat)
        sigma = ones((n, 1))
        alpha = 0.001
        maxCycles = 500
        for i in range(maxCycles):
            h = sigmoid(dataMat * sigma)
            error = labelsVec - h
            sigma = sigma + alpha * dataMat.transpose() * error
        return sigma

    def gradAscent1(dataSetIn, classLabels):  # gradient descent
        dataMat = asmatrix(dataSetIn)
        labelsVec = asmatrix(classLabels).transpose()
        m, n = shape(dataMat)
        sigma = zeros((n, 1))
        alpha = 0.001
        maxCycles = 500
        for i in range(maxCycles):
            h = sigmoid(dataMat * sigma)
            error = h - labelsVec
            sigma = sigma - alpha * dataMat.transpose() * error
        return sigma

Here the .transpose() method transposes the matrix, and maxCycles is the number of iterations. The difference in f(x,θ) between gradient descent and gradient ascent mentioned above is the error term `error`: the two methods use errors of opposite sign. If you substitute the gradient descent error into sigma = sigma - alpha*dataMat.transpose()*error and pull out the minus sign, you get exactly the gradient ascent update. The derivation of the update for sigma is omitted here; if it is unfamiliar, have a look at the earlier theory part. With the best parameters in hand, we can draw the decision boundary to see how well it fits the dataset.
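
For readers who do not want to flip back, the update can be sketched as follows. This is the standard log-likelihood argument for logistic regression, written with θ for the parameter vector that the code calls `sigma` and g for the sigmoid function. The symbols X (feature matrix), y (label vector), h (predictions) and ℓ (log-likelihood) are notation introduced for this sketch, not taken from the original derivation.

```latex
\begin{aligned}
h &= g(X\theta), \qquad
\ell(\theta) = \sum_{i=1}^{m} \Big[ y_i \log h_i + (1 - y_i)\log(1 - h_i) \Big] \\
\nabla_\theta \ell(\theta) &= X^{\top}(y - h) \\
\text{ascent on } \ell:\quad \theta &:= \theta + \alpha\, X^{\top}(y - h) \\
\text{descent on } -\ell:\quad \theta &:= \theta - \alpha\, X^{\top}(h - y)
\end{aligned}
```

The last two lines are the same update, which matches the two loops in the code: `labelsVec - h` with a plus sign, or `h - labelsVec` with a minus sign.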

    import matplotlib.pyplot as plt
    from numpy import arange, array, shape

    def plotBestFit(sigma):  # sigma is passed in as a numpy array
        dataMat, labelsVec = loadDataSet(r'Logregres\testset.txt')
        arrDataMat = array(dataMat)
        n = shape(arrDataMat)[0]
        # sigmat = gradAscent(dataMat, labelsVec)
        # sigma = sigmat.getA()
        xcord1 = []; ycord1 = []
        xcord2 = []; ycord2 = []
        for i in range(n):
            if labelsVec[i] == 1:
                xcord1.append(arrDataMat[i, 1])
                ycord1.append(arrDataMat[i, 2])
            else:
                xcord2.append(arrDataMat[i, 1])
                ycord2.append(arrDataMat[i, 2])
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
        ax.scatter(xcord2, ycord2, s=30, c='blue')
        x = arange(-3.0, 3.0, 0.1)
        y = (-sigma[0] - sigma[1] * x) / sigma[2]
        ax.plot(x, y)
        plt.show()

This function calls the earlier loadDataSet() function; the path needs to be changed to the location of the text file on your own computer. We then plot the two classes 0 and 1 separately so that they are easier to tell apart, and draw the decision boundary. Where does the expression y = (-sigma[0]-sigma[1]*x)/sigma[2] come from? Our feature vector is x = [x0, x1, x2] with x0 = 1. The boundary is where the sigmoid output equals 0.5, that is, where sigma[0]*x0 + sigma[1]*x1 + sigma[2]*x2 = 0. The x in the expression corresponds to x1 and y corresponds to x2, so solving for x2 gives the formula for the decision boundary. The results are shown below.
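
The boundary formula can be checked numerically. The sketch below uses made-up coefficients (they are an assumption for illustration, not the values learned from the dataset) and a hypothetical helper `boundary_x2`, and confirms that every point on the computed line gets a sigmoid value of 0.5.

```python
import numpy as np

def boundary_x2(sigma, x1):
    """Solve sigma[0] + sigma[1]*x1 + sigma[2]*x2 = 0 for x2."""
    return (-sigma[0] - sigma[1] * x1) / sigma[2]

# Hypothetical coefficients, chosen only for illustration
sigma = np.array([4.0, 0.5, -0.6])
x1 = np.arange(-3.0, 3.0, 0.1)
x2 = boundary_x2(sigma, x1)

# On the boundary the linear score is 0, so the sigmoid is 0.5
score = sigma[0] + sigma[1] * x1 + sigma[2] * x2
prob = 1.0 / (1.0 + np.exp(-score))
print(np.allclose(prob, 0.5))
```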

Only the result of gradient ascent is shown here; gradient descent gives essentially the same result, and interested readers can plot it themselves. As the figure shows, the classification is quite good. The method has a problem, though: its computational cost is high, because every iteration uses the whole dataset. That is fine for a few hundred samples, but what about millions of samples? Below we optimize the algorithm above.

**III. Stochastic gradient ascent**

The idea of stochastic gradient ascent is to update the regression coefficients using only one sample point at a time. The code is as follows:

    from numpy import array, ones, shape

    def stocGradAscent0(dataMat, labelsVec):
        arrDataMat = array(dataMat)
        m, n = shape(arrDataMat)
        sigma = ones(n)
        alpha = 0.01
        for i in range(m):
            # One sample per update instead of the whole dataset
            h = sigmoid(sum(arrDataMat[i] * sigma))
            error = float(labelsVec[i] - h)
            sigma = sigma + alpha * error * arrDataMat[i]
        return sigma

As we can see, the sigmoid function is no longer applied to the whole feature dataset but to a single sample point, so the amount of computation is much smaller than with the previous method. Let's take a look at the result.

Nearly half of the red points are misclassified. Does that mean the idea is a bad one? Of course not: this version makes only a single pass over the data, whereas the previous methods iterated 500 times, so the benefit of the optimization does not show yet. Here we modify the code to make 150 passes and look at the result.

    import random
    from numpy import array, ones, shape

    def stocGradAscent1(dataMat, labelsVec, numIter=150):
        arrDataMat = array(dataMat)
        m, n = shape(arrDataMat)
        sigma = ones(n)
        for i in range(numIter):
            for j in range(m):
                # Step size shrinks over time but never drops below 0.01
                alpha = 1.0 / (1 + i + j) + 0.01
                # Pick a random sample for this update
                randIndex = int(random.uniform(0, m))
                h = sigmoid(sum(arrDataMat[randIndex] * sigma))
                error = labelsVec[randIndex] - h
                sigma = sigma + alpha * error * arrDataMat[randIndex]
        return sigma

The third parameter, numIter, is the number of iterations; it defaults to 150 and can be set by the caller. The alpha in the code changes as the iterations proceed and gets smaller, but it never reaches 0 because of the constant term, which guarantees that new data still has some effect after many iterations. The sample points are also chosen at random by the computer to reduce periodic fluctuations. Let's see the results.
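
To get a feel for the step size, here is a small sketch that evaluates the same expression at the first update and at a late one. The helper name `step_size` and the assumption of 100 samples per pass are made up for this illustration.

```python
def step_size(i, j):
    """Learning rate at pass i, inner step j: 1/(1+i+j) + 0.01."""
    return 1.0 / (1 + i + j) + 0.01

first = step_size(0, 0)        # very first update
later = step_size(149, 99)     # last update of pass 149, assuming 100 samples
next_pass = step_size(1, 0)    # first update of the second pass
print(first, later, next_pass)
```

Because j restarts from 0 on every pass, alpha does not decrease strictly monotonically: it jumps back up at the start of each pass, though a little less each time as i grows.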

Compared with the results before optimization, the classification is about as good, but the computational cost here is much smaller.

**IV. Practical exercise**

Here we have some text-file datasets in which 30% of the data is missing. During preprocessing, missing feature values are replaced with a real number, and samples with a missing label are discarded. We extract the features and labels from the training set to train the model, and then evaluate the trained model on the test set. For testing, each test feature vector is multiplied by the best regression coefficients found in training, and the result is passed to the sigmoid function. As the earlier theory said, if the sigmoid value is greater than 0.5 we classify the sample as 1; otherwise we classify it as 0.
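
The preprocessing step described above can be sketched as follows. This is only an illustration under stated assumptions: missing entries are assumed to be marked as NaN, the fill value of 0.0 is an assumed choice, and the helper name `clean` is hypothetical. It is not the procedure that produced the files used below.

```python
import numpy as np

def clean(features, labels, fill_value=0.0):
    """Drop rows whose label is missing, then fill missing features."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    keep = ~np.isnan(labels)                  # rows with a known label
    features = features[keep]
    # Replace each missing feature with fill_value (assumed to be 0.0 here)
    features = np.where(np.isnan(features), fill_value, features)
    return features, labels[keep]

# Tiny made-up example: NaN marks a missing entry
X = [[1.0, np.nan], [2.0, 3.0], [np.nan, 5.0]]
y = [1.0, np.nan, 0.0]
X_clean, y_clean = clean(X, y)
print(X_clean)
print(y_clean)
```

The second row is dropped because its label is missing, and the remaining gaps in the features are filled.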

    from numpy import array

    def classifyVector(inX, sigma):
        prob = sigmoid(sum(inX * sigma))
        if prob > 0.5:
            return 1.0
        else:
            return 0.0

    def colicTest():
        trainSet = []
        trainLabels = []
        with open(r'D:\ipython\logRegres\horseColicTraining.txt') as trainFile:
            for line in trainFile:
                line_s = line.strip().split('\t')
                # Columns 0-20 are used as features; the last column is the label
                lineArr = [float(line_s[i]) for i in range(21)]
                trainSet.append(lineArr)
                trainLabels.append(float(line_s[-1]))
        sigma = stocGradAscent1(trainSet, trainLabels, 500)
        error_cnt = 0
        numTestVec = 0
        with open(r'D:\ipython\logRegres\horseColicTest.txt') as testFile:
            for line1 in testFile:
                numTestVec += 1
                line_s1 = line1.strip().split('\t')
                lineArr1 = [float(line_s1[j]) for j in range(21)]
                # Convert the label string to a number before comparing
                if int(classifyVector(array(lineArr1), sigma)) != int(float(line_s1[-1])):
                    error_cnt += 1
        error_rate = float(error_cnt) / numTestVec
        print('The num of error is %d, the error rate is %f\n' % (error_cnt, error_rate))
        return error_rate

    def multiTest():
        numTests = 10
        error_sum = 0.0
        for i in range(numTests):
            error_sum += colicTest()
        print('%d iterations, the average error rate is %f\n' % (numTests, float(error_sum) / numTests))

The last function runs the test 10 times and takes the average. One thing that must be noted here: the .split() method returns strings, in the form '2.0', '1.0', which are not numbers, so the predicted class (a number) in the test will never be equal to the label (a str) unless the label is converted first. So if you also get error_rate=1, check whether you made the same mistake I did!
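
The pitfall is easy to reproduce in isolation. In this sketch the label string '1.000000' is an assumed format for what the file contains after splitting; the variable names are made up for the illustration.

```python
label_text = '1.000000'   # assumed label format; split() always returns strings
predicted = 1.0           # the classifier returns a float

# A float never compares equal to a string, so every sample would count as an error
naive_match = (predicted == label_text)

# int() cannot parse a string with a decimal point directly
try:
    int(label_text)
    int_parses = True
except ValueError:
    int_parses = False

# Converting through float first gives a proper numeric comparison
numeric_match = (int(predicted) == int(float(label_text)))
print(naive_match, int_parses, numeric_match)
```

Converting with float() before int() handles both '1' and '1.000000' style labels.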

The predictions here have an error rate of 34.3%, which is actually not a bad result given that 30% of the data in the dataset is missing.

**Data set and code download HTTP://PAN.BAIDU.COM/S/1QYHT3I8**

Machine learning python practical----Logistic regression