Machine learning notes: Logistic regression & predicting the mortality of sick horses from hernia (colic) symptoms


Objective:

In everyday life we constantly run into optimization problems: how to travel from one place to another in the shortest time, how to obtain the greatest return from the least investment, how to design a chip so that it consumes the least power while delivering the best performance. In this section we study an optimization-based algorithm, logistic regression, whose purpose is still classification. The main idea of logistic regression is to fit a regression formula for the classification boundary from the existing data, and then use that boundary to classify. Suppose we have a pile of data points that need to be separated by a single line (ideally the best possible line); finding that line is the goal of logistic regression.

What does the "regression" in "logistic regression" stand for? In mathematics, regression is a fitting process: regression analysis is essentially a function-estimation problem, that is, finding the relationship between a dependent variable and the independent variables. For example, given some data points, we can fit a straight line through them so that the line represents the distribution of the points as well as possible; this fitting process is called regression.
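As a tiny concrete illustration of that fitting process (a sketch of mine, not from the original; the data points are made up for demonstration):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])    # made-up points lying near a line
slope, intercept = np.polyfit(x, y, 1)     # least-squares straight-line fit
print(slope, intercept)                    # the "regression" line y ~ slope*x + intercept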

In machine learning, training such a classifier amounts to finding the best-fitting line (or curve), which is where optimization algorithms come in. Before implementing one, let's summarize some properties of logistic regression:

    • Advantages: Low computational cost, easy to understand and implement
    • Disadvantages: Prone to underfitting; classification accuracy may be low
    • Applicable data types: numeric and nominal data

Directory:

First, logistic regression based on the sigmoid function

    1. The sigmoid function
    2. Determining the optimal regression coefficients with optimization methods
    3. Implementation of the gradient ascent method
    4. Implementation of the stochastic gradient ascent algorithm

Second, example: predicting the mortality of sick horses from hernia (colic) symptoms

    1. Preparing the data
    2. Testing the algorithm: classifying with logistic regression

Third, summary

===============================================

First, logistic regression based on the sigmoid function

1. The sigmoid function

What we want from logistic regression is a function that accepts all of the inputs and returns the predicted category; in the two-class case the function should output class 0 or 1. The sigmoid function, which behaves like a smoothed step function, is up to this job. Its formula is:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where the input z is a weighted sum of the features:

$$z = w^T x = w_0 x_0 + w_1 x_1 + \cdots + w_n x_n$$

The vector w holds the regression coefficients, the best parameters we want to find; x is the n-dimensional feature vector, the input data of the classifier.

The book shows the graph of the sigmoid function at two different coordinate scales: over a small range it is a smooth S-shaped curve, while over a sufficiently large range it looks like a step function.
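The figure itself is not reproduced here; a minimal matplotlib sketch (my own, assuming only the sigmoid definition above) regenerates both views:

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
x_small = np.arange(-5, 5, 0.1)       # small scale: smooth S-shaped curve
ax1.plot(x_small, sigmoid(x_small))
ax1.set_title('sigmoid on [-5, 5]')
x_large = np.arange(-60, 60, 0.1)     # large scale: looks like a step function
ax2.plot(x_large, sigmoid(x_large))
ax2.set_title('sigmoid on [-60, 60]')
plt.show()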

To implement a logistic regression classifier, we multiply each feature by its regression coefficient, add up all the products, and substitute the sum into the sigmoid function, obtaining a value in the range 0 to 1. Any value greater than 0.5 is assigned to class 1, and any value less than 0.5 to class 0. In this sense, logistic regression can also be viewed as a probability estimate.
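A minimal sketch of that decision rule (the coefficient values here are made-up numbers for illustration, not trained values):

import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

w = np.array([4.12, 0.48, -0.62])    # hypothetical regression coefficients
x = np.array([1.0, -1.4, 4.6])       # [x0 = 1 intercept, x1, x2]
p = sigmoid(np.dot(w, x))            # weighted sum fed to the sigmoid
label = 1 if p > 0.5 else 0          # threshold at 0.5
print(p, label)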

2. Determining the optimal regression coefficients with optimization methods

The previous part introduced the input of the sigmoid function:

$$z = w^T x$$

where the vector w holds the regression coefficients, the best parameters we want to find, and x is the n-dimensional feature vector fed to the classifier. Here are two approaches to finding the best parameters:

    • Gradient ascent method: based on the idea that the best way to find the maximum of a function is to move along its gradient direction.
    • Gradient descent method: the same idea in reverse; the best way to find the minimum of a function is to move against its gradient direction.

Here we use gradient ascent. For a function f(x, y), its gradient is written:

$$\nabla f(x, y) = \begin{pmatrix} \dfrac{\partial f(x,y)}{\partial x} \\[2mm] \dfrac{\partial f(x,y)}{\partial y} \end{pmatrix}$$

This gradient means moving by ∂f/∂x in the x direction and by ∂f/∂y in the y direction; in effect, it determines the direction of the next move once the algorithm reaches each point. For instance, for f(x, y) = x² + y² the gradient at any point is (2x, 2y). The function f(x, y) must be defined and differentiable at every point where it is evaluated.

The gradient fixes the direction of the move; the step size is denoted α. In vector notation, the iteration formula of the gradient ascent algorithm is:

$$w := w + \alpha \, \nabla_w f(w)$$

The formula shows that the resulting best parameter w is affected by both the gradient and the step size α, and that the update is executed repeatedly until some stopping condition is met, such as reaching a fixed number of iterations or the change in w falling below a tolerance.
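As a tiny worked example of this update rule (my own illustration, not from the book): to maximize f(w) = -(w - 3)², whose gradient is f'(w) = -2(w - 3), repeated updates climb toward the maximum at w = 3:

def grad(w):
    return -2.0 * (w - 3.0)   # derivative of f(w) = -(w - 3)**2

w, alpha = 0.0, 0.1
for _ in range(100):          # stopping condition: fixed iteration count
    w = w + alpha * grad(w)   # the gradient ascent update
print(w)                      # ~3.0, the maximizer of f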

3. Implementation of the gradient ascent method

Below we implement gradient ascent in Python and use it to find the best parameters. The sample data set has 100 entries, each with two numeric features, X1 and X2, so the data points can be drawn as a two-dimensional scatter plot. After finding the best parameter vector w, we write a function that draws the dividing line between the two classes of data, so that we can observe the effect of the optimization algorithm.

# -*- coding: utf-8 -*-
"""
Created on Sat Sep 22:53:07 2015
@author: herbert
"""
from numpy import *

def loadData():
    dataMat = []; labelMat = []
    fr = open('testSet.txt')
    for line in fr.readlines():
        lineSplit = line.strip().split()
        # prepend a constant 1.0 as x0, the intercept term
        dataMat.append([1.0, float(lineSplit[0]), float(lineSplit[1])])
        labelMat.append(int(lineSplit[2]))
    return dataMat, labelMat

def sigmoid(x):
    return 1.0 / (1 + exp(-x))

def gradAscent(data, label):
    dataMat = mat(data)                  # 100 x 3 data matrix
    labelMat = mat(label).transpose()    # 100 x 1 label column
    m, n = shape(dataMat)
    alpha = 0.001                        # step size
    maxCycles = 500                      # number of iterations
    w = ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMat * w)         # predicted probabilities, m x 1
        error = labelMat - h             # how far off each prediction is
        w = w + alpha * dataMat.transpose() * error
    return dataMat, labelMat, w

data, label = loadData()
dataMat, labelMat, w = gradAscent(data, label)
print(labelMat)

def plotBestSplit(w):
    import matplotlib.pyplot as plt
    dataMat, labelMat = loadData()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1]); ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1]); ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='blue')
    x = arange(-3.0, 3.0, 0.1)
    # on the boundary w0 + w1*x1 + w2*x2 = 0, so solve for x2
    y = (-w[0] - w[1] * x) / w[2]
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()

plotBestSplit(w.getA())
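One practical caveat (my addition, not from the book): exp(-x) overflows for inputs that are large and negative, so NumPy may emit runtime warnings on some data sets. If that happens, a numerically safer sigmoid can be swapped in:

import numpy as np

def sigmoid_stable(x):
    # evaluate exp only on the side where its argument is non-positive,
    # so it can never overflow
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out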

From the result we can see that the classification produced by logistic regression is quite good, even though three or four sample points are misclassified:

4. Implementation of the stochastic gradient ascent algorithm

Gradient ascent works fine on a data set of around 100 samples, but with billions of samples and thousands of features its computational complexity becomes terrifying, because every iteration processes the whole data set. The improved method is the stochastic gradient ascent algorithm, which updates the regression coefficients using only one sample point at a time. It uses far fewer computing resources, and it is an online algorithm: it can update the parameters as each new sample arrives, without rereading the entire data set. Processing all the data at once, by contrast, is called batch processing. The following code implements stochastic gradient ascent:

# Stochastic gradient ascent algorithm
def stocGradAscent(data, label):
    m, n = shape(data)
    alpha = 0.01
    w = ones(n)
    for i in range(m):
        # h and error are single numbers here, not vectors:
        # the coefficients are updated from one sample at a time
        h = sigmoid(sum(data[i] * w))
        error = label[i] - h
        w = w + alpha * error * data[i]
    return w

wNew = stocGradAscent(array(data), label)
plotBestSplit(wNew)

Results:

We can see that the stochastic gradient ascent algorithm misclassifies quite a few sample points; its classification result is not as good as that of the plain gradient ascent algorithm.

The explanation given in the book is that the plain gradient ascent result comes from 500 iterations over the entire data set, whereas the stochastic version has performed only 100 single-sample updates (one pass over the data). A reliable way to judge an optimization algorithm is to check whether it converges, that is, whether the parameters it solves for reach stable values or keep changing.

The book illustrates this by running the stochastic gradient ascent algorithm over the entire data set 200 times and plotting how the three parameters of the vector w change during the iterations:
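That figure is not reproduced here, but it can be regenerated with a small variant of stocGradAscent that records the coefficients after every update (a sketch of mine, not the book's listing; it assumes the loadData/sigmoid definitions above):

import matplotlib.pyplot as plt

def stocGradAscentTrace(data, label, numIter=200):
    m, n = shape(data)
    w = ones(n)
    history = []                      # w after every single-sample update
    for i in range(numIter):
        for j in range(m):
            h = sigmoid(sum(data[j] * w))
            error = label[j] - h
            w = w + 0.01 * error * data[j]
            history.append(w.copy())
    return array(history)

hist = stocGradAscentTrace(array(data), label)
fig, axes = plt.subplots(3, 1, sharex=True)
for k, name in enumerate(['w0', 'w1', 'w2']):
    axes[k].plot(hist[:, k])          # trajectory of one coefficient
    axes[k].set_ylabel(name)
plt.xlabel('update number')
plt.show()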

In that figure, the second parameter of w stabilizes first, while the other two need more iterations to settle. We also see that all three parameters of w fluctuate to varying degrees throughout the run. The cause is the misclassified sample points: each time one of them is used, it produces a drastic change in the parameters. To avoid these violent fluctuations and improve the algorithm's performance, the following strategies are used to improve the stochastic gradient ascent algorithm (implemented in the code after the list):

    • Update the step size alpha at every iteration so that it keeps decreasing but never reaches 0; this ensures that new data still has some influence even after many iterations.
    • Update the regression coefficients with a randomly chosen sample, which reduces periodic fluctuations: each time, an index is drawn at random from a list of remaining samples and then deleted from that list.

Implementation code:

def stocGradAscentAdvanced(data, label, numIter=150):
    m, n = shape(data)
    w = ones(n)
    for i in range(numIter):
        dataIndex = list(range(m))
        for j in range(m):
            # step size shrinks as i and j grow, but the +0.01 constant
            # keeps it from ever reaching 0
            alpha = 4 / (1.0 + i + j) + 0.01
            # pick a random remaining sample to damp periodic fluctuations
            randIndex = int(random.uniform(0, len(dataIndex)))
            h = sigmoid(sum(data[randIndex] * w))
            error = label[randIndex] - h
            w = w + alpha * error * array(data[randIndex])
            del(dataIndex[randIndex])
    return w

wNewAdv = stocGradAscentAdvanced(array(data), label, numIter=150)
plotBestSplit(wNewAdv)

After these optimization strategies are applied, the fluctuation of each regression coefficient is greatly reduced, and the parameters converge much faster:

Using the improved stochastic gradient ascent algorithm to separate the sample points gives a result comparable to the plain gradient ascent algorithm, but with much less computation:

Second, example: predicting the mortality of sick horses from hernia (colic) symptoms

1. Preparing the data

This example uses logistic regression to predict whether a horse suffering from hernia (colic) will survive. The data comes from the UCI Machine Learning Repository, dated January 11, 2010, and contains 368 samples with 28 features. About 30% of the values in the data set are missing. -.-

Before implementing the algorithm, let's look at some methods for handling missing values in the data:

    • Use the mean of a feature's available values to fill its missing values;
    • Use a special value, such as -1, to fill missing values;
    • Ignore samples that have missing values;
    • Use the mean of similar samples to fill the missing values;
    • Use another machine learning algorithm to predict the missing values;
    • For samples whose category label is missing, the only option is to discard the sample.

Here we replace every missing feature value with the real number 0, which happens to suit logistic regression: the stochastic update is w = w + alpha * error * x, so a feature equal to 0 leaves its coefficient untouched, and it contributes nothing to the weighted sum fed to the sigmoid either. Any sample whose category label is missing must simply be discarded.
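A minimal preprocessing sketch under these conventions (the NaN encoding for missing entries is my assumption, not the data set's actual raw format, which must be parsed first):

import numpy as np

def fill_missing_with_zero(rows):
    """Replace missing feature values (encoded as NaN) with 0.0 and
    drop any sample whose class label (last column) is missing."""
    cleaned = []
    for row in rows:
        if np.isnan(row[-1]):        # label missing: discard the sample
            continue
        features = [0.0 if np.isnan(v) else v for v in row[:-1]]
        cleaned.append(features + [row[-1]])
    return cleaned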

2. Testing the algorithm: classifying with logistic regression

Building on that work, the following code multiplies each feature vector in the test set by the regression coefficients w obtained from the optimization algorithm, sums the products, and feeds the sum to the sigmoid function. If the sigmoid value is greater than 0.5, the sample is judged to be class 1; otherwise it is judged class 0. Finally, the predictions are compared with the actual labels to count the errors. Because of the randomness involved, the program averages the results of 10 runs to obtain the algorithm's average classification error rate.

def classifyVector(x, w):
    prob = sigmoid(sum(x * w))
    if prob > 0.5:
        return 1.0
    else:
        return 0.0

def horseColicTest():
    frTrain = open('horseColicTraining.txt')
    frTest = open('horseColicTest.txt')
    trainingData = []; trainingLabel = []
    for line in frTrain.readlines():
        currLine = line.strip().split('\t')
        lineArr = []
        for i in range(21):                  # 21 feature columns per sample
            lineArr.append(float(currLine[i]))
        trainingData.append(lineArr)
        trainingLabel.append(float(currLine[21]))
    w = stocGradAscentAdvanced(array(trainingData), trainingLabel, numIter=500)
    errorCount = 0.0; numOfTest = 0.0
    for line in frTest.readlines():
        numOfTest += 1.0
        currLine = line.strip().split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        if int(classifyVector(array(lineArr), w)) != int(float(currLine[21])):
            errorCount += 1
    errorRate = float(errorCount) / numOfTest
    print("The error rate of this test is: %f" % errorRate)
    return errorRate

def finalTest():
    numOfTest = 10; errorSum = 0.0
    for i in range(numOfTest):
        errorSum += horseColicTest()
    print("After %d iterations the average error rate is: %f"
          % (numOfTest, errorSum / float(numOfTest)))

finalTest()

The classification error rate is 39.4%, which at first glance seems quite high. But once you know that 30% of the data set is missing, it no longer looks so bad ... -.-

Third, summary

The purpose of logistic regression is to find the best-fitting parameters for a nonlinear function, the sigmoid, and the search can be carried out by an optimization algorithm. Among optimization algorithms, gradient ascent is the most common, and it can be simplified into stochastic gradient ascent.

Stochastic gradient ascent achieves an effect comparable to plain gradient ascent while using far fewer computing resources. It is also an online algorithm: it can update the parameters as each new sample arrives, without rereading the entire data set in a batch operation.

Copyright notice: This is the blogger's original article; it may not be reproduced without the blogger's permission.
