Machine Learning - Logistic Regression

Source: Internet
Author: User

See the book Machine Learning in Action.

The main idea of using logistic regression for classification:

Fit a regression formula for the classification boundary (the decision boundary) based on the existing data, and classify new points with it.

The sigmoid function used for classification:
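\sigma(z) = \frac{1}{1 + e^{-z}}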

Sigmoid function diagram: [figure: an S-shaped curve rising from 0 to 1, crossing 0.5 at z = 0]

What the sigmoid is used for:

Multiply each feature by its regression coefficient, add up all the results, and feed the sum into the sigmoid function. This produces a value between 0 and 1: any sample whose sigmoid output is greater than 0.5 is assigned to class 1, and any sample below 0.5 is assigned to class 0, as the sketch below illustrates.
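A minimal sketch of this decision rule (the weights and the sample here are made-up illustrative values; the constant feature x0 = 1 is appended last, matching the data layout used by the full program at the end of this post):

import numpy as np

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def classify(x, w):
    # weighted sum of the features, then the sigmoid
    prob = sigmoid(np.dot(w, x))
    # threshold at 0.5: class 1 above, class 0 below
    return 1 if prob > 0.5 else 0

w = np.array([0.48, -0.62, 4.12])  # illustrative coefficients [w1, w2, w0]
x = np.array([1.0, 7.5, 1.0])      # one sample: [x1, x2, x0 = 1]
print(classify(x, w))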

In summary, the input to the sigmoid can be written as z:
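z = w^{T}x = w_0 x_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n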

The vector w is thus the set of coefficients we need to find through an optimization method.

Solving for the w vector:

1) Gradient ascent method (idea: to find the maximum value of a function, move along the direction of the function's gradient).

Pseudocode of the gradient ascent method:
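Reconstructed to match the gradAscendClass implementation further below, the loop is roughly:

initialize every regression coefficient w to 1
repeat maxIter times:
    compute the sigmoid output h = sigmoid(X * w) for the whole data set
    compute the error vector error = y - h
    update the coefficients: w := w + alpha * X^T * error
return the coefficient vector w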

Details of the code that updates the w coefficients:
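The heart of the update, excerpted from gradAscendClass in the full listing at the end of this post:

h = self._sigmoid(self.omiga * xdataMat.transpose())   # predictions for all samples at once
error = self.label - h.transpose()                     # residual y - h, one entry per sample
self.omiga = self.omiga + alpha * (xdataMat.transpose() * error).transpose()  # w := w + alpha * X^T(y - h)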

Note that the feature vectors in dataMatrix have one more dimension than the actual feature vectors: the data used by the book's author is two-dimensional, [x1, x2], and the program adds a constant dimension to get [x0 = 1, x1, x2]. The odd part is that the book inserts x0 at the first position rather than the last. In addition, the formula marked with a red line in the book's figure is given without any derivation. After searching online, I found a well-written blog post; here is a brief summary of it:

The specific derivation is as follows (reference: http://blog.csdn.net/yangliuy/article/details/18504921?reload):

The model's probability equations:
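P(y=1 \mid x;\theta) = h_\theta(x) = \frac{1}{1 + e^{-\theta^{T}x}}, \qquad P(y=0 \mid x;\theta) = 1 - h_\theta(x)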

Here x denotes a training feature vector, y the class label corresponding to x, and θ the parameters to be estimated.

In the above formulas y takes only the values 0 or 1, so the two cases can be combined into a single expression:
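p(y \mid x;\theta) = (h_\theta(x))^{y}\,(1 - h_\theta(x))^{1-y}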

Likelihood function (this is the objective function of logistic regression; the original book never states it, so readers without some background in logistic regression theory will struggle to follow the book's implementation):
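For m training samples,

L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)};\theta) = \prod_{i=1}^{m} (h_\theta(x^{(i)}))^{y^{(i)}}\,(1 - h_\theta(x^{(i)}))^{1-y^{(i)}}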

Taking the logarithm gives the log-likelihood function:
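\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]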

Therefore, the maximum likelihood estimate is:
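\theta^{*} = \arg\max_{\theta} \ell(\theta)

There is no closed-form solution, so ℓ(θ) has to be maximized iteratively.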

Differentiating ℓ(θ) then yields the recurrence formula of the gradient ascent method:
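Using the fact that σ′(z) = σ(z)(1 − σ(z)), the partial derivatives work out to

\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}

so taking a step of size α along the gradient gives the update

\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}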

This is exactly the formula marked with the red line in the book's figure; in vector form it reads w := w + α · Xᵀ(y − h), which is what the code below implements.

Here is my own code (an unoptimized logistic regression implementation). The data source in the code is still the testSet.txt data provided with Machine Learning in Action:

# -*- coding: cp936 -*-
import numpy as np
import matplotlib.pyplot as plt


class log_reg():
    def __init__(self):
        self._closed = False

    def loadData(self, datafile='testSet.txt'):
        # each line of testSet.txt: x1 x2 label
        f_file = open(datafile)
        lines = f_file.readlines()
        f_file.close()
        line_data = lines[0].strip().split()
        self.num_feature = len(line_data) - 1
        # one extra column for the constant feature x0 = 1 (appended last)
        self.xdata = np.zeros((len(lines), self.num_feature + 1))
        self.label = np.zeros((len(lines), 1))
        self.num_label = len(lines)
        line_cnt = 0
        for iline in lines:
            line_data = iline.strip().split()
            for i in range(self.num_feature):
                self.xdata[line_cnt][i] = float(line_data[i])
            self.xdata[line_cnt][self.num_feature] = 1  # constant term x0
            self.label[line_cnt] = float(line_data[-1])
            line_cnt += 1

    def _sigmoid(self, z):
        return 1.0 / (1 + np.exp(-z))

    def gradAscendClass(self):
        maxIter = 500
        alpha = 0.01
        self.omiga = np.ones((1, self.num_feature + 1))  # weights, initialized to 1
        xdataMat = np.matrix(self.xdata)
        self.omiga_record = []
        for i in range(maxIter):
            # predictions for all samples at once (matrix multiplication)
            h = self._sigmoid(self.omiga * xdataMat.transpose())
            error = self.label - h.transpose()
            # w := w + alpha * X^T (y - h)
            self.omiga = self.omiga + alpha * (xdataMat.transpose() * error).transpose()
            self.omiga_record.append(self.omiga)
            if np.sum(np.abs(error)) < self.num_label * 0.05:
                print("error very low", i)
                break

    def stochasticGradAscend(self):
        pass
        # maxIter = 150
        # self.omiga = np.ones((1, self.num_feature + 1))
        # for ...

    def plotResult(self):
        self._close()
        if self.num_feature != 2:
            print("can only plot data with 2 features!")
            return
        label0x, label0y, label1x, label1y = [], [], [], []
        for i in range(self.num_label):
            if int(self.label[i]) == 1:
                label1x.append(self.xdata[i][0])
                label1y.append(self.xdata[i][1])
            else:
                label0x.append(self.xdata[i][0])
                label0y.append(self.xdata[i][1])
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.scatter(label0x, label0y, c='b', marker='o')
        ax.scatter(label1x, label1y, c='r', marker='s')
        minx = min(min(label0x), min(label1x))
        maxx = max(max(label0x), max(label1x))
        wx = np.arange(minx, maxx, 0.1)
        # decision boundary: w0*x1 + w1*x2 + w2*1 = 0
        wy = (-self.omiga[0, 2] - self.omiga[0, 0] * wx) / self.omiga[0, 1]
        ax.plot(wx, wy)

    def plotIteration(self):
        self._close()
        iterTimes = len(self.omiga_record)
        w0 = [w[0, 0] for w in self.omiga_record]
        w1 = [w[0, 1] for w in self.omiga_record]
        w2 = [w[0, 2] for w in self.omiga_record]
        fig = plt.figure()
        ax1 = fig.add_subplot(3, 1, 1)
        ax1.plot(range(iterTimes), w0, c='b')
        plt.xlabel('w0')
        ax2 = fig.add_subplot(3, 1, 2)
        ax2.plot(range(iterTimes), w1, c='r')
        plt.xlabel('w1')
        ax3 = fig.add_subplot(3, 1, 3)
        ax3.plot(range(iterTimes), w2, c='g')
        plt.xlabel('w2')

    def show(self):
        plt.show()

    def _close(self):
        pass


if __name__ == '__main__':
    testclass = log_reg()
    testclass.loadData()
    testclass.gradAscendClass()
    testclass.plotResult()
    testclass.plotIteration()
    testclass.show()
Displayed results:

[Figure: classification result — the two classes of points with the fitted decision boundary]

[Figure: convergence of the classification parameters w0, w1, w2 over the iterations]
