Logistic Regression for Binary Classification

Source: Internet
Author: User
Tags: theano


This post uses Python's Theano library to implement logistic regression for binary classification; the dataset used can be downloaded here.

Recall that logistic regression is a nonlinear function applied on top of a multivariate linear function, and the usual nonlinear choice is the sigmoid function. The sigmoid output is interpreted as the probability that the example belongs to class 1, so the parameters to be learned are the linear weights w and the intercept (bias) b:

h(x) = w·x + b

g(x) = 1 / (1 + exp(-h(x))) = 1 / (1 + exp(-w·x - b))
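
As a quick sanity check, this forward pass is easy to compute by hand. Here is a minimal NumPy sketch (the weights, bias, and input values are made up for illustration; this snippet is separate from the Theano script below):

import numpy as np

w = np.array([0.5, -1.0])  # illustrative weight vector
b = 0.1                    # illustrative bias
x = np.array([2.0, 1.0])   # one example with two features

h = np.dot(w, x) + b             # linear score h(x) = w.x + b = 0.1
g = 1.0 / (1.0 + np.exp(-h))     # sigmoid squashes h into (0, 1)
print(g)                         # ~0.525, read as P(y=1 | x)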

The probability that an example belongs to class 1 can then be written as:

P(y = 1 | x; w, b) = g(x)

and the probability of an observed label y (which is 0 or 1) is:

P(y | x; w, b) = g(x)^y * (1 - g(x))^(1 - y)

(For y = 1 this reduces to g(x), and for y = 0 it reduces to 1 - g(x).)

The training objective is therefore to maximize the likelihood of the observed data, which is the product of the probabilities above over all training examples. Because a long product is awkward to compute and numerically unstable, we usually work with the logarithm of the likelihood instead; for a single example the log likelihood is:

log P = y*log(g(x)) + (1 - y)*log(1 - g(x))
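
A minimal NumPy sketch of this per-example quantity, with the sign flipped so that smaller is better, matching the negative log likelihood used in the code below (the probabilities fed in are made up):

import numpy as np

def neg_log_likelihood(y, g):
    # -[y*log(g) + (1 - y)*log(1 - g)]: the per-example cross-entropy
    return -(y * np.log(g) + (1 - y) * np.log(1 - g))

print(neg_log_likelihood(1, 0.9))  # ~0.105: confident and correct, small loss
print(neg_log_likelihood(1, 0.1))  # ~2.303: confident and wrong, large loss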

This per-example log likelihood is exactly the negated cross-entropy, and summing it over the training data gives the log likelihood of the whole training set. Flipping the sign gives the negative log likelihood, and fitting the model amounts to finding the parameters that minimize it. The usual method is gradient descent.
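
For this loss the gradient has a well-known closed form: the derivative of the per-example negative log likelihood with respect to w is (g(x) - y)*x, and with respect to b it is (g(x) - y). Below is a minimal full-batch NumPy sketch of gradient descent built on that fact (the toy data is made up; the Theano script further down instead uses mini-batches and symbolic differentiation via T.grad):

import numpy as np

def fit_logreg(X, y, lr=0.1, epochs=1000):
    # plain full-batch gradient descent on the mean negative log likelihood
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        g = 1.0 / (1.0 + np.exp(-(X.dot(w) + b)))  # P(y=1 | x) for every row
        w -= lr * X.T.dot(g - y) / n  # gradient of the mean loss w.r.t. w
        b -= lr * np.mean(g - y)      # gradient of the mean loss w.r.t. b
    return w, b

# made-up, linearly separable toy data
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = fit_logreg(X, y, lr=0.5, epochs=2000)
preds = 1.0 / (1.0 + np.exp(-(X.dot(w) + b))) > 0.5
print(preds.astype(int))  # expected: [0 0 1 1]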

Below is a Python/Theano implementation of logistic regression for binary classification; it reports the error rate on the training data, for anyone interested. The training data used in the code can be downloaded here.

# -*- coding: utf-8 -*-
"""
Created on Sun Nov 2014 21:37:43
@author: brighthush
Example for Logistic Regression
"""
import time

import numpy
import theano
import theano.tensor as T

rng = numpy.random


class LogisticRegression(object):
    def __init__(self, input, n_in):
        # weight vector, randomly initialized; bias starts at a small constant
        self.W = theano.shared(value=rng.randn(n_in), name='W', borrow=True)
        self.b = theano.shared(value=0.10, name='b')
        # P(y=1 | x) = sigmoid(w.x + b)
        self.p_given_x = 1 / (1 + T.exp(-T.dot(input, self.W) - self.b))
        # predicted label: 1 when P(y=1 | x) > 0.5
        self.y_given_x = self.p_given_x > 0.5
        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):
        # per-example negative log likelihood (cross-entropy)
        ll = -y * T.log(self.p_given_x) - (1 - y) * T.log(1 - self.p_given_x)
        # mean loss plus a small L2 penalty on the weights
        cost = ll.mean() + 0.01 * (self.W ** 2).sum()
        return cost

    def errors(self, y):
        # fraction of misclassified examples
        return T.mean(T.neq(self.y_given_x, y))


def generate_data():
    # synthetic data (unused: immediately overridden by read_data below)
    rng = numpy.random
    N = feats = 5
    D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
    x = D[0]
    y = D[1]
    x, y = read_data()
    x_shared = theano.shared(numpy.asarray(x, dtype=theano.config.floatX),
                             borrow=True)
    y_shared = theano.shared(numpy.asarray(y, dtype=theano.config.floatX),
                             borrow=True)
    return x_shared, T.cast(y_shared, 'int32')


def sgd_optimization(learning_rate=0.13, n_epochs=1000, batch_size=100):
    train_x, train_y = generate_data()
    n_batches = train_x.get_value(borrow=True).shape[0] // batch_size

    index = T.lscalar()
    x = T.matrix('x')
    y = T.ivector('y')

    lr = LogisticRegression(x, train_x.get_value().shape[1])
    cost = lr.negative_log_likelihood(y)

    print 'compile function test_model...'
    test_model = theano.function(
        inputs=[index],
        outputs=lr.errors(y),
        givens={x: train_x[index * batch_size: (index + 1) * batch_size],
                y: train_y[index * batch_size: (index + 1) * batch_size]})

    # symbolic gradients of the cost w.r.t. the parameters
    g_W = T.grad(cost=cost, wrt=lr.W)
    g_b = T.grad(cost=cost, wrt=lr.b)
    updates = [(lr.W, lr.W - learning_rate * g_W),
               (lr.b, lr.b - learning_rate * g_b)]

    print 'compile function train_model...'
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={x: train_x[index * batch_size: (index + 1) * batch_size],
                y: train_y[index * batch_size: (index + 1) * batch_size]})

    best_train_error = numpy.inf
    start_time = time.clock()
    for epoch in xrange(n_epochs):
        for minibatch_index in xrange(n_batches):
            batch_cost = train_model(minibatch_index)
            # print 'iterator %d %lf' % (epoch * n_batches + minibatch_index + 1, batch_cost)
        # training error over all mini-batches after each epoch
        train_errors = [test_model(i) for i in xrange(n_batches)]
        train_error = numpy.mean(train_errors)
        if best_train_error > train_error:
            best_train_error = train_error
        print 'epoch %d, best_train_error %lf, train_error %lf' \
              % (epoch, best_train_error, train_error)
    end_time = time.clock()
    print 'cost %d' % (end_time - start_time)


def read_data():
    print 'load data...'
    data = numpy.loadtxt('.\\titanic.dat', delimiter=',', skiprows=8)
    x = []
    y = []
    for i in xrange(data.shape[0]):
        x.append(data[i, :data.shape[1] - 1])
        # labels in the file are -1/+1; map them to 0/1
        if data[i, -1] == -1.0:
            y.append(0)
        else:
            y.append(1)
    x = numpy.array(x)
    y = numpy.array(y)
    print '%d examples, %d columns every row' % (data.shape[0], data.shape[1])
    # min-max normalize the features to [0, 1]
    feature_min = x.min(0)
    feature_max = x.max(0)
    x = x - numpy.array(feature_min)
    x = x / numpy.array(feature_max - feature_min)
    print x.min(0), x.max(0)
    return numpy.array(x), numpy.array(y)


if __name__ == '__main__':
    sgd_optimization()
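
A hypothetical way to drive the script with other hyperparameters, assuming the code above is saved as logreg.py (an assumed file name) and titanic.dat, a comma-separated file whose first 8 lines are a header, sits in the working directory:

# hypothetical usage; 'logreg' is an assumed module name for the script above
from logreg import sgd_optimization

sgd_optimization(learning_rate=0.05, n_epochs=500, batch_size=50)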



