
Source: Internet
Author: User
Tags: theano

Theano Logistic Regression Explained

The logistic regression model is a probabilistic, linear classifier. Its parameters are a weight matrix W and a bias vector b. Classification is done by projecting input vectors onto a set of hyperplanes, each of which corresponds to a class; the distance from the input vector to a hyperplane reflects the probability that the input belongs to that class. Mathematically, the probability that an input vector x belongs to class i can be written as:

P(Y=i \mid x, W, b) = \mathrm{softmax}_i(Wx + b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}

In other words, given an input x and the parameters (W, b), the softmax of the linear scores Wx + b gives the probability that x belongs to class i.

(In the notation of the Stanford page linked below, i indexes the training samples and j ranges over the k possible values of the label y.)

The derivation is explained carefully on this Stanford page on softmax regression:
http://deeplearning.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92

The prediction itself is easier to understand: y_pred is simply the class, among the k possibilities, whose probability is largest:

y_{pred} = \mathrm{argmax}_i P(Y=i \mid x, W, b)
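To make these two formulas concrete, here is a small NumPy illustration (my own toy example, not part of the tutorial code; the scores are made-up numbers) of how softmax turns the linear scores Wx + b into class probabilities and how argmax then picks the predicted class:

    import numpy as np

    # made-up linear scores (Wx + b) for a single input and k = 3 classes
    scores = np.array([2.0, 1.0, 0.1])

    # softmax: exponentiate and normalize so that the entries sum to 1
    p_y_given_x = np.exp(scores) / np.sum(np.exp(scores))
    print(p_y_given_x)        # approximately [0.659, 0.242, 0.099]

    # prediction: the index of the class with the largest probability
    y_pred = np.argmax(p_y_given_x)
    print(y_pred)             # 0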

The following Theano code computes both of the formulas above:

    # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
    self.W = theano.shared(
        value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
        name='W',
        borrow=True
    )
    # initialize the biases b as a vector of n_out 0s
    self.b = theano.shared(
        value=numpy.zeros((n_out,), dtype=theano.config.floatX),
        name='b',
        borrow=True
    )

    # symbolic expression for computing the matrix of class-membership
    # probabilities
    # Where:
    # W is a matrix where column-k represents the separation hyperplane for
    # class-k
    # x is a matrix where row-j represents input training sample-j
    # b is a vector where element-k represents the free parameter of
    # hyperplane-k
    self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
    # the output above is a row of class probabilities for each example

    # symbolic description of how to compute the prediction as the class
    # whose probability is maximal
    self.y_pred = T.argmax(self.p_y_given_x, axis=1)
    # axis=1 takes the argmax along each row, i.e. over the classes of
    # each example

Since the model's parameters must persist throughout training, we allocate W and b as shared variables. This both declares them as symbolic variables and initializes their values. The dot and softmax operators are then used to compute the probabilities; p_y_given_x is a symbolic variable of vector type (one row of class probabilities per example).

To get an actual prediction we also need the argmax operator, which returns the index of the largest element of p_y_given_x, i.e. the most probable class.

Of course, the model defined so far is of no practical use, since its parameters are still at their initial values; the following sections explain how to learn an optimal set of parameters.

Define a cost function

Learning optimal model parameters means minimizing a cost function. For multi-class logistic regression the negative log-likelihood is commonly used as the cost. This is equivalent to maximizing the likelihood of the data set D under the model parameterized by \theta. In plain English: the parameters \theta = {W, b} are fitted to the particular data set D so that the predictions are as accurate as possible and the cost is as small as possible. Let us first define the log-likelihood L and the cost \ell:

\mathcal{L}(\theta=\{W,b\}, \mathcal{D}) = \sum_{i=0}^{|\mathcal{D}|} \log P(Y=y^{(i)} \mid x^{(i)}, W, b)

\ell(\theta=\{W,b\}, \mathcal{D}) = -\mathcal{L}(\theta=\{W,b\}, \mathcal{D})

The meaning of the formula is that, over the data set D, we compute the log of the softmax probability of the correct class for each example and sum the results; the negative of that sum is the cost. Minimizing the cost therefore means making the probabilities of the correct labels as large as possible. (You can also have a look at Caltech's machine-learning video lectures. Exactly why logistic regression uses the negative log-likelihood rather than some other loss is not spelled out here; the tutorial simply says this is the commonly used choice, so we will not pursue it further.)

Whole books are devoted to the topic of minimization; for our purposes, gradient descent is by far the simplest way to minimize an arbitrary non-linear function. This tutorial uses minibatch stochastic gradient descent (MSGD), in which each descent step is computed on a small batch of training examples rather than on the whole data set. A toy illustration of plain gradient descent follows below.
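As a self-contained illustration of the idea (my own example, not part of the tutorial), here is plain gradient descent on the one-dimensional function f(w) = (w - 3)^2, whose gradient is 2(w - 3); each step moves w a little against the gradient until it settles at the minimizer:

    # toy example: minimize f(w) = (w - 3)**2 by gradient descent
    w = 0.0
    learning_rate = 0.1
    for step in range(100):
        grad = 2.0 * (w - 3.0)       # derivative of f at the current w
        w -= learning_rate * grad    # step in the direction opposite the gradient
    print(w)                         # close to 3.0, the minimizer of f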

The following code defines, for a given minibatch, how the loss is computed:

    # y.shape[0] is (symbolically) the number of rows in y, i.e.,
    # number of examples (call it n) in the minibatch
    # T.arange(y.shape[0]) is a symbolic vector which will contain
    # [0, 1, 2, ... n-1]
    # T.log(self.p_y_given_x) is a matrix of log-probabilities (call it LP)
    # with one row per example and one column per class
    # LP[T.arange(y.shape[0]), y] is a vector v containing
    # [LP[0, y[0]], LP[1, y[1]], LP[2, y[2]], ..., LP[n-1, y[n-1]]]
    # and T.mean(LP[T.arange(y.shape[0]), y]) is the mean (across minibatch
    # examples) of the elements in v, i.e., the mean log-likelihood across
    # the minibatch.
    return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

The code above departs a little from the written equation and deserves careful study. The idea is to process the whole minibatch at once: take the log-probabilities of all examples, pick out the one corresponding to each example's correct label, and average them. The puzzling part is the indexing expression [T.arange(y.shape[0]), y]; the short sketch below shows what it selects, and the full class definition follows after it.
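Here is a small NumPy sketch (my own illustration with made-up numbers; Theano's advanced indexing behaves like NumPy's here) showing what that expression does: for each row i it picks the entry in column y[i], i.e. the log-probability assigned to the correct class of example i:

    import numpy as np

    # made-up probabilities: 3 examples (rows) and 4 classes (columns)
    LP = np.log(np.array([[0.1, 0.6, 0.2, 0.1],
                          [0.7, 0.1, 0.1, 0.1],
                          [0.2, 0.2, 0.5, 0.1]]))
    y = np.array([1, 0, 2])             # the correct class of each example

    v = LP[np.arange(y.shape[0]), y]    # [LP[0, 1], LP[1, 0], LP[2, 2]]
    print(v)                            # log(0.6), log(0.7), log(0.5)
    print(-v.mean())                    # the minibatch negative log-likelihood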

Create a logistic regression class
    class LogisticRegression(object):
        """Multi-class Logistic Regression Class

        The logistic regression is fully described by a weight matrix :math:`W`
        and a bias vector :math:`b`. Classification is done by projecting data
        points onto a set of hyperplanes, the distance to which is used to
        determine a class membership probability.
        """

        def __init__(self, input, n_in, n_out):
            """Initialize the parameters of the logistic regression

            :type input: theano.tensor.TensorType
            :param input: symbolic variable that describes the input of the
                          architecture (one minibatch)

            :type n_in: int
            :param n_in: number of input units, the dimension of the space in
                         which the datapoints lie

            :type n_out: int
            :param n_out: number of output units, the dimension of the space in
                          which the labels lie
            """
            # start-snippet-1
            # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
            self.W = theano.shared(
                value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
                name='W',
                borrow=True
            )
            # initialize the biases b as a vector of n_out 0s
            self.b = theano.shared(
                value=numpy.zeros((n_out,), dtype=theano.config.floatX),
                name='b',
                borrow=True
            )

            # symbolic expression for computing the matrix of class-membership
            # probabilities
            # Where:
            # W is a matrix where column-k represents the separation hyperplane
            # for class-k
            # x is a matrix where row-j represents input training sample-j
            # b is a vector where element-k represents the free parameter of
            # hyperplane-k
            self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

            # symbolic description of how to compute the prediction as the class
            # whose probability is maximal
            self.y_pred = T.argmax(self.p_y_given_x, axis=1)
            # end-snippet-1

            # parameters of the model
            self.params = [self.W, self.b]

            # keep track of model input
            self.input = input

        def negative_log_likelihood(self, y):
            """Return the mean of the negative log-likelihood of the prediction
            of this model under a given target distribution.

            .. math::

                \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
                \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}
                    \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
                \ell (\theta=\{W,b\}, \mathcal{D})

            :type y: theano.tensor.TensorType
            :param y: corresponds to a vector that gives for each example the
                      correct label

            Note: we use the mean instead of the sum so that the learning rate
                  is less dependent on the batch size
            """
            # start-snippet-2
            # y.shape[0] is (symbolically) the number of rows in y, i.e.,
            # number of examples (call it n) in the minibatch
            # T.arange(y.shape[0]) is a symbolic vector which will contain
            # [0, 1, 2, ... n-1]
            # T.log(self.p_y_given_x) is a matrix of log-probabilities (call it
            # LP) with one row per example and one column per class
            # LP[T.arange(y.shape[0]), y] is a vector v containing
            # [LP[0, y[0]], LP[1, y[1]], ..., LP[n-1, y[n-1]]] and
            # T.mean(LP[T.arange(y.shape[0]), y]) is the mean (across minibatch
            # examples) of the elements in v, i.e., the mean log-likelihood
            # across the minibatch.
            return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
            # end-snippet-2

        def errors(self, y):
            """Return a float representing the number of errors in the minibatch
            over the total number of examples of the minibatch; zero-one loss
            over the size of the minibatch

            :type y: theano.tensor.TensorType
            :param y: corresponds to a vector that gives for each example the
                      correct label
            """
            # check if y has the same dimension as y_pred
            if y.ndim != self.y_pred.ndim:
                raise TypeError(
                    'y should have the same shape as self.y_pred',
                    ('y', y.type, 'y_pred', self.y_pred.type)
                )
            # check if y is of the correct datatype
            if y.dtype.startswith('int'):
                # the T.neq operator returns a vector of 0s and 1s, where 1
                # represents a mistake in prediction
                return T.mean(T.neq(self.y_pred, y))
            else:
                raise NotImplementedError()

Instantiate it with the following code:

    # generate symbolic variables for input (x and y represent a
    # minibatch)
    x = T.matrix('x')   # data, presented as rasterized images
    y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

    # construct the logistic regression class
    # each MNIST image has size 28*28
    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)
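As a quick sanity check (my own addition, assuming the class above has been defined and theano/numpy imported), the shared parameters created by the constructor are zero-initialized arrays of the expected shapes:

    # W maps the 784 input pixels to 10 classes; b holds one bias per class
    print(classifier.W.get_value().shape)   # (784, 10)
    print(classifier.b.get_value().shape)   # (10,)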

Finally, we define the cost that we will minimize:

    # the cost we minimize during training is the negative log likelihood of
    # the model in symbolic format
    cost = classifier.negative_log_likelihood(y)
Now let's train this model.

To implement such a learning algorithm you would normally have to derive the gradient of the cost with respect to each parameter by hand, which becomes very challenging for complex models. In Theano this work is trivial: it differentiates automatically and even applies mathematical transformations that improve numerical stability.

The following code creates the symbolic gradient expressions g_W and g_b:

    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)
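As a minimal, self-contained check of what T.grad does (my own example, not from the tutorial), the snippet below differentiates y = x**2 symbolically and compiles the result into a callable function:

    import theano
    import theano.tensor as T

    x = T.dscalar('x')
    y = x ** 2
    gy = T.grad(y, x)              # symbolic derivative of y w.r.t. x, i.e. 2*x
    f = theano.function([x], gy)   # compile the gradient expression
    print(f(3.0))                  # prints 6.0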

The function train_model, defined below, performs one step of gradient descent:

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs.
    updates = [(classifier.W, classifier.W - learning_rate * g_W),
               (classifier.b, classifier.b - learning_rate * g_b)]

    # compiling a Theano function `train_model` that returns the cost, but at
    # the same time updates the parameters of the model based on the rules
    # defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

The updates list is made of pairs: in each pair, the first element is the shared variable to update and the second is the symbolic expression for its new value, so W is replaced by W - learning_rate * g_W and b by b - learning_rate * g_b.
The givens dictionary simply specifies how the x and y of this minibatch are obtained from the index; there is nothing else mysterious about it.

    • The input to train_model is the index of the minibatch; together with the batch size, this determines the input x and the labels y.
    • Its return value is the cost, i.e. the negative log-likelihood, of that minibatch.
    • Each call first substitutes x and y for the chosen minibatch (via givens), then computes the cost, and finally applies the operations in the updates list.

Thus each call to train_model returns the cost of one minibatch while also performing one step of MSGD on the training data; the whole learning algorithm consists of looping over all minibatches, as sketched below.
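A minimal sketch of that loop (my own summary; it assumes n_epochs and n_train_batches have been defined, and it omits the validation and early-stopping logic of the full tutorial) could look like this:

    # loop over the training set, one minibatch at a time, for several epochs
    for epoch in range(n_epochs):
        for minibatch_index in range(n_train_batches):
            minibatch_avg_cost = train_model(minibatch_index)
        print('epoch %i, last minibatch cost %f' % (epoch, minibatch_avg_cost))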

Test model
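To evaluate the model we want the error rate rather than the cost. A sketch of a test function in the same style as train_model (assuming test_set_x, test_set_y, index and batch_size are defined as above; the full Theano tutorial builds its test_model and validate_model this way) compiles classifier.errors(y) instead of the cost:

    # compile a function that reports the zero-one error rate of a minibatch
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )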
