Multilayer Perceptron Learning

Source: Internet
Author: User
Tags: theano

1. Introduction to Multilayer Perceptron

A multilayer perceptron (MLP) can be seen as a logistic regression classifier whose input first passes through a non-linear transformation, so that the data is mapped into a space where it becomes linearly separable; this intermediate representation is called the hidden layer. A single hidden layer is usually enough to form a multilayer perceptron, whose structure is as follows:

The input layer is first combined with a weight matrix and a bias to produce a pre-activation value; applying the tanh function to this value gives the non-linear transformation that forms the hidden layer. From the hidden layer to the output layer we then apply the same logistic regression operation as in the previous post.
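Written out, this is the same expression given in the docstring of the code below, with s the tanh activation of the hidden layer and G the softmax of the logistic regression output layer:

    f(x) = G(b^{(2)} + W^{(2)} s(b^{(1)} + W^{(1)} x))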

We again use the SGD algorithm to update the parameters. There are four of them in total: the input-to-hidden weight matrix and bias, and the hidden-to-output weight matrix and bias.
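As a minimal sketch of that update rule (the full listing below builds the same thing from classifier.params; the helper name sgd_updates is only for this illustration):

import theano.tensor as T

def sgd_updates(cost, params, learning_rate=0.01):
    """Return the (param, param - learning_rate * gradient) pairs that
    SGD applies to each of the four shared parameters."""
    gparams = [T.grad(cost, param) for param in params]
    return [(param, param - learning_rate * gparam)
            for param, gparam in zip(params, gparams)]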

2. Python Code

2.1 Program Flow

First we construct the hidden layer class. It is built by passing in the input layer together with the input dimension and the hidden-layer dimension; the weight matrix between the input layer and the hidden layer is initialized from a uniform distribution, the hidden-layer bias is initialized to a zero vector, and the layer's output is finally obtained by applying a non-linear activation function.
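A minimal sketch of that initialization, mirroring the HiddenLayer constructor in the listing below (the sizes 28 * 28 and 500 are just example values for MNIST):

import numpy
import theano
import theano.tensor as T

rng = numpy.random.RandomState(1234)
n_in, n_out = 28 * 28, 500               # example: MNIST inputs, 500 hidden units

# uniform initialization in [-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))],
# the range suggested for a tanh activation
bound = numpy.sqrt(6. / (n_in + n_out))
W = theano.shared(
    numpy.asarray(rng.uniform(low=-bound, high=bound, size=(n_in, n_out)),
                  dtype=theano.config.floatX),
    name='W', borrow=True)
b = theano.shared(numpy.zeros((n_out,), dtype=theano.config.floatX),
                  name='b', borrow=True)

x = T.matrix('x')                        # symbolic minibatch of inputs
hidden_output = T.tanh(T.dot(x, W) + b)  # output of the hidden layer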

Next we define an MLP class. Its constructor takes the input layer, the input dimension, the hidden-layer dimension, and the output dimension. It instantiates a HiddenLayer object and feeds that object's output into a logistic regression (refer to the code from the previous logistic regression post; the code in this article imports it). It then defines L1 as the sum of the absolute values of the HiddenLayer weights plus the sum of the absolute values of the logistic regression weights (put simply, the two weight matrices' absolute values are summed and added together to give L1), and L2_sqr as the sum of the squares of the HiddenLayer weights plus the sum of the squares of the logistic regression weights. The negative log likelihood and the error function of the logistic regression are exposed on the MLP, and the four parameters are collected into a single list.
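In formulas, with W^{(1)} the input-to-hidden weights and W^{(2)} the hidden-to-output weights, the regularizers computed by the MLP class are:

    L1     = \sum_{ij} |W^{(1)}_{ij}| + \sum_{ij} |W^{(2)}_{ij}|
    L2_sqr = \sum_{ij} (W^{(1)}_{ij})^2 + \sum_{ij} (W^{(2)}_{ij})^2

and the cost minimized during training is negative_log_likelihood + L1_reg * L1 + L2_reg * L2_sqr.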

Next comes the training phase, where we again test on MNIST handwritten digit recognition. The loss function is defined as the negative log likelihood plus the L1 and L2 terms. We then define the symbolic functions test_model and validate_model, obtain the gradients of the parameters and the symbolic update expressions, and finally compile everything into the train_model function.
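A condensed sketch of how train_model is compiled (names such as cost, updates, x, y, train_set_x, train_set_y and batch_size refer to variables defined in the full listing below; givens substitutes the selected minibatch slice into the symbolic graph):

index = T.lscalar('index')   # minibatch index

train_model = theano.function(
    inputs=[index],
    outputs=cost,            # negative log likelihood + L1_reg*L1 + L2_reg*L2_sqr
    updates=updates,         # the SGD (param, param - lr * grad) pairs
    givens={
        x: train_set_x[index * batch_size:(index + 1) * batch_size],
        y: train_set_y[index * batch_size:(index + 1) * batch_size],
    }
)

test_model and validate_model are compiled the same way, but with outputs=classifier.errors(y) and without updates.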

The last step is to start training: we train for at most 1000 epochs, split the data into minibatches, and on each iteration update the parameters using the minibatch selected by minibatch_index.
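In outline, the training loop in the listing below looks like this (the early-stopping checks on the validation score are omitted here for brevity):

epoch = 0
while epoch < n_epochs:                      # n_epochs = 1000 by default
    epoch += 1
    for minibatch_index in xrange(n_train_batches):
        # one SGD step on the current minibatch
        minibatch_avg_cost = train_model(minibatch_index)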


2.2 Code

"" This tutorial introduces the multilayer perceptron using Theano. A multilayer perceptron is a logistic regressor whereinstead of feeding the input to the logistic regression you insert Ai Ntermediate layer, called the hidden layer, which has a nonlinearactivation function (usually Tanh or sigmoid). One can use many Suchhidden layers making the architecture deep. The tutorial would also tacklethe problem of MNIST digit classification ... math:: f (x) = G (b^{(2)} + w^{(2)} (S (b^{(1) } + w^{(1)} x)), References:-Textbooks: "Pattern Recognition and machine learning"-Christopher M. B Ishop, Section 5 "" "__docformat__ = ' restructedtext en ' import osimport sysimport timeitimport numpyimport Theanoimport the Ano.tensor as Tfrom logistic_sgd import logisticregression, load_data# start-snippet-1class HiddenLayer (object): Def __  Init__ (self, rng, input, n_in, N_out, W=none, B=none, Activation=t.tanh): "" "typical hidden Layer of a mlp:units aRe fully-connected and has sigmoidal activation function.        Weight matrix W is of shape (n_in,n_out) and the bias vectors b is of shape (n_out,). Note:the nonlinearity used Here's Tanh Hidden unit activation is given By:tanh (dot (input,w) + B): Type R Ng:numpy.random.RandomState:p Aram rng:a random number generator used to initialize Weights:type input:t Heano.tensor.dmatrix:p Aram Input:a symbolic tensor of shape (N_examples, n_in): type N_in:int:p ar Am n_in:dimensionality of Input:type n_out:int:p Aram N_out:number of Hidden Units:type Activati On:theano.        Op or function:p Aram Activation:non linearity to being applied in the hidden layer "" "Self.input = input # end-snippet-1 # ' W ' is initialized with ' w_values ' which is uniformely Sampl Ed # from sqrt ( -6./(N_in+n_hidden)) and sqrt (6./(N_in+n_hidden)) # for Tanh ACtivation function # The output of uniform if converted using Asarray to Dtype # Theano.config.floatX so The code is runable on GPU # Note:optimal initialization of weights are dependent on the # Activati        On function used (among other things). # For example, results presented in [XAVIER10] suggest so you # should use 4 times larger initial W  Eights for sigmoid # compared to Tanh # we had no info for other function, so we use the same        As # Tanh. If W is none:w_values = Numpy.asarray (Rng.uniform (LOW=-NUMPY.SQRT (6./(N_                In + n_out), high=numpy.sqrt (6./(n_in + n_out)), size= (n_in, N_out)                ), dtype=theano.config.floatx) if activation = = Theano.tensor.nnet.sigmoid: W_values *= 4 W = theano.shared (Value=w_values, Name= ' W ', borrow=true) if B is None:b_values = Numpy.zeros ((n_out,), DTYPE=THEANO.CONFIG.FLOATX) b = theano.shared (value=b_values, name= ' B ', borrow=true) self. w = w self.b = b lin_output = T.dot (input, self.        W) + self.b Self.output = (lin_output if activation is None else activation (lin_output) ) # Parameters of the Model self.params = [self. W, self.b]# start-snippet-2class MLP (object): "" "Multi-layer Perceptron Class A multilayer Perceptron is a Feedforwa    RD Artificial Neural network model that have one layer or more of hidden units and nonlinear activations.  Intermediate layers usually has as activation function tanh or the sigmoid function (defined here by a ' hiddenlayer '    Class) While the top layer was a SOFTMAX layer (defined here by a ' logisticregression ' Class). 
"" "Def __init__ (self, rng, input, n_in, N_hidden, n_out):" "" Initialize the ParamEters for the multilayer perceptron:type rng:numpy.random.RandomState:p Aram rng:a random number Generato R used to initialize Weights:type input:theano.tensor.TensorType:p Aram input:symbolic variable that desc Ribes the input of the architecture (one minibatch): Type N_in:int:p Aram N_in:number of input unit  s, the dimension of the space in which the datapoints lie:type n_hidden:int:p Aram N_hidden:number        of hidden Units:type n_out:int:p Aram N_out:number of output units, the dimension of the space in Which the labels lie "" "# Since We is dealing with a one hidden layer MLP, this would translate # I Nto a hiddenlayer with a Tanh activation function connected to the # logisticregression layer; The activation function can be replaced by # sigmoid or any other nonlinear function Self.hiddenlayer = Hidd Enlayer (RNG=RNG, InpuT=input, n_in=n_in, N_out=n_hidden, Activation=t.tanh) # The logistic Regr Ession layer gets as input the hidden units # of the hidden layer Self.logregressionlayer = Logisticregressi On (Input=self.hiddenlayer.output, N_in=n_hidden, n_out=n_out) # End-snipp Et-2 start-snippet-3 # L1 Norm; One regularization option is to enforce L1 norm to # being small self.  L1 = (ABS (SELF.HIDDENLAYER.W). SUM () + ABS (SELF.LOGREGRESSIONLAYER.W). sum ()) # Square of L2 Norm; One regularization option is to enforce # Square of the L2 norm to being small self.        L2_SQR = ((SELF.HIDDENLAYER.W * 2). SUM () + (SELF.LOGREGRESSIONLAYER.W * * 2). SUM ())  # Negative log likelihood of the MLP is given by the negative # log likelihood of the output of the model, computed In the # logistic regression layer        Self.negative_log_likelihood = (self.logRegressionLayer.negative_log_likelihood) # same Holds for the function computing the number of errors self.errors = self.logRegressionLayer.errors # The PA Rameters of the model is the parameters of the "the" It is # made out of Self.params = Self.hiddenlayer . Params + self.logRegressionLayer.params # end-snippet-3 # keep track of model input self.input = INP Utdef TEST_MLP (learning_rate=0.01, l1_reg=0.00, l2_reg=0.0001, n_epochs=1000, dataset= ' mnist.pkl.gz ', batch_si Ze=20, n_hidden=500): "" "demonstrate stochastic gradient descent optimization for a multilayer perceptron Thi    S is demonstrated on MNIST. : Type learning_rate:float:p aram learning_rate:learning rate Used (factor for the stochastic Gradient:type l1_ Reg:float:p Aram L1_reg:l1-norm ' s weight when added to the cost (see regularization): Type L2_reg:float    :p Aram L2_reg:l2-norm ' s weight when added to the cost (see regularization): Type N_epochs:int:p Aram N_epoc Hs:maximal number of epochs to run the Optimizer:type dataset:string:p Aram dataset:the Path of the MNIST datase T file from http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz "" "Datasets = Load_data (d Ataset) train_set_x, train_set_y = datasets[0] valid_set_x, valid_set_y = datasets[1] test_set_x, test_set_y = da TASETS[2] # Compute number of minibatches for training, validation and testing n_train_batches = Train_set_x.get_val UE (BORROW=TRUE). shape[0]/Batch_size n_valid_batches = Valid_set_x.get_value (borrow=true). shape[0]/batch_size n_t    Est_batches = Test_set_x.get_value (borrow=true). shape[0]/batch_size ###################### # BUILD ACTUAL MODEL # ###################### print ' ... 
building the Model ' # Allocate symbolic variables for the data index = T.LSC Alar () # Index to a [MINi]batch x = T.matrix (' x ') # The data is presented as rasterized images y = t.ivector (' y ') # The labels is Presen Ted as 1D vector of # [int] Labels rng = numpy.random.RandomState (1234) # construct the MLP C    Lass classifier = MLP (rng=rng, Input=x, n_in=28 *, N_hidden=n_hidden, n_out=10 # start-snippet-4 # The cost we minimize during training is the negative log likelihood of # The model plus T He regularization terms (L1 and L2); Cost are expressed # here symbolically cost = (Classifier.negative_log_likelihood (y) + L1_reg * Classi Fier. L1 + L2_reg * classifier.  L2_SQR) # end-snippet-4 # compiling a Theano function that computes the mistakes that is made # by the model On a Minibatch Test_model = Theano.function (Inputs=[index], outputs=classifier.errors (y), Givens ={X:test_set_x[index * batch_size: (index + 1) * Batch_size], Y:test_set_y[index * batch_size: (index + 1) * Batch_size]}) Validate_model = Theano.function ( Inputs=[index], outputs=classifier.errors (y), givens={x:valid_set_x[index * batch_size: (index + 1) * Batch_size], Y:valid_set_y[index * batch_size: (index + 1) * Batch_size]}) # Start  -SNIPPET-5 # Compute the gradient of cost with respect to theta (sotred in params) # The resulting gradients would be  stored in a list gparams gparams = [T.grad (cost, param) for Param ' classifier.params] # Specify how to update the Parameters of the model as a list of # (variable, update expression) pairs # Given, lists of the same length, a = [A1, a2, A3, A4] and # B = [B1, B2, B3, B4], Zip generates a list C of same size, where each # element is a pair F Ormed from the lists: # C = [(A1, B1), (A2, B2), (A3, B3), (A4, b4)] updates = [(Param, Param-lear     Ning_rate * Gparam)   For Param, gparam in zip (Classifier.params, Gparams)] # Compiling a Theano function ' Train_model ' that returns t He cost, but # in the same time updates the parameter of the model based on the rules # defined in ' Updates ' Trai            N_model = Theano.function (Inputs=[index], outputs=cost, Updates=updates, givens={ X:train_set_x[index * Batch_size: (index + 1) * Batch_size], Y:train_set_y[index * batch_size: (index + 1) * Batch_size]}) # end-snippet-5 ############### # TRAIN MODEL # ############### print ' ... Traini  Ng ' # early-stopping parameters patience = 10000 # look as this many examples regardless patience_increase = 2 # Wait this much longer if a new best is # found improvement_threshold = 0.995 # a relativ E improvement of this much is # considered significant validation_frequency = min (n_t Rain_batches, Patience/ 2) # go through this many # Minibatche before checking The network # on the validation set;    In this case we # check every epoch Best_validation_loss = Numpy.inf Best_iter = 0    Test_score = 0. 
Start_time = Timeit.default_timer () Epoch = 0 done_looping = False while (Epoch < n_epochs) and (not Done_loop ing): Epoch = epoch + 1 for Minibatch_index in Xrange (n_train_batches): Minibatch_avg_cost = Trai            N_model (minibatch_index) # iteration Number iter = (epoch-1) * n_train_batches + Minibatch_index                if (iter + 1)% Validation_frequency = = 0: # Compute Zero-one loss on validation set                Validation_losses = [Validate_model (i) for I in Xrange (n_valid_batches)] This_validation_loss = Numpy.mean (validation_lossES) print (' Epoch%i, Minibatch%i/%i, validation error%f percent '% (                        Epoch, Minibatch_index + 1, n_train_batches,                    This_validation_loss * 100. ) # If we got the best validation score until now if This_validation_loss & Lt                        Best_validation_loss: #improve Patience If loss improvement is good enough if (                    This_validation_loss < Best_validation_loss * Improvement_threshold ): Patience = max (patience, ITER * patience_increase) Best_validation_                    Loss = This_validation_loss Best_iter = iter # test It on the test set Test_losses = [Test_model (i) for I in XranGE (n_test_batches)] Test_score = Numpy.mean (test_losses) print (' Epoch%i, mini Batch%i/%i, test error of ' best model%f percent ')% (epoch, minibatch_in            Dex + 1, n_train_batches, Test_score * 100.)) If patience <= iter:done_looping = True Break end_time = Timeit.default_timer () pr Int (' Optimization complete. Best validation score of%f percent "obtained at iteration%i, with test performance%f% ')% (best_valid    Ation_loss *, Best_iter + 1, Test_score * 100.)                          Print >> Sys.stderr, (' The Code for file ' + os.path.split (__file__) [1] + ' ran for%.2fm '% ((end_time-start_time)/60.)) if __name__ = = ' __main__ ': TEST_MLP ()




