DeepLearning tutorial (3) MLP multilayer perceptron principle + code explanation




@ Author: wepon

@ Blog: http://blog.csdn.net/u012162613/article/details/43221829


This article introduces the multilayer perceptron (MLP) algorithm, with a focus on the code implementation. The code is based on Python and Theano and comes from the Multilayer Perceptron tutorial. If you want to learn more about the MLP algorithm itself, refer to the UFLDL tutorial, or to the algorithm introduction in the first part of this article.

Code with detailed comments: available on my github; you can download it there.


I. Introduction to MLP principles

A multilayer perceptron (MLP, Multilayer Perceptron) is also called an artificial neural network (ANN, Artificial Neural Network). Besides the input and output layers, it can have multiple hidden layers in between. The simplest MLP contains only one hidden layer, i.e. a three-layer structure (input layer, hidden layer, output layer).



As this structure shows, the layers of a multilayer perceptron are fully connected ("fully connected" means that every neuron in one layer is connected to all neurons in the next layer). The bottom layer of the MLP is the input layer, the middle is the hidden layer, and the last is the output layer.


There is not much to say about the input layer: it is simply whatever your input is. For example, if the input is an n-dimensional vector, there are n input neurons.

How is the hidden layer computed? It is fully connected to the input layer: if the input layer is represented by the vector X, then the output of the hidden layer is f(W1·X + b1), where W1 is the weight matrix (also called the connection coefficients), b1 is the bias, and the function f can be the common sigmoid or tanh activation:

sigmoid(a) = 1 / (1 + e^(-a)),    tanh(a) = (e^a - e^(-a)) / (e^a + e^(-a))




What is the relationship between the output layer and the hidden layer? The mapping from the hidden layer to the output layer can be seen as multi-class logistic regression, i.e. softmax regression, so the output of the output layer is softmax(W2·X1 + b2), where X1 denotes the hidden layer's output f(W1·X + b1).


That is the entire MLP model. Summed up in one formula, the three-layer MLP above is

f(x) = G(W2 · s(W1 · x + b1) + b2)

where s is the hidden-layer activation (sigmoid or tanh) and the function G is softmax.
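To make the forward pass concrete, here is a minimal NumPy sketch. It is not part of the tutorial code: the sizes 3, 4 and 2 are made up for illustration, and the weight matrices follow the (n_in, n_out) convention that the Theano code below also uses.

import numpy

def softmax(z):
    e = numpy.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

rng = numpy.random.RandomState(0)
x = rng.rand(3)                              # input vector X (3-dimensional)
W1, b1 = rng.rand(3, 4), numpy.zeros(4)      # input -> hidden parameters
W2, b2 = rng.rand(4, 2), numpy.zeros(2)      # hidden -> output parameters

hidden = numpy.tanh(numpy.dot(x, W1) + b1)   # f(W1*X + b1) in the text, with f = tanh
output = softmax(numpy.dot(hidden, W2) + b2) # G(W2*X1 + b2), with G = softmax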



Therefore, all the parameters of the MLP are the connection weights and biases between layers: W1, b1, W2 and b2. For a specific problem, how do we determine these parameters? Finding the optimal parameters is an optimization problem, and the simplest way to solve it is gradient descent (SGD): first initialize all parameters randomly, then train iteratively, repeatedly computing the gradient and updating the parameters until some condition is met (for example, the error is small enough or the number of iterations is large enough). This process involves the cost function, regularization, the learning rate and gradient computation; this article does not discuss them in detail, and you can refer to the two links at the top of this article.
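As a concrete illustration of the update rule, here is a hand-written sketch of one SGD step. This is only illustrative and not the Theano code used later, which computes the gradients symbolically with T.grad.

def sgd_step(params, grads, learning_rate=0.01):
    # params: list of numpy arrays (e.g. W1, b1, W2, b2)
    # grads:  gradients of the cost w.r.t. each parameter, same shapes as params
    for param, grad in zip(params, grads):
        param -= learning_rate * grad   # move each parameter against its gradient
    return params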


Now that we understand the basic MLP model, let's move on to the code implementation.



II. Multilayer perceptron (MLP) code explanation (based on Python + Theano)
Again, the code comes from: Multilayer Perceptron. This article only provides a detailed explanation; if there is an error, please point it out.
This code implements a three-layer perceptron, but once you understand it, implementing an n-layer perceptron is not a problem, so you only need to understand the three-layer MLP model. In short, the MLP input layer X is simply our training data, so the input layer does not need to be implemented separately. What remains is "input layer to hidden layer" and "hidden layer to output layer". As mentioned above, "input layer to hidden layer" is a fully connected layer; in the code below we define this part as the class HiddenLayer. "Hidden layer to output layer" is a classifier, softmax regression (also called logistic regression); in the code below we define this part as the class LogisticRegression.
Code details:
(1) import necessary python modules

These are mainly numpy and theano, plus Python's os, sys and time modules; how each is used can be seen in the program below. The load_data() function later also needs gzip and cPickle, so they are imported here as well.

import os
import sys
import time

import gzip      # used by load_data() below to read mnist.pkl.gz
import cPickle   # used by load_data() below to unpickle the dataset

import numpy

import theano
import theano.tensor as T


(2) define the MLP model (HiddenLayer + LogisticRegression)

This section defines the basic "components" of the MLP, namely the HiddenLayer and LogisticRegression classes mentioned earlier.

  • HiddenLayer
For the hidden layer, we need to define the connection weights W, the bias b, the input and the output. The specific code and comments are as follows:
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None, activation=T.tanh):
        """
        Note: this class defines the hidden layer. Its input is `input`, and the hidden
        layer has n_out neurons. The input layer and the hidden layer are fully connected.
        Assume the input is an n_in-dimensional vector (i.e. n_in neurons in the input
        layer); since the connection is full, there are n_in * n_out weights in total,
        so W has shape (n_in, n_out): each column holds the connection weights of one
        hidden neuron.
        b is the bias; the hidden layer has n_out neurons, so b is an n_out-dimensional vector.
        rng is a random number generator (numpy.random.RandomState) used to initialize W.
        input is the data used for training, not the MLP input layer itself: the MLP input
        layer has n_in neurons, while the parameter input here has shape (n_example, n_in),
        one sample per row, i.e. each row is fed to the MLP input layer.
        activation is the activation function, defined here as tanh.
        """
        self.input = input  # the input of the HiddenLayer class is the `input` passed in

        """
        NOTE: to be compatible with the GPU, W and b must use dtype=theano.config.floatX
        and be defined as theano.shared variables.
        W initialization also follows a rule: with tanh, draw uniformly from
        [-sqrt(6./(n_in+n_hidden)), sqrt(6./(n_in+n_hidden))]; with sigmoid,
        multiply those values by 4.
        """
        # If W is not given, initialize it as described above.
        # This check exists because sometimes we want to initialize W with already
        # trained parameters; see my previous article.
        if W is None:
            W_values = numpy.asarray(
                rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        # use the W and b defined above to initialize the HiddenLayer's W and b
        self.W = W
        self.b = b

        # output of the hidden layer
        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output)
        )

        # parameters of the hidden layer
        self.params = [self.W, self.b]
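A quick usage sketch (the sizes are illustrative and not from the original code), just to show how HiddenLayer is meant to be instantiated:

# Hypothetical example: a hidden layer with 28*28 = 784 inputs and 500 neurons,
# applied to a symbolic minibatch x of shape (batch_size, 784).
x = T.matrix('x')
rng = numpy.random.RandomState(1234)
layer = HiddenLayer(rng=rng, input=x, n_in=28 * 28, n_out=500, activation=T.tanh)
# layer.output is a symbolic (batch_size, 500) matrix, and layer.params is [W, b]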


  • LogisticRegression

The logistic regression (softmax regression) code is explained below.

(For the details of softmax regression, see DeepLearning tutorial (1): Softmax regression principle + code explanation.)


"Defines the classification layer. In deeplearning tutorial, Softmax returns LogisticRegression as Softmax, the second type of logical regression we know is the LogisticRegression "When n_out = 2 # parameter description: # input, the size is (n_example, n_in ), n_example is the size of a batch. # Because we use Minibatch SGD during training, input is defined as # n_in, that is, output of the previous layer (hidden layer) # n_out, number of output classes class LogisticRegression (object): def _ init _ (self, input, n_in, n_out): # W is the n_in row n_out column, B is the n_out dimension vector. That is, each output corresponds to one column of W and one element of B. Self. W = theano. shared (value = numpy. zeros (n_in, n_out), dtype = theano. config. floatX), name = 'w', borrow = True) self. B = theano. shared (value = numpy. zeros (n_out,), dtype = theano. config. floatX), name = 'B', borrow = True) # input is (n_example, n_in), W is (n_in, n_out), and point multiplication is obtained (n_example, n_out ), add the offset B, # and use it as T. nnet. softmax input, get p_y_given_x # So every row of p_y_given_x indicates that each sample is estimated to be of various probabilities # PS: B is the n_out dimension vector, and is added to the (n_example, n_out) matrix, internal replication is actually performed first. N_example B, # Then (n_example, n_out) each row of the matrix is added with B self. p_y_given_x = T. nnet. softmax (T. dot (input, self. w) + self. b) # argmax returns the maximum subscript, because the dataset in this example is MNIST, And the subscript is exactly the category. Axis = 1 indicates the operation by row. Self. y_pred = T. argmax (self. p_y_given_x, axis = 1) # params, the parameter self. params of LogisticRegression = [self. W, self. B]


OK! These two basic "components" are ready, and now we can assemble them together.

If we want a three-layer MLP, we only need HiddenLayer + LogisticRegression; if we want a four-layer MLP, it is HiddenLayer + HiddenLayer + LogisticRegression, and so on (a sketch of the four-layer case follows the MLP class below).

Below is the three-layer MLP:


# 3-layer MLP
class MLP(object):
    def __init__(self, rng, input, n_in, n_hidden, n_out):
        self.hiddenLayer = HiddenLayer(
            rng=rng,
            input=input,
            n_in=n_in,
            n_out=n_hidden,
            activation=T.tanh
        )

        # use the output of the hidden layer hiddenLayer as the input of the
        # classification layer logRegressionLayer, thereby connecting them
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out
        )

        # The basic structure of the MLP is now defined. Below are the other
        # parameters and functions of the MLP model.

        # regularization terms: the common L1 and L2_sqr
        self.L1 = (
            abs(self.hiddenLayer.W).sum()
            + abs(self.logRegressionLayer.W).sum()
        )
        self.L2_sqr = (
            (self.hiddenLayer.W ** 2).sum()
            + (self.logRegressionLayer.W ** 2).sum()
        )

        # loss function NLL (also called the cost function)
        self.negative_log_likelihood = (
            self.logRegressionLayer.negative_log_likelihood
        )

        # error
        self.errors = self.logRegressionLayer.errors

        # the parameters of the MLP
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params
        # end-snippet-3

Besides the hidden layer and the classification layer, the MLP class also defines the loss function and the regularization terms, which are used when solving the optimization problem.
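As mentioned earlier, a four-layer MLP simply chains another HiddenLayer in front of the classifier. A minimal sketch of that case (this class and its parameter names are hypothetical, not part of the original code):

# Hypothetical 4-layer MLP: two hidden layers + softmax classifier
class MLP4(object):
    def __init__(self, rng, input, n_in, n_hidden1, n_hidden2, n_out):
        self.hiddenLayer1 = HiddenLayer(rng=rng, input=input,
                                        n_in=n_in, n_out=n_hidden1,
                                        activation=T.tanh)
        self.hiddenLayer2 = HiddenLayer(rng=rng, input=self.hiddenLayer1.output,
                                        n_in=n_hidden1, n_out=n_hidden2,
                                        activation=T.tanh)
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer2.output, n_in=n_hidden2, n_out=n_out)
        # all parameters of the 4-layer model
        self.params = (self.hiddenLayer1.params
                       + self.hiddenLayer2.params
                       + self.logRegressionLayer.params)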



(3) apply MLP to MNIST (handwritten digit recognition): define a three-layer MLP and use it to classify the MNIST dataset. MNIST is a dataset of handwritten digits 0-9.
First, define the load_data() function for loading the dataset mnist.pkl.gz:
"Load MNIST dataset" def load_data (dataset): # dataset is the path of the dataset. The program first checks whether there is a MNIST dataset under this path, if not, download the MNIST dataset #, which is irrelevant to the softmax regression algorithm. Data_dir, data_file = OS. path. split (dataset) if data_dir = "" and not OS. path. isfile (dataset): # Check if dataset is in the data directory. new_path = OS. path. join (OS. path. split (_ file _) [0], ".. "," data ", dataset) if OS. path. isfile (new_path) or data_file = 'mnist.pkl.gz ': dataset = new_path if (not OS. path. isfile (dataset) and data_file = 'mnist.pkl.gz ': import urllib origin = ('HTTP: // www. Iro. umontreal. ca /~ Lisa/deep/data/mnist/mnist.pkl.gz ') print 'downloading data from % s' % origin urllib. urlretrieve (origin, dataset) print '... loading data'uploads is used to detect and download the mnist.pkl.gz data set, which is not the focus of this article. The following is the start of load_data # Load train_set, valid_set, and test_set from "mnist.pkl.gz". They all contain labels # mainly used gzip in python. open () function, and cPickle. load (). # 'Rb' indicates opening the file in binary readable mode f = gzip. open (dataset, 'rb') train_set, valid_set, test_set = cPickle. load (f) f. close () # Set the data to shared variables, mainly for GPU acceleration. Only shared variables can be saved to the GPU memory # the data type in the GPU can only be float. Data_y is a category, so the result is converted to int to return def shared_dataset (data_xy, borrow = True): data_x, data_y = data_xy shared_x = theano. shared (numpy. asarray (data_x, dtype = theano. config. floatX), borrow = borrow) shared_y = theano. shared (numpy. asarray (data_y, dtype = theano. config. floatX), borrow = borrow) return shared_x, T. cast (shared_y, 'int32 ') test_set_x, shard = shared_dataset (test_set) valid_set_x, shard = shared_dataset (valid_set) partition, shard = shared_dataset (train_set) rval = [(partition, train_set_y), (valid_set_x, valid_set_y), (test_set_x, test_set_y)] return rval


After loading the data, we can start training the model. Below is the main function test_mlp(), which applies the MLP to MNIST:
# test_mlp is an application example: use minibatch gradient descent to optimize
# an MLP on the MNIST dataset.
def test_mlp(learning_rate=0.01, L1_reg=0.00, L2_reg=0.0001, n_epochs=10,
             dataset='mnist.pkl.gz', batch_size=20, n_hidden=500):
    """
    NOTE:
    learning_rate: the coefficient in front of the gradient.
    L1_reg, L2_reg: the coefficients in front of the regularization terms, which weigh
        the regularization terms against the NLL term in the cost function:
        cost = NLL + L1_reg * L1 or L2_reg * L2_sqr
    n_epochs: maximum number of iterations (i.e. training epochs), used to end the
        optimization process.
    dataset: the path of the training data.
    n_hidden: number of hidden-layer neurons.
    batch_size=20: after every 20 samples, compute the gradient and update the parameters.
    """

    # load the dataset and split it into the training, validation and test sets
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # shape[0] is the number of rows; one row is one sample, so this gives the number
    # of samples. Dividing by batch_size gives the number of batches.
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'

    # index is the index of a batch (a scalar); x is the data; y is the labels
    # (a one-dimensional vector of ints)
    index = T.lscalar()
    x = T.matrix('x')
    y = T.ivector('y')

    rng = numpy.random.RandomState(1234)

    # instantiate an MLP named classifier
    classifier = MLP(rng=rng, input=x, n_in=28 * 28, n_hidden=n_hidden, n_out=10)

    # the cost function, with regularization terms; it is written in terms of y,
    # but there is also an implicit parameter x inside classifier
    cost = (
        classifier.negative_log_likelihood(y)
        + L1_reg * classifier.L1
        + L2_reg * classifier.L2_sqr
    )

    # A note on theano.function: givens is a dictionary whose keys are x and y and whose
    # values follow the colons. When the function is called, x and y are replaced by those
    # values, and the index in the values is supplied through inputs=[index].
    # For example, calling test_model(1) first turns x into
    # test_set_x[1 * batch_size: (1 + 1) * batch_size] and y into
    # test_set_y[1 * batch_size: (1 + 1) * batch_size] according to index=1; the function
    # then computes outputs=classifier.errors(y), where the parameter y (and the
    # implicit x) take the concrete values given in givens.
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size:(index + 1) * batch_size],
            y: test_set_y[index * batch_size:(index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size:(index + 1) * batch_size],
            y: valid_set_y[index * batch_size:(index + 1) * batch_size]
        }
    )

    # the partial derivatives of the cost function with respect to each parameter,
    # i.e. the gradients, stored in gparams
    gparams = [T.grad(cost, param) for param in classifier.params]

    # parameter update rules:
    # updates is a list [(), (), ()...]; each pair is (param, param - learning_rate * gparam),
    # i.e. each parameter together with its update expression
    updates = [
        (param, param - learning_rate * gparam)
        for param, gparam in zip(classifier.params, gparams)
    ]

    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'

    patience = 10000
    patience_increase = 2
    # improvement threshold: a new best only counts when the validation error drops
    # below 0.995 times the previous best_validation_loss
    improvement_threshold = 0.995
    # this setting of validation_frequency guarantees that every epoch is evaluated
    # on the validation set
    validation_frequency = min(n_train_batches, patience / 2)

    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    # epoch counts the training epochs; each epoch goes through all the training data
    epoch = 0
    done_looping = False

    # The training process: the while loop runs over epochs, and one epoch traverses
    # all batches, i.e. all images. The for loop runs over batches, training one batch
    # at a time; inside it, train_model(minibatch_index) trains the model, and the
    # `updates` of train_model update all the parameters.
    # The for loop also counts the number of trained batches; when iter is a multiple
    # of validation_frequency, the model is evaluated on the validation set. If the
    # validation loss this_validation_loss is smaller than best_validation_loss,
    # best_validation_loss and best_iter are updated and the model is evaluated on the
    # test set; if this_validation_loss is also smaller than
    # best_validation_loss * improvement_threshold, patience is increased.
    # Training stops when the maximum number of epochs n_epochs is reached, or when
    # patience <= iter.
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):

            # train on one batch
            minibatch_avg_cost = train_model(minibatch_index)
            # number of trained minibatches, i.e. the iteration count
            iter = (epoch - 1) * n_train_batches + minibatch_index

            # when the number of trained minibatches is a multiple of
            # validation_frequency, evaluate on the validation set
            if (iter + 1) % validation_frequency == 0:
                # compute zero-one loss on validation set
                validation_losses = [validate_model(i)
                                     for i in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)

                print(
                    'epoch %i, minibatch %i/%i, validation error %f %%' %
                    (
                        epoch,
                        minibatch_index + 1,
                        n_train_batches,
                        this_validation_loss * 100.
                    )
                )

                # if the current validation error is smaller than the previous best,
                # update best_validation_loss and the corresponding best_iter, and
                # evaluate on the test set
                if this_validation_loss < best_validation_loss:
                    if (
                        this_validation_loss < best_validation_loss *
                        improvement_threshold
                    ):
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    test_losses = [test_model(i) for i in xrange(n_test_batches)]
                    test_score = numpy.mean(test_losses)

                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            # if patience is less than or equal to iter, stop training
            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print(('Optimization complete. Best validation score of %f %% '
           'obtained at iteration %i, with test performance %f %%') %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print >> sys.stderr, ('The code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((end_time - start_time) / 60.))
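To actually run the experiment, the original tutorial script ends with a standard entry point along these lines:

if __name__ == '__main__':
    test_mlp()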


Finally, the code with detailed comments has been put on my github and can be downloaded there. If anything is wrong or unclear, please leave a comment.
