The previous blog introduced using logistic regression for the Kaggle handwriting-recognition challenge. This blog continues with a multilayer perceptron to improve the accuracy. After finishing the last post I spent some time on web crawlers (still unfinished), which is why this post arrives 40 days later.
Here pandas is used to read the CSV files; the function is shown below. train.csv is split into ten parts: the first eight are used as the training set, the ninth as the validation set, and the tenth as the test set.
The modules assumed throughout the listings below are the standard ones the code refers to by name:

import os
import sys
import timeit
import pickle

import numpy
import pandas
import theano
import theano.tensor as T


def load_data(path):
    print('... loading data')
    train_df = pandas.DataFrame.from_csv(path + 'train.csv', index_col=False).fillna(0).astype(int)
    test_df = pandas.DataFrame.from_csv(path + 'test.csv', index_col=False).fillna(0).astype(int)

    def shared_dataset(data_xy, borrow=True):
        """Load the dataset into shared variables.

        The reason we store our datasets in shared variables is to allow
        Theano to copy them into GPU memory (when the code is run on a GPU).
        Since copying data into the GPU is slow, copying a minibatch every
        time it is needed (the default behaviour if the data is not in a
        shared variable) would lead to a large decrease in performance.
        """
        data_x, data_y = data_xy
        shared_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX),
                                 borrow=borrow)
        shared_y = theano.shared(numpy.asarray(data_y, dtype=theano.config.floatX),
                                 borrow=borrow)
        # When storing data on the GPU it has to be stored as floats, so we
        # store the labels as floatX as well ('shared_y' does exactly that).
        # But during computations we need them as ints (we use the labels as
        # indices, and as floats that doesn't make sense), so instead of
        # returning 'shared_y' we cast it to int32. This little hack lets us
        # get around the issue.
        return shared_x, T.cast(shared_y, 'int32')

    train_set = [train_df.values[0:33600, 1:] / 255.0, train_df.values[0:33600, 0]]
    valid_set = [train_df.values[33600:37800, 1:] / 255.0, train_df.values[33600:37800, 0]]
    test_set = [train_df.values[37800:, 1:] / 255.0, train_df.values[37800:, 0]]
    predict_set = test_df.values / 255.0
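The listing above appears to be truncated before the function returns. Judging from how the later code indexes the result (datasets[0] through datasets[3], with datasets[3] being a shared variable holding the Kaggle test images), a likely ending, shown here only as a hypothetical sketch, is:

    # Hypothetical completion (not shown in the original listing): wrap each
    # split as Theano shared variables and return them in the order the
    # training and prediction code expect.
    train_set_x, train_set_y = shared_dataset(train_set)
    valid_set_x, valid_set_y = shared_dataset(valid_set)
    test_set_x, test_set_y = shared_dataset(test_set)
    predict_set_x = theano.shared(numpy.asarray(predict_set, dtype=theano.config.floatX),
                                  borrow=True)
    return [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y), predict_set_x]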
The following defines the class that acts as the hidden layer of the multilayer perceptron.
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):
        """Typical hidden layer of an MLP: units are fully connected and have
        a sigmoidal activation function. The weight matrix W is of shape
        (n_in, n_out) and the bias vector b is of shape (n_out,).

        NOTE: the nonlinearity used here is tanh.

        Hidden unit activation is given by: tanh(dot(input, W) + b)

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dmatrix
        :param input: a symbolic tensor of shape (n_examples, n_in)

        :type n_in: int
        :param n_in: dimensionality of input

        :type n_out: int
        :param n_out: number of hidden units

        :type activation: theano.Op or function
        :param activation: nonlinearity to be applied in the hidden layer
        """
        self.input = input

        # `W` is initialized with `W_values`, uniformly sampled from
        # [-sqrt(6./(n_in+n_out)), sqrt(6./(n_in+n_out))] for a tanh
        # activation function. The output of uniform is converted with
        # asarray to dtype theano.config.floatX so the code is runnable on
        # GPU. Note: optimal initialization of weights depends on the
        # activation function used (among other things). For example,
        # results presented in [Xavier10] suggest using 4 times larger
        # initial weights for sigmoid compared to tanh. We have no
        # information for other functions, so we use the same as tanh.
        if W is None:
            W_values = numpy.asarray(
                rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b

        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output)
        )
        # parameters of the model
        self.params = [self.W, self.b]
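As a quick sanity check (not part of the original post), the hidden layer can be compiled and exercised on its own; the minibatch below is random and the shapes are only illustrative:

# Hypothetical usage sketch: build a 784-to-500 hidden layer on a symbolic
# matrix and push a random minibatch of 20 "images" through it.
rng = numpy.random.RandomState(1234)
x = T.matrix('x')
hidden = HiddenLayer(rng=rng, input=x, n_in=28 * 28, n_out=500, activation=T.tanh)
forward = theano.function(inputs=[x], outputs=hidden.output)
batch = numpy.random.rand(20, 28 * 28).astype(theano.config.floatX)
print(forward(batch).shape)  # (20, 500)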
The following is the multilayer perceptron class:
class MLP(object):
    """Multi-Layer Perceptron class.

    A multilayer perceptron is a feedforward artificial neural network model
    that has one or more layers of hidden units with nonlinear activations.
    Intermediate layers usually have tanh or the sigmoid function as their
    activation (defined here by a HiddenLayer class), while the top layer is
    a softmax layer (defined here by a LogisticRegression class).
    """

    def __init__(self, rng, input, n_in, n_hidden, n_out):
        """Initialize the parameters of the multilayer perceptron.

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
                      architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
                     which the datapoints lie

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
                      which the labels lie
        """
        # Since we are dealing with a one-hidden-layer MLP, this translates
        # into a HiddenLayer with a tanh activation function connected to
        # the LogisticRegression layer; the activation function can be
        # replaced by sigmoid or any other nonlinear function.
        self.hiddenLayer = HiddenLayer(
            rng=rng,
            input=input,
            n_in=n_in,
            n_out=n_hidden,
            activation=T.tanh
        )

        # The logistic regression layer gets as input the units of the
        # hidden layer.
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out
        )

        # L1 norm; one regularization option is to enforce the L1 norm to
        # be small.
        self.L1 = (
            abs(self.hiddenLayer.W).sum()
            + abs(self.logRegressionLayer.W).sum()
        )

        # Square of the L2 norm; one regularization option is to enforce
        # the square of the L2 norm to be small.
        self.L2_sqr = (
            (self.hiddenLayer.W ** 2).sum()
            + (self.logRegressionLayer.W ** 2).sum()
        )

        # The negative log likelihood of the MLP is given by the negative
        # log likelihood of the output of the model, computed in the
        # logistic regression layer:
        # self.negative_log_likelihood = self.logRegressionLayer.negative_log_likelihood
        # The same holds for the function computing the number of errors:
        # self.errors = self.logRegressionLayer.errors
        # (Both are defined as methods below instead.)

        # The parameters of the model are the parameters of the two layers
        # it is made out of.
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params

        # keep track of the model input
        self.input = input

    def negative_log_likelihood(self, y):
        return -T.mean(
            T.log(self.logRegressionLayer.p_y_given_x)[T.arange(y.shape[0]), y]
        )

    def errors(self, y):
        if y.ndim != self.logRegressionLayer.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.logRegressionLayer.y_pred.type)
            )
        # check that y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.logRegressionLayer.y_pred, y))
        else:
            raise NotImplementedError()

    def __getstate__(self):
        return self.__dict__
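The MLP above relies on the LogisticRegression class from the previous blog, which is not repeated here. As an assumption about its interface (W, p_y_given_x, y_pred and params are what the MLP actually touches), a minimal sketch in the style of the standard Theano softmax output layer would look like this:

# Hypothetical sketch of the softmax output layer the MLP expects; the real
# class is the one defined in the previous blog post.
class LogisticRegression(object):
    def __init__(self, input, n_in, n_out):
        # weight matrix and bias vector of the softmax layer, zero-initialized
        self.W = theano.shared(
            value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
            name='W', borrow=True)
        self.b = theano.shared(
            value=numpy.zeros((n_out,), dtype=theano.config.floatX),
            name='b', borrow=True)
        # class-membership probabilities and hard predictions
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        self.params = [self.W, self.b]
        self.input = input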
The following function trains the multilayer perceptron:
def test_mlp(learning_rate=0.0005, L1_reg=0.00, L2_reg=0.0001, n_epochs=100,
             path=r'', batch_size=20, n_hidden=500):
    """Demonstrate stochastic gradient descent optimization for a multilayer
    perceptron. This is demonstrated on MNIST.

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic gradient)

    :type L1_reg: float
    :param L1_reg: L1-norm's weight when added to the cost (see regularization)

    :type L2_reg: float
    :param L2_reg: L2-norm's weight when added to the cost (see regularization)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type path: string
    :param path: directory containing the Kaggle CSV files (the original
                 tutorial used the MNIST dataset from
                 http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz)
    """
    datasets = load_data(path)
    # datasets = read_raw_train(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] // batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] // batch_size

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print('... building the model')

    # allocate symbolic variables for the data
    index = T.lscalar()  # index to a [mini]batch
    x = T.matrix('x')    # the data is presented as rasterized images
    y = T.ivector('y')   # the labels are presented as a 1D vector of [int] labels

    rng = numpy.random.RandomState(1234)

    # construct the MLP class
    classifier = MLP(rng=rng, input=x, n_in=28 * 28,
                     n_hidden=n_hidden, n_out=10)

    # The cost we minimize during training is the negative log likelihood of
    # the model plus the regularization terms (L1 and L2); the cost is
    # expressed here symbolically.
    cost = (
        classifier.negative_log_likelihood(y)
        + L1_reg * classifier.L1
        + L2_reg * classifier.L2_sqr
    )

    # compile Theano functions that compute the mistakes made by the model
    # on a minibatch
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size:(index + 1) * batch_size],
            y: test_set_y[index * batch_size:(index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size:(index + 1) * batch_size],
            y: valid_set_y[index * batch_size:(index + 1) * batch_size]
        }
    )

    # compute the gradient of the cost with respect to theta (stored in
    # params); the resulting gradients are stored in a list gparams
    gparams = [T.grad(cost, param) for param in classifier.params]

    # Specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs. Given two lists of the same
    # length, A = [a1, a2, a3, a4] and B = [b1, b2, b3, b4], zip generates a
    # list C of the same size, where each element is a pair formed from the
    # two lists: C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)].
    # Plain SGD updates would look like:
    # updates = [
    #     (param, param - learning_rate * gparam)
    #     for param, gparam in zip(classifier.params, gparams)
    # ]
    # Instead, RMSProp (scaling the gradient by a running average of its
    # squared magnitude) is used to build the update list.
    def rmsprop(gparams, params, learning_rate, rho=0.9, epsilon=1e-6):
        """rho: the fraction of the previous gradient contribution we keep."""
        updates = []
        for p, g in zip(params, gparams):
            acc = theano.shared(p.get_value() * 0.)
            acc_new = rho * acc + (1 - rho) * g ** 2
            gradient_scaling = T.sqrt(acc_new + epsilon)
            g = g / gradient_scaling
            updates.append((acc, acc_new))
            updates.append((p, p - learning_rate * g))
        return updates

    # compile a Theano function `train_model` that returns the cost and at
    # the same time updates the parameters of the model based on the rules
    # defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=rmsprop(gparams, classifier.params, learning_rate),
        givens={
            x: train_set_x[index * batch_size:(index + 1) * batch_size],
            y: train_set_y[index * batch_size:(index + 1) * batch_size]
        }
    )

    ###############
    # TRAIN MODEL #
    ###############
    print('... training')

    # early-stopping parameters
    patience = 10000               # look at this many examples regardless
    patience_increase = 2          # wait this much longer when a new best is found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience // 2)
    # go through this many minibatches before checking the network on the
    # validation set; in this case we check every epoch

    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = timeit.default_timer()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        if epoch > 60:
            # note: this reassignment has no effect on the already compiled
            # train_model, whose update rule baked in the initial learning_rate
            learning_rate = 0.001
        for minibatch_index in range(n_train_batches):
            minibatch_avg_cost = train_model(minibatch_index)
            # iteration number
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                # compute zero-one loss on the validation set
                validation_losses = [validate_model(i)
                                     for i in range(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)

                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:
                    # improve patience if the loss improvement is good enough
                    if this_validation_loss < best_validation_loss * improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
                    test_losses = [test_model(i) for i in range(n_test_batches)]
                    test_score = numpy.mean(test_losses)

                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

                    # save the best model
                    with open('mlp_best_model.pkl01', 'wb') as f:
                        pickle.dump(classifier, f)

            if patience <= iter:
                done_looping = True
                break

    end_time = timeit.default_timer()
    print(('Optimization complete. Best validation score of %f %% '
           'obtained at iteration %i, with test performance %f %%') %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print(('The code for file ' + os.path.split(__file__)[1] +
           ' ran for %.2fm' % ((end_time - start_time) / 60.)), file=sys.stderr)
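The post does not show how the training function is invoked; a hypothetical driver, assuming train.csv and test.csv sit next to the script (the empty path is the function's own default), might be:

# Hypothetical entry point; adjust `path` to wherever the Kaggle CSV files live.
if __name__ == '__main__':
    test_mlp(learning_rate=0.0005, L1_reg=0.00, L2_reg=0.0001,
             n_epochs=100, path=r'', batch_size=20, n_hidden=500)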
The following is the prediction function; it saves the predicted results to mlp_best_model_answer.csv.
def predict_kaggle(file1, file2):
    # load the saved model
    classifier = pickle.load(open(file1, 'rb'))

    # compile a predictor function
    predict_model = theano.function(
        inputs=[classifier.input],
        outputs=classifier.logRegressionLayer.y_pred,
        allow_input_downcast=True
    )

    # make the predictions
    data_path = r'e:\Lab\DigitRecognizer\test.csv'  # (unused in this version)
    datasets = load_data(r'')
    test_set_x = datasets[3]
    test_set_x = test_set_x.get_value()
    predicted_values = predict_model(test_set_x[:28000])

    # save the predicted results
    saveresult(predicted_values, file2)
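saveresult comes from the previous blog and is not shown here. As an assumption about what it does (writing a Kaggle submission file with ImageId and Label columns), a minimal sketch would be:

import csv

# Hypothetical re-implementation of saveresult: write predictions in Kaggle's
# submission format (ImageId,Label), one row per test image.
def saveresult(predicted_values, filename):
    with open(filename, 'w') as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow(['ImageId', 'Label'])
        for i, label in enumerate(predicted_values, start=1):
            writer.writerow([i, int(label)])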
This time the experiment was run on a laptop with a GTX 960M, which is much faster than before. The resulting accuracy is 97.486%. Next I will make further changes to try to reach a higher accuracy.