Implementation of a recurrent neural network (RNN) in three ways (from scratch, with Theano, and with Keras)

Tags: theano, keras

Contents: Preface · RNN from Scratch · RNN using Theano · RNN using Keras · Postscript

"From simplicity to complexity, and then to Jane." "Foreword

Feel free to skip the chatter and go straight to the code.

After a period of study I now have a preliminary understanding of the basic principles of RNNs and of how to implement them. Three different RNN implementations are listed below for reference.

Explanations of how RNNs work are easy to find online, so I will not repeat them here; I would not explain it better than those sources do. I will, however, first recommend an RNN tutorial that explains things very well: after reading its four posts you should basically be able to implement your own RNN.

RNN from Scratch
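Before diving into the code, it helps to write down the two update formulas that the from-scratch and Theano implementations below compute at each time step t (the notation matches the variable names U, V, W, s and o used in the code; the Keras version uses an LSTM layer instead of this plain recurrence):

    s_t = tanh(U x_t + W s_(t-1))    # new hidden state from the current input and the previous hidden state
    o_t = softmax(V s_t)             # predicted probability distribution over the vocabulary at step t

Here x_t is a one-hot input word, so multiplying by U is just a column lookup, which is exactly how the code indexes U[:, x[t]].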

import nltk
import csv
import itertools
import numpy as np
from utils import *
import operator
from datetime import datetime
import sys

class RNNNumpy:
    def __init__(self, word_dim, hidden_dim=100, bptt_truncate=4):
        # Assign instance variables
        self.word_dim = word_dim
        self.hidden_dim = hidden_dim
        self.bptt_truncate = bptt_truncate
        # Randomly initialize the network parameters
        self.U = np.random.uniform(-np.sqrt(1./word_dim), np.sqrt(1./word_dim), (hidden_dim, word_dim))
        self.V = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (word_dim, hidden_dim))
        self.W = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))

    def forward_propagation(self, x):
        # The total number of time steps
        T = len(x)
        # During forward propagation we save all hidden states in s because we need them later.
        # We add one additional element for the initial hidden state, which we set to 0.
        s = np.zeros((T + 1, self.hidden_dim))
        s[-1] = np.zeros(self.hidden_dim)
        # The outputs at each time step. Again, we save them for later.
        o = np.zeros((T, self.word_dim))
        # For each time step...
        for t in np.arange(T):
            # Note that we are indexing U by x[t]. This is the same as multiplying U with a one-hot vector.
            s[t] = np.tanh(self.U[:, x[t]] + self.W.dot(s[t-1]))
            o[t] = softmax(self.V.dot(s[t]))
        return [o, s]

    def predict(self, x):
        # Perform forward propagation and return the index of the highest score
        o, s = self.forward_propagation(x)
        return np.argmax(o, axis=1)

    def calculate_total_loss(self, x, y):
        L = 0
        # For each sentence...
        for i in np.arange(len(y)):
            o, s = self.forward_propagation(x[i])
            # We only care about our prediction of the "correct" words
            correct_word_predictions = o[np.arange(len(y[i])), y[i]]
            # Add to the loss based on how off we were
            L += -1 * np.sum(np.log(correct_word_predictions))
        return L

    def calculate_loss(self, x, y):
        # Divide the total loss by the number of training examples
        N = np.sum((len(y_i) for y_i in y))
        return self.calculate_total_loss(x, y) / N

    def bptt(self, x, y):
        T = len(y)
        # Perform forward propagation
        o, s = self.forward_propagation(x)
        # We accumulate the gradients in these variables
        dLdU = np.zeros(self.U.shape)
        dLdV = np.zeros(self.V.shape)
        dLdW = np.zeros(self.W.shape)
        delta_o = o
        delta_o[np.arange(len(y)), y] -= 1.
        # For each output backwards...
        for t in np.arange(T)[::-1]:
            dLdV += np.outer(delta_o[t], s[t].T)
            # Initial delta calculation
            delta_t = self.V.T.dot(delta_o[t]) * (1 - (s[t] ** 2))
            # Backpropagation through time (for at most self.bptt_truncate steps)
            for bptt_step in np.arange(max(0, t - self.bptt_truncate), t + 1)[::-1]:
                # print "Backpropagation step t=%d bptt step=%d" % (t, bptt_step)
                dLdW += np.outer(delta_t, s[bptt_step-1])
                dLdU[:, x[bptt_step]] += delta_t
                # Update delta for the next step
                delta_t = self.W.T.dot(delta_t) * (1 - s[bptt_step-1] ** 2)
        return [dLdU, dLdV, dLdW]

    def gradient_check(self, x, y, h=0.001, error_threshold=0.01):
        # Calculate the gradients using backpropagation. We want to check if these are correct.
        bptt_gradients = self.bptt(x, y)
        # List of all parameters we want to check.
        model_parameters = ['U', 'V', 'W']
        # Gradient check for each parameter
        for pidx, pname in enumerate(model_parameters):
            # Get the actual parameter value from the model, e.g. model.W
            parameter = operator.attrgetter(pname)(self)
            print "Performing gradient check for parameter %s with size %d." % (pname, np.prod(parameter.shape))
            # Iterate over each element of the parameter matrix, e.g. (0,0), (0,1), ...
            it = np.nditer(parameter, flags=['multi_index'], op_flags=['readwrite'])
            while not it.finished:
                ix = it.multi_index
                # Save the original value so we can reset it later
                original_value = parameter[ix]
                # Estimate the gradient using (f(x+h) - f(x-h)) / (2*h)
                parameter[ix] = original_value + h
                gradplus = self.calculate_total_loss([x], [y])
                parameter[ix] = original_value - h
                gradminus = self.calculate_total_loss([x], [y])
                estimated_gradient = (gradplus - gradminus) / (2 * h)
                # Reset the parameter to its original value
                parameter[ix] = original_value
                # The gradient for this parameter calculated using backpropagation
                backprop_gradient = bptt_gradients[pidx][ix]
                # Calculate the relative error: (|x - y| / (|x| + |y|))
                relative_error = np.abs(backprop_gradient - estimated_gradient) / (np.abs(backprop_gradient) + np.abs(estimated_gradient))
                # If the error is too large, fail the gradient check
                if relative_error > error_threshold:
                    print "Gradient Check ERROR: parameter=%s ix=%s" % (pname, ix)
                    print "+h Loss: %f" % gradplus
                    print "-h Loss: %f" % gradminus
                    print "Estimated gradient: %f" % estimated_gradient
                    print "Backpropagation gradient: %f" % backprop_gradient
                    print "Relative Error: %f" % relative_error
                    return
                it.iternext()
            print "Gradient check for parameter %s passed." % (pname)

    # Performs one step of SGD.
    def sgd_step(self, x, y, learning_rate):
        # Calculate the gradients
        dLdU, dLdV, dLdW = self.bptt(x, y)
        # Change parameters according to the gradients and learning rate
        self.U -= learning_rate * dLdU
        self.V -= learning_rate * dLdV
        self.W -= learning_rate * dLdW

    # Outer SGD loop
    # - x_train: the training data set
    # - y_train: the training data labels
    # - learning_rate: initial learning rate for SGD
    # - nepoch: number of times to iterate through the complete dataset
    # - evaluate_loss_after: evaluate the loss after this many epochs
    def train_with_sgd(self, x_train, y_train, learning_rate=0.005, nepoch=100, evaluate_loss_after=5):
        # We keep track of the losses so we can plot them later
        losses = []
        num_examples_seen = 0
        for epoch in range(nepoch):
            # Optionally evaluate the loss
            if (epoch % evaluate_loss_after == 0):
                loss = self.calculate_loss(x_train, y_train)
                losses.append((num_examples_seen, loss))
                time = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
                print "%s: Loss after num_examples_seen=%d epoch=%d: %f" % (time, num_examples_seen, epoch, loss)
                # Adjust the learning rate if the loss increases
                if (len(losses) > 1 and losses[-1][1] > losses[-2][1]):
                    learning_rate = learning_rate * 0.5
                    print "Setting learning rate to %f" % learning_rate
                sys.stdout.flush()
                # ADDED! Saving model parameters
                save_model_parameters_numpy("./data/rnn-numpy-%d-%d-%s.npz" % (self.hidden_dim, self.word_dim, time), self)
            # For each training example...
            for i in range(len(y_train)):
                # One SGD step
                self.sgd_step(x_train[i], y_train[i], learning_rate)
                num_examples_seen += 1
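To make the class above easier to try out, here is a minimal usage sketch. It is not part of the original code: vocabulary_size, X_train and y_train are placeholders for a tokenized corpus encoded as lists of word indices, and it assumes the helpers imported from utils (softmax, save_model_parameters_numpy) are available, as in the tutorial this class comes from.

np.random.seed(10)
vocabulary_size = 8000                          # hypothetical vocabulary size
model = RNNNumpy(vocabulary_size)

# X_train[i] is a sentence as a list of input word indices; y_train[i] is the same sentence shifted by one word
o, s = model.forward_propagation(X_train[0])    # o has shape (len(X_train[0]), vocabulary_size)
predictions = model.predict(X_train[0])         # most likely word index at every time step

# Check the analytic BPTT gradients against numerical gradients on a tiny model first
small_model = RNNNumpy(100, hidden_dim=10, bptt_truncate=1000)
small_model.gradient_check([0, 1, 2, 3], [1, 2, 3, 4])

# Train on a small subset; the loss is evaluated every evaluate_loss_after epochs
model.train_with_sgd(X_train[:100], y_train[:100], nepoch=10, evaluate_loss_after=1)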

More code: see the GitHub repository.

RNN using Theano

import numpy as np
import theano as theano
import theano.tensor as T
from utils import *
import operator
from datetime import datetime
import sys

class RNNTheano:
    def __init__(self, word_dim, hidden_dim=100, bptt_truncate=4):
        # Assign instance variables
        self.word_dim = word_dim
        self.hidden_dim = hidden_dim
        self.bptt_truncate = bptt_truncate
        # Randomly initialize the network parameters
        U = np.random.uniform(-np.sqrt(1./word_dim), np.sqrt(1./word_dim), (hidden_dim, word_dim))
        V = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (word_dim, hidden_dim))
        W = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        # Theano: create shared variables
        self.U = theano.shared(name='U', value=U.astype(theano.config.floatX))
        self.V = theano.shared(name='V', value=V.astype(theano.config.floatX))
        self.W = theano.shared(name='W', value=W.astype(theano.config.floatX))
        # We store the Theano graph here
        self.theano = {}
        self.__theano_build__()

    def __theano_build__(self):
        U, V, W = self.U, self.V, self.W
        x = T.ivector('x')
        y = T.ivector('y')

        def forward_prop_step(x_t, s_t_prev, U, V, W):
            s_t = T.tanh(U[:, x_t] + W.dot(s_t_prev))
            o_t = T.nnet.softmax(V.dot(s_t))
            return [o_t[0], s_t]

        [o, s], updates = theano.scan(
            forward_prop_step,
            sequences=x,
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W],
            truncate_gradient=self.bptt_truncate,
            strict=True)
        prediction = T.argmax(o, axis=1)
        o_error = T.sum(T.nnet.categorical_crossentropy(o, y))
        # Gradients
        dU = T.grad(o_error, U)
        dV = T.grad(o_error, V)
        dW = T.grad(o_error, W)
        # Assign functions
        self.forward_propagation = theano.function([x], o)
        self.predict = theano.function([x], prediction)
        self.ce_error = theano.function([x, y], o_error)
        self.bptt = theano.function([x, y], [dU, dV, dW])
        # SGD
        learning_rate = T.scalar('learning_rate')
        self.sgd_step = theano.function([x, y, learning_rate], [],
                                        updates=[(self.U, self.U - learning_rate * dU),
                                                 (self.V, self.V - learning_rate * dV),
                                                 (self.W, self.W - learning_rate * dW)])

    def calculate_total_loss(self, X, Y):
        return np.sum([self.ce_error(x, y) for x, y in zip(X, Y)])

    def calculate_loss(self, X, Y):
        # Divide calculate_total_loss by the number of words
        num_words = np.sum([len(y) for y in Y])
        return self.calculate_total_loss(X, Y) / float(num_words)

    def train_with_sgd(self, x_train, y_train, learning_rate=0.005, nepoch=1, evaluate_loss_after=5):
        # We keep track of the losses so we can plot them later
        losses = []
        num_examples_seen = 0
        for epoch in range(nepoch):
            # Optionally evaluate the loss
            if (epoch % evaluate_loss_after == 0):
                loss = self.calculate_loss(x_train, y_train)
                losses.append((num_examples_seen, loss))
                time = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
                print "%s: Loss after num_examples_seen=%d epoch=%d: %f" % (time, num_examples_seen, epoch, loss)
                # Adjust the learning rate if the loss increases
                if (len(losses) > 1 and losses[-1][1] > losses[-2][1]):
                    learning_rate = learning_rate * 0.5
                    print "Setting learning rate to %f" % learning_rate
                sys.stdout.flush()
                # ADDED! Saving model parameters
                save_model_parameters_theano("./data/rnn-theano-%d-%d-%s.npz" % (self.hidden_dim, self.word_dim, time), self)
            # For each training example...
            for i in range(len(y_train)):
                # One SGD step
                self.sgd_step(x_train[i], y_train[i], learning_rate)
                num_examples_seen += 1


def gradient_check_theano(model, x, y, h=0.001, error_threshold=0.01):
    # Overwrite the bptt truncation attribute. We need to backpropagate all the way to get the correct gradient
    model.bptt_truncate = 1000
    # Calculate the gradients using backprop
    bptt_gradients = model.bptt(x, y)
    # List of all parameters we want to check.
    model_parameters = ['U', 'V', 'W']
    # Gradient check for each parameter
    for pidx, pname in enumerate(model_parameters):
        # Get the actual parameter value from the model, e.g. model.W
        parameter_T = operator.attrgetter(pname)(model)
        parameter = parameter_T.get_value()
        print "Performing gradient check for parameter %s with size %d." % (pname, np.prod(parameter.shape))
        # Iterate over each element of the parameter matrix, e.g. (0,0), (0,1), ...
        it = np.nditer(parameter, flags=['multi_index'], op_flags=['readwrite'])
        while not it.finished:
            ix = it.multi_index
            # Save the original value so we can reset it later
            original_value = parameter[ix]
            # Estimate the gradient using (f(x+h) - f(x-h)) / (2*h)
            parameter[ix] = original_value + h
            parameter_T.set_value(parameter)
            gradplus = model.calculate_total_loss([x], [y])
            parameter[ix] = original_value - h
            parameter_T.set_value(parameter)
            gradminus = model.calculate_total_loss([x], [y])
            estimated_gradient = (gradplus - gradminus) / (2 * h)
            # Reset the parameter to its original value
            parameter[ix] = original_value
            parameter_T.set_value(parameter)
            # The gradient for this parameter calculated using backpropagation
            backprop_gradient = bptt_gradients[pidx][ix]
            # Calculate the relative error: (|x - y| / (|x| + |y|))
            relative_error = np.abs(backprop_gradient - estimated_gradient) / (np.abs(backprop_gradient) + np.abs(estimated_gradient))
            # If the error is too large, fail the gradient check
            if relative_error > error_threshold:
                print "Gradient Check ERROR: parameter=%s ix=%s" % (pname, ix)
                print "+h Loss: %f" % gradplus
                print "-h Loss: %f" % gradminus
                print "Estimated gradient: %f" % estimated_gradient
                print "Backpropagation gradient: %f" % backprop_gradient
                print "Relative Error: %f" % relative_error
                return
            it.iternext()
        print "Gradient check for parameter %s passed." % (pname)
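As with the NumPy version, a quick usage sketch (again not part of the original code; X_train and y_train are assumed to be the same index-encoded sentences as above, passed as plain Python lists or int32 arrays because the compiled functions take T.ivector inputs):

vocabulary_size = 8000                          # hypothetical vocabulary size
model = RNNTheano(vocabulary_size, hidden_dim=50)

# The compiled Theano functions are called exactly like the NumPy methods
o = model.forward_propagation(X_train[0])
predictions = model.predict(X_train[0])

# One manual SGD step, then the usual training loop
model.sgd_step(X_train[0], y_train[0], 0.005)
model.train_with_sgd(X_train[:100], y_train[:100], nepoch=5, evaluate_loss_after=1)

# Numerically check the Theano gradients on a small model and a short sequence
grad_check_model = RNNTheano(100, hidden_dim=10)
gradient_check_theano(grad_check_model, [0, 1, 2, 3], [1, 2, 3, 4])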

More code: see the GitHub repository.
Also, a GRU version of the Theano code is available on GitHub.

RNN using Keras

from __future__ import print_function
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.layers import LSTM
from keras.optimizers import RMSprop
from keras.utils.data_utils import get_file
import numpy as np
import random
import sys

class RNNKeras:
    def __init__(self, sentencelen, vector_size, output_size, hidden_dim=100):
        # Assign instance variables
        self.sentencelen = sentencelen
        self.vector_size = vector_size
        self.output_size = output_size
        self.hidden_dim = hidden_dim
        self.__model_build__()

    def __model_build__(self):
        self.model = Sequential()
        self.model.add(LSTM(self.output_size, input_shape=(self.sentencelen, self.vector_size)))
        self.model.add(Dense(self.vector_size))
        self.model.add(Activation('softmax'))
        optimizer = RMSprop(lr=0.01)
        self.model.compile(loss='categorical_crossentropy', optimizer=optimizer)

    def train_model(self, x, y, batchsize=128, nepoch=1):
        self.model.fit(x, y, batch_size=batchsize, nb_epoch=nepoch)

    def predict(self, x):
        return self.model.predict(x, verbose=0)[0]
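Finally, a minimal usage sketch for the Keras wrapper (not part of the original code; the shapes are hypothetical). The LSTM layer expects X with shape (num_samples, sentencelen, vector_size), and because the output layer is Dense(vector_size) with a softmax and categorical cross-entropy, y must be one-hot vectors of shape (num_samples, vector_size):

sentencelen, vector_size = 40, 57               # e.g. 40 time steps of context over a 57-symbol vocabulary
model = RNNKeras(sentencelen, vector_size, output_size=128)

# X: one-hot encoded input windows, y: one-hot encoded next symbol for each window
model.train_model(X, y, batchsize=128, nepoch=10)
probs = model.predict(X[:1])                    # probability distribution over the vector_size symbols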

More code: see the GitHub repository.

Postscript

In recent years, research on and applications of deep learning have become increasingly popular. With the rise of CNNs and RNNs, fewer people study DBNs and SAEs, but to use neural networks well it is still necessary to understand DBNs and SAEs. I also still need to find time to study CNNs, and when I do I will come back and add more explanatory text to this article.

In addition, do not start directly with packaged libraries such as Keras; first understand the underlying principles and computation formulas of RNNs, so that you grasp them more thoroughly. These packaged libraries are not all-powerful either: when the model becomes more complex, some functionality simply cannot be achieved through such highly encapsulated libraries, and you have to implement it yourself in Theano or TensorFlow.
