@author: Huangyongye
@create_date: 2017-03-09
Preface: This post is based on my own experience of implementing an LSTM in TensorFlow. Although there are many tutorials online, most of them follow the official example and use a multi-layer LSTM to build the PTB language model, for example:
TensorFlow notes: Multi-layer LSTM code analysis
Those examples still feel too complex to me, so here is a relatively simple version. It is not elegant, but it is fairly easy to understand.
If you want to understand how an LSTM works (assuming you already understand an ordinary RNN), you can refer to my earlier translation:
Understanding LSTM Networks (by colah)
If you want to understand how an RNN works, you can refer to Andrej Karpathy's blog post:
The Unreasonable Effectiveness of Recurrent Neural Networks
Many readers have asked how to understand the "multi-layer" part, so I drew a diagram that should help beginners get a better picture of a multi-layer RNN.
Figure 1: a 3-layer RNN unrolled over time steps
This example does not explain the theory. What it does show is: how to implement a single-layer and a multi-layer LSTM, the format of the input and output data, and how to add a dropout layer to an RNN.
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn
from tensorflow.examples.tutorials.mnist import input_data

# set GPU memory to grow on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# first import the data and look at its shape
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
print mnist.train.images.shape
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(55000, 784)
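Each image is stored as a flattened 784-dimensional vector. Later we will treat it as 28 rows of 28 pixels, one row per time step. As a quick illustration (the variable name `sample` below is only for this demo, it is not part of the model code):

# a minimal sketch: view one flattened MNIST image as 28 time steps of 28 features
sample = mnist.train.images[0]            # shape (784,)
sample_as_rows = sample.reshape(28, 28)   # 28 time steps, each a 28-pixel row
print sample_as_rows.shape                # prints (28, 28)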
1. First, set all the parameters used by the model
lr = 1e-3
# we want to use a different batch_size during training and testing,
# so batch_size is fed through a placeholder. Note the type must be tf.int32
batch_size = tf.placeholder(tf.int32)
# batch_size = 128

# the input feature at each time step is 28-dimensional: we feed one row of the
# image at a time, and one row has 28 pixels
input_size = 28
# the sequence length is 28, i.e. each prediction needs 28 rows as input
timestep_size = 28
# number of nodes in each hidden layer
hidden_size = 256
# number of LSTM layers
layer_num = 2
# number of output classes; for a regression prediction this would be 1
class_num = 10

_X = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, class_num])
keep_prob = tf.placeholder(tf.float32)
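The reason batch_size and keep_prob are placeholders is that the same graph can then be driven with different values at training and at test time, simply by changing feed_dict. A minimal sketch of the idea, using the values this post uses later (the _X and y entries are added once an actual data batch is available):

# a sketch of the idea only: one graph, two different partial feed_dicts
train_feed = {batch_size: 128, keep_prob: 0.5}     # small batches, dropout enabled
test_feed = {batch_size: 10000, keep_prob: 1.0}    # the whole test set, dropout disabled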
2. Build the LSTM model (an ordinary RNN model is built in exactly the same way)
# restore the 784-pixel vector to a 28 * 28 image
# the following steps are the key to implementing an RNN / LSTM
####################################################################
# ** step 1: the RNN input shape = (batch_size, timestep_size, input_size)
X = tf.reshape(_X, [-1, 28, 28])

# ** step 2: define one LSTM cell. Only hidden_size needs to be specified;
#    it automatically matches the dimension of the input X
lstm_cell = rnn.BasicLSTMCell(num_units=hidden_size, forget_bias=1.0, state_is_tuple=True)

# ** step 3: add a dropout layer; usually only output_keep_prob is set
lstm_cell = rnn.DropoutWrapper(cell=lstm_cell, input_keep_prob=1.0, output_keep_prob=keep_prob)

# ** step 4: call MultiRNNCell to build the multi-layer LSTM
mlstm_cell = rnn.MultiRNNCell([lstm_cell] * layer_num, state_is_tuple=True)

# ** step 5: initialize the state with zeros
init_state = mlstm_cell.zero_state(batch_size, dtype=tf.float32)

# ** step 6, method one: call dynamic_rnn() to run the network we just built
# ** when time_major == False, outputs.shape = [batch_size, timestep_size, hidden_size]
# ** so we can take h_state = outputs[:, -1, :] as the final output
# ** state.shape = [layer_num, 2, batch_size, hidden_size]
# ** alternatively, we can take h_state = state[-1][1] as the final output
# ** the final output has shape [batch_size, hidden_size]
# outputs, state = tf.nn.dynamic_rnn(mlstm_cell, inputs=X, initial_state=init_state, time_major=False)
# h_state = outputs[:, -1, :]  # or h_state = state[-1][1]

# *************** to understand better how the LSTM works, we now implement the
# function used in step 6 ourselves ***************
# if you look at the documentation, every RNNCell provides a __call__() method
# (see the appendix at the end), which we can use to unroll the LSTM over the time steps.
# ** step 6, method two: unroll by time step
outputs = list()
state = init_state
with tf.variable_scope('RNN'):
    for timestep in range(timestep_size):
        if timestep > 0:
            tf.get_variable_scope().reuse_variables()
        # here `state` holds the state of every LSTM layer
        (cell_output, state) = mlstm_cell(X[:, timestep, :], state)
        outputs.append(cell_output)
h_state = outputs[-1]
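To convince yourself of the shapes involved, you can inspect the static shape information before running anything in the session. A quick sketch (the expected values follow from hidden_size = 256 and layer_num = 2):

# a quick sanity check of the shapes (a sketch; prints static shape info only)
print h_state.get_shape()        # (?, 256): [batch_size, hidden_size]
print len(state)                 # 2: one LSTMStateTuple (c, h) per layer
print state[-1].h.get_shape()    # (?, 256): hidden state of the top layer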
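One compatibility note: on TensorFlow releases after 1.0 (roughly 1.1 and later), passing `[lstm_cell] * layer_num` to MultiRNNCell reuses the same cell object for every layer and raises a variable-reuse error. A sketch of the usual workaround, assuming the same hyperparameters as above, is to create an independent cell per layer:

# a sketch, not the original code: build a fresh BasicLSTMCell (plus dropout) for each layer
def lstm_layer():
    cell = rnn.BasicLSTMCell(num_units=hidden_size, forget_bias=1.0, state_is_tuple=True)
    return rnn.DropoutWrapper(cell=cell, input_keep_prob=1.0, output_keep_prob=keep_prob)

mlstm_cell = rnn.MultiRNNCell([lstm_layer() for _ in range(layer_num)], state_is_tuple=True)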
3. Define the loss function and the optimizer, then train and test
The following part is essentially the same as in my earlier post, TensorFlow (iii): implementing MNIST classification with multi-layer CNNs.
# the LSTM output above, h_state, is a [hidden_size] tensor per example; to classify,
# we still need a softmax layer on top
# first define the softmax connection weight matrix and bias
# out_W = tf.placeholder(tf.float32, [hidden_size, class_num], name='out_Weights')
# out_bias = tf.placeholder(tf.float32, [class_num], name='out_bias')
# start training and testing
W = tf.Variable(tf.truncated_normal([hidden_size, class_num], stddev=0.1), dtype=tf.float32)
bias = tf.Variable(tf.constant(0.1, shape=[class_num]), dtype=tf.float32)
y_pre = tf.nn.softmax(tf.matmul(h_state, W) + bias)

# loss and evaluation functions
cross_entropy = -tf.reduce_mean(y * tf.log(y_pre))
train_op = tf.train.AdamOptimizer(lr).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

sess.run(tf.global_variables_initializer())
for i in range(2000):
    _batch_size = 128
    batch = mnist.train.next_batch(_batch_size)
    if (i + 1) % 200 == 0:
        train_accuracy = sess.run(accuracy, feed_dict={
            _X: batch[0], y: batch[1], keep_prob: 1.0, batch_size: _batch_size})
        # number of epochs completed so far: mnist.train.epochs_completed
        print "Iter%d, step %d, training accuracy %g" % (mnist.train.epochs_completed, (i + 1), train_accuracy)
    sess.run(train_op, feed_dict={_X: batch[0], y: batch[1], keep_prob: 0.5, batch_size: _batch_size})

# compute the accuracy on the test data
print "test accuracy %g" % sess.run(accuracy, feed_dict={
    _X: mnist.test.images, y: mnist.test.labels, keep_prob: 1.0, batch_size: mnist.test.images.shape[0]})
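One caveat about the loss above: applying softmax and then computing -reduce_mean(y * log(y_pre)) by hand can produce NaN when a predicted probability underflows to zero. A sketch of a numerically more stable alternative (this is not what the original code uses) is to let TensorFlow fuse the softmax and the cross entropy:

# a sketch of a more stable loss (not the original code): feed the raw logits to
# TensorFlow's fused softmax-cross-entropy op instead of taking log() of the softmax by hand
logits = tf.matmul(h_state, W) + bias
cross_entropy_stable = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op_stable = tf.train.AdamOptimizer(lr).minimize(cross_entropy_stable)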
Iter0, step 200, training accuracy 0.851562
Iter0, step 400, training accuracy 0.960938
Iter1, step 600, training accuracy 0.984375
Iter1, step 800, training accuracy 0.960938
Iter2, step 1000, training accuracy 0.984375
Iter2, step 1200, training accuracy 0.9375
Iter3, step 1400, training accuracy 0.96875
Iter3, step 1600, training accuracy 0.984375
Iter4, step 1800, training accuracy 0.992188
Iter4, step 2000, training accuracy 0.984375
test accuracy 0.9858