Reprinting is welcome, but please be sure to indicate the source and author information.
Based on my own experience learning to implement LSTMs in TensorFlow, I found that although there are many tutorials online, most of them follow the official example, which uses a multi-layer LSTM to build the PTB language model, for example:
TensorFlow notes: Multilayer LSTM Code Analysis
But these examples still feel too complex, so here is a relatively simple version. It is not elegant, but it is easy to understand.
If you want to understand the principles behind LSTMs (assuming you already understand ordinary RNNs), you can refer to the blog post I translated earlier:
Understanding LSTM Networks (translated from colah's "Understanding LSTM Networks")
Many readers asked how to understand the multi-layer structure, so I made a schematic diagram, hoping it helps beginners get a better grasp of multi-layer RNNs.
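Since the schematic itself cannot be reproduced here, a rough way to picture a multi-layer RNN is the sketch below. This is my own illustration, not code from this example: at every time step, the output of layer k becomes the input of layer k+1, and each layer keeps its own state.

# Conceptual sketch only (not the TensorFlow implementation below):
# how a stack of RNN layers is unrolled over time. `cells` is a list of
# per-layer cell functions, `states` holds one state per layer.
def run_stacked_rnn(cells, states, inputs):
    outputs = []
    for x_t in inputs:                       # loop over time steps
        layer_input = x_t
        for k, cell in enumerate(cells):     # loop over layers, bottom to top
            layer_input, states[k] = cell(layer_input, states[k])
        outputs.append(layer_input)          # output of the top layer at step t
    return outputs, states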
This example does not cover the theory. Through it, you can learn how to implement a single-layer LSTM, how to implement a multi-layer LSTM, what format the input and output data take, and how to add a dropout layer to an RNN.
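The steps below use several names (_X, hidden_size, keep_prob, layer_num, batch_size, lr, class_num, mnist, sess) whose definitions are not repeated here. A minimal setup sketch might look like the following; hidden_size = 256, input_size = timestep_size = 28 and class_num = 10 follow from the code and printed shapes below, while the values of lr and layer_num and the session setup are my own assumptions.

# Minimal setup sketch (assumed, not from the original post).
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.contrib import rnn
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

lr = 1e-3                  # learning rate (assumed value)
input_size = 28            # each row of the image is one time-step input
timestep_size = 28         # 28 rows -> 28 time steps
hidden_size = 256          # hidden units per LSTM layer (matches the shapes printed below)
layer_num = 2              # number of stacked LSTM layers (assumed value)
class_num = 10             # 10 digit classes

_X = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, class_num])
keep_prob = tf.placeholder(tf.float32)
# batch_size is fed at run time so the same graph works for training and testing
batch_size = tf.placeholder(tf.int32, [])

sess = tf.Session()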
# Restore the 784-dimensional vector of each character to a 28 * 28 picture
# The following steps are the key to implementing the RNN / LSTM
####################################################################
# **Step 1: the RNN input shape = (batch_size, timestep_size, input_size)
X = tf.reshape(_X, [-1, 28, 28])

# **Step 2: define one lstm_cell; only hidden_size needs to be specified,
#   it will automatically match the dimension of the input X
lstm_cell = rnn.BasicLSTMCell(num_units=hidden_size, forget_bias=1.0, state_is_tuple=True)

# **Step 3: add a dropout layer; generally only output_keep_prob is set
lstm_cell = rnn.DropoutWrapper(cell=lstm_cell, input_keep_prob=1.0, output_keep_prob=keep_prob)

# **Step 4: call MultiRNNCell to implement a multi-layer LSTM
mlstm_cell = rnn.MultiRNNCell([lstm_cell] * layer_num, state_is_tuple=True)

# **Step 5: initialize the state to all zeros
init_state = mlstm_cell.zero_state(batch_size, dtype=tf.float32)

# **Step 6, method one: call dynamic_rnn() to run the network we just built
# ** When time_major == False, outputs.shape = [batch_size, timestep_size, hidden_size]
# ** so we can take h_state = outputs[:, -1, :] as the last output
# ** state.shape = [layer_num, 2, batch_size, hidden_size],
# ** or we can take h_state = state[-1][1] as the last output
# ** the final output dimension is [batch_size, hidden_size]
# outputs, state = tf.nn.dynamic_rnn(mlstm_cell, inputs=X, initial_state=init_state, time_major=False)
# h_state = outputs[:, -1, :]  # or h_state = state[-1][1]

# *************** To understand better how the LSTM works, we implement the function of Step 6 ourselves ***************
# Looking at the documentation, you will find that RNNCell provides a __call__() method (see the appendix at the end),
# which we can use to unroll the LSTM by iterating over the time steps.
# **Step 6, method two: unroll the computation over the time steps
outputs = list()
state = init_state
with tf.variable_scope('RNN'):
    for timestep in range(timestep_size):
        if timestep > 0:
            tf.get_variable_scope().reuse_variables()
        # the state here keeps the state of every LSTM layer
        (cell_output, state) = mlstm_cell(X[:, timestep, :], state)
        outputs.append(cell_output)
h_state = outputs[-1]
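One caveat worth noting (my addition, not from the original code): in newer TensorFlow 1.x releases, passing [lstm_cell] * layer_num to MultiRNNCell in Step 4 reuses the same cell object for every layer, which either shares weights across layers or raises an error. A common workaround is to build a fresh cell per layer, for example:

# Sketch of a per-layer cell constructor, so each LSTM layer gets its own
# weights instead of sharing one cell object across layers.
def lstm_cell_with_dropout():
    cell = rnn.BasicLSTMCell(num_units=hidden_size, forget_bias=1.0, state_is_tuple=True)
    return rnn.DropoutWrapper(cell=cell, input_keep_prob=1.0, output_keep_prob=keep_prob)

mlstm_cell = rnn.MultiRNNCell([lstm_cell_with_dropout() for _ in range(layer_num)],
                              state_is_tuple=True)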
The following part is basically the same as the corresponding part of my previous post on implementing MNIST classification with multi-layer CNNs in TensorFlow. (The iteration count and batch size below are inferred from the training log.)

# The LSTM above outputs a [hidden_size] vector for each sample; to classify it we still need a softmax layer
# First define the softmax connection weight matrix and bias
# out_W = tf.placeholder(tf.float32, [hidden_size, class_num], name='out_Weights')
# out_bias = tf.placeholder(tf.float32, [class_num], name='out_bias')
# Start training and testing
W = tf.Variable(tf.truncated_normal([hidden_size, class_num], stddev=0.1), dtype=tf.float32)
bias = tf.Variable(tf.constant(0.1, shape=[class_num]), dtype=tf.float32)
y_pre = tf.nn.softmax(tf.matmul(h_state, W) + bias)

# Loss and evaluation functions
cross_entropy = -tf.reduce_mean(y * tf.log(y_pre))
train_op = tf.train.AdamOptimizer(lr).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

sess.run(tf.global_variables_initializer())
for i in range(2000):
    _batch_size = 128
    batch = mnist.train.next_batch(_batch_size)
    if (i + 1) % 200 == 0:
        train_accuracy = sess.run(accuracy, feed_dict={
            _X: batch[0], y: batch[1], keep_prob: 1.0, batch_size: _batch_size})
        # Number of epochs already completed: mnist.train.epochs_completed
        print "Iter%d, step %d, training accuracy %g" % (mnist.train.epochs_completed, (i + 1), train_accuracy)
    sess.run(train_op, feed_dict={_X: batch[0], y: batch[1], keep_prob: 0.5, batch_size: _batch_size})

# Compute the accuracy on the test data
print "test accuracy %g" % sess.run(accuracy, feed_dict={
    _X: mnist.test.images, y: mnist.test.labels, keep_prob: 1.0, batch_size: mnist.test.images.shape[0]})
Iter0, step 200, training accuracy 0.851562
Iter0, step 400, training accuracy 0.960938
Iter1, step 600, training accuracy 0.984375
Iter1, step 800, training accuracy 0.960938
Iter2, step 1000, training accuracy 0.984375
Iter2, step 1200, training accuracy 0.9375
Iter3, step 1400, training accuracy 0.96875
Iter3, step 1600, training accuracy 0.984375
Iter4, step 1800, training accuracy 0.992188
Iter4, step 2000, training accuracy 0.984375
test accuracy 0.9858
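A small aside (my suggestion, not part of the original code): applying tf.log to an explicit softmax output, as in the cross_entropy above, can become numerically unstable when a predicted probability underflows to zero. TensorFlow's fused cross-entropy op avoids this; a sketch reusing the W and bias defined above:

# Numerically safer alternative: feed the pre-softmax scores (logits)
# to the fused softmax cross-entropy op instead of computing log(softmax(...)).
logits = tf.matmul(h_state, W) + bias
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(lr).minimize(cross_entropy)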
Now let's look at an example. I picked a character '3' from the training set:
print mnist.train.labels[4]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
Let's take a look at what this character looks like; its upper part looks a bit like a '2'.
X3 = mnist.train.images[4]
img3 = X3.reshape([28, 28])
plt.imshow(img3, cmap='gray')
plt.show()
Now let's see how, as the image is fed in row by row, the predicted probability of each class changes during classification.
X3.shape = [-1, 784]
y_batch = mnist.train.labels[0]
y_batch.shape = [-1, class_num]

x3_outputs = np.array(sess.run(outputs, feed_dict={
    _X: X3, y: y_batch, keep_prob: 1.0, batch_size: 1}))
print x3_outputs.shape
x3_outputs.shape = [28, hidden_size]
print x3_outputs.shape
(28, 1, 256)
(28, 256)
h_W = sess.run(W, feed_dict={
    _X: X3, y: y_batch, keep_prob: 1.0, batch_size: 1})
h_bias = sess.run(bias, feed_dict={
    _X: X3, y: y_batch, keep_prob: 1.0, batch_size: 1})
h_bias.shape = [-1, 10]

bar_index = range(class_num)
for i in xrange(x3_outputs.shape[0]):
    plt.subplot(7, 4, i + 1)
    x3_h_state = x3_outputs[i, :].reshape([-1, hidden_size])
    pro = sess.run(tf.nn.softmax(tf.matmul(x3_h_state, h_W) + h_bias))
    plt.bar(bar_index, pro[0], width=0.2, align='center')
    plt.axis('off')
plt.show()
In the figure above, to show the changes more clearly, I removed the axes. Each row shows 4 subplots, 7 rows in total, representing how the model's recognition of the character evolves as it reads the image row by row. As you can see, when the model has only seen the first few rows of pixels it cannot recognize the character at all, but as it sees more and more pixels it becomes essentially certain that the character is a 3.
OK, that's it for this time. If I get the chance I'll write a more elegant example, haha. Honestly, learning this LSTM was not easy: when I wrote the multi-layer CNNs it took only half a day to get things basically working, but this took me a full three or four days, and that was with the principles already well understood (or so I think...). So having finally learned it, I'm a little bit happy ~