TensorFlow Introduction (V): Multi-layer LSTM, an easy-to-understand edition

Originating From: https://blog.csdn.net/jerr__y/article/details/61195257

Reprinting is welcome, but please be sure to credit the source and author.

@author: Huangyongye
@creat_date: 2017-03-09

Based on my own experience learning to implement LSTMs in TensorFlow, I found that although there are many tutorials online, most of them follow the official example of building the PTB language model with a multi-layer LSTM, for example:
TensorFlow notes: multi-layer LSTM code analysis
Those examples still feel too complex to me, so here is a relatively simple version. It is not elegant, but it is easy to understand.

If you want to understand how an LSTM works (assuming you already understand ordinary RNNs), you can refer to the blog post I translated earlier:
Understanding LSTM Networks (by Colah)

If you want to understand how RNNs work, you can refer to Andrej Karpathy's blog post:
The Unreasonable Effectiveness of Recurrent Neural Networks

Many friends asked how to understand "multi-layer", so I made a schematic, hoping it helps beginners get a better picture of multi-layer RNNs.


Figure 1: A 3-layer RNN unrolled over time steps
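To make the figure concrete, here is a minimal NumPy sketch (a plain tanh RNN rather than an LSTM, with made-up sizes and random weights; it is not part of the TensorFlow code below) showing how, at every time step, each layer keeps its own hidden state and feeds its output upward to the layer above:

import numpy as np

def rnn_step(x, h, W_x, W_h, b):
    # one vanilla-RNN cell update: new hidden state from the input and the previous state
    return np.tanh(x.dot(W_x) + h.dot(W_h) + b)

input_size, hidden_size, layer_num, timestep_size = 28, 256, 3, 28
# one weight set per layer: layer 0 sees the raw input, deeper layers see the layer below
params = []
for l in range(layer_num):
    in_dim = input_size if l == 0 else hidden_size
    params.append((np.random.randn(in_dim, hidden_size) * 0.01,
                   np.random.randn(hidden_size, hidden_size) * 0.01,
                   np.zeros(hidden_size)))

x_seq = np.random.randn(timestep_size, input_size)     # one image, one row per time step
h = [np.zeros(hidden_size) for _ in range(layer_num)]  # per-layer hidden states
for t in range(timestep_size):                          # left-to-right in the figure
    inp = x_seq[t]
    for l in range(layer_num):                          # bottom-to-top in the figure
        h[l] = rnn_step(inp, h[l], *params[l])
        inp = h[l]   # the output of this layer is the input of the layer above
# h[-1] is the top-layer state after the whole sequence has been read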

This example does not cover the theory. Through it you can learn: how to implement a single-layer LSTM, how to implement a multi-layer LSTM, the format of the input and output data, and how to add a dropout layer inside an RNN.

# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn
from tensorflow.examples.tutorials.mnist import input_data

# Let the GPU allocate memory on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# First import the data and look at its shape
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
print mnist.train.images.shape
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(55000, 784)
1. First, set the hyperparameters of the model
lr = 1e-3
# We want to use a different batch_size for training and testing, so use a placeholder
batch_size = tf.placeholder(tf.int32)  # Note: the type must be tf.int32
# After TF 1.0, use:
# keep_prob = tf.placeholder(tf.float32, [])
# batch_size = tf.placeholder(tf.int32, [])

# The input feature at each time step is 28-dimensional, i.e. one row of the image (28 pixels)
input_size = 28
# The sequence length is 28: each prediction needs 28 rows of input
timestep_size = 28
# Number of nodes in each hidden layer
hidden_size = 256
# Number of LSTM layers
layer_num = 2
# Number of output classes; for a regression task this would be 1
class_num = 10

_X = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, class_num])
keep_prob = tf.placeholder(tf.float32)
2. Build the LSTM model (other common RNN models are built in the same way)
# Restore the 784-dim vector to a 28 * 28 image
# The steps below are the key to implementing an RNN / LSTM
####################################################################
# **Step 1: RNN input shape = (batch_size, timestep_size, input_size)
X = tf.reshape(_X, [-1, 28, 28])

# **Step 2: Define a single LSTM cell. Only hidden_size needs to be given;
#   the cell automatically matches the dimension of the input X
lstm_cell = rnn.BasicLSTMCell(num_units=hidden_size, forget_bias=1.0, state_is_tuple=True)

# **Step 3: Add a dropout layer; usually only output_keep_prob is set
lstm_cell = rnn.DropoutWrapper(cell=lstm_cell, input_keep_prob=1.0, output_keep_prob=keep_prob)

# **Step 4: Call MultiRNNCell to build the multi-layer LSTM
mlstm_cell = rnn.MultiRNNCell([lstm_cell] * layer_num, state_is_tuple=True)

# **Step 5: Initialize the state with all zeros
init_state = mlstm_cell.zero_state(batch_size, dtype=tf.float32)

# **Step 6, method one: call dynamic_rnn() to run the network we just built
# ** When time_major==False, outputs.shape = [batch_size, timestep_size, hidden_size]
# ** so we can take h_state = outputs[:, -1, :] as the final output
# ** state.shape = [layer_num, 2, batch_size, hidden_size],
# ** or we can take h_state = state[-1][1] as the final output
# ** Either way, the final output has shape [batch_size, hidden_size]
# outputs, state = tf.nn.dynamic_rnn(mlstm_cell, inputs=X, initial_state=init_state, time_major=False)
# h_state = outputs[:, -1, :]  # or h_state = state[-1][1]

# *************** To understand better how the LSTM works, we implement step 6 ourselves ***************
# Reading the documentation, you will find that RNNCell provides a __call__() method (see the appendix at the end),
# which we can use to unroll the LSTM over the time steps.
# **Step 6, method two: unroll the computation time step by time step
outputs = list()
state = init_state
with tf.variable_scope('RNN'):
    for timestep in range(timestep_size):
        if timestep > 0:
            tf.get_variable_scope().reuse_variables()
        # state here keeps the state of every LSTM layer
        (cell_output, state) = mlstm_cell(X[:, timestep, :], state)
        outputs.append(cell_output)
h_state = outputs[-1]
3. Set the loss function and optimizer, then train and test
The following part is essentially the same as the corresponding part of the earlier post, TensorFlow Introduction (III): implementing MNIST classification with multi-layer CNNs.
# The output h_state of the LSTM part above is a [hidden_size] tensor;
# to do the classification we still need a softmax layer
# First define the weight matrix and bias of the softmax connection
# out_W = tf.placeholder(tf.float32, [hidden_size, class_num], name='out_Weights')
# out_bias = tf.placeholder(tf.float32, [class_num], name='out_bias')
# Start training and testing
W = tf.Variable(tf.truncated_normal([hidden_size, class_num], stddev=0.1), dtype=tf.float32)
bias = tf.Variable(tf.constant(0.1, shape=[class_num]), dtype=tf.float32)
y_pre = tf.nn.softmax(tf.matmul(h_state, W) + bias)

# Loss and evaluation functions
cross_entropy = -tf.reduce_mean(y * tf.log(y_pre))
train_op = tf.train.AdamOptimizer(lr).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

sess.run(tf.global_variables_initializer())
for i in range(2000):
    _batch_size = 128
    batch = mnist.train.next_batch(_batch_size)
    if (i+1) % 200 == 0:
        train_accuracy = sess.run(accuracy, feed_dict={
            _X: batch[0], y: batch[1], keep_prob: 1.0, batch_size: _batch_size})
        # mnist.train.epochs_completed: the number of epochs iterated so far
        print "Iter%d, step %d, training accuracy %g" % (mnist.train.epochs_completed, (i+1), train_accuracy)
    sess.run(train_op, feed_dict={_X: batch[0], y: batch[1], keep_prob: 0.5, batch_size: _batch_size})

# Compute the accuracy on the test data
print "test accuracy %g" % sess.run(accuracy, feed_dict={
    _X: mnist.test.images, y: mnist.test.labels, keep_prob: 1.0, batch_size: mnist.test.images.shape[0]})
Iter0, step 200, training accuracy 0.851562
Iter0, step 400, training accuracy 0.960938
Iter1, step 600, training accuracy 0.984375
Iter1, step 800, training accuracy 0.960938
Iter2, step 1000, training accuracy 0.984375
Iter2, step 1200, training accuracy 0.9375
Iter3, step 1400, training accuracy 0.96875
Iter3, step 1600, training accuracy 0.984375
Iter4, step 1800, training accuracy 0.992188
Iter4, step 2000, training accuracy 0.984375
test accuracy 0.9858

We iterated for fewer than 5 epochs and already reached an accuracy of 0.9858 on the test set, so you can see that the LSTM is quite effective on this character-classification task. Moreover, when we predicted all 10,000 test images in one batch, it only used about 725 MiB of memory, whereas the two-layer CNN in the earlier post used 8721 MiB to predict the same 10,000 images, roughly a 12-fold difference. This is mainly because in an RNN/LSTM the weight matrices are shared across all time steps, so, as the earlier analysis of the LSTM network structure shows, the whole network has very few parameters.
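As a rough back-of-envelope count (computed from the hyperparameters above, not measured, and assuming each of the two LSTM layers keeps its own weight matrix), the total number of parameters stays well under a million, and it is reused at every one of the 28 time steps:

def lstm_params(input_dim, hidden):
    # BasicLSTMCell packs the 4 gates into one [input_dim + hidden, 4 * hidden] weight matrix plus a bias
    return (input_dim + hidden) * 4 * hidden + 4 * hidden

layer1 = lstm_params(28, 256)       # 291,840  (first layer sees the 28-pixel rows)
layer2 = lstm_params(256, 256)      # 525,312  (second layer sees the 256-dim output of the first)
softmax = 256 * 10 + 10             #   2,570  (final softmax layer)
print layer1 + layer2 + softmax     # 819,722 parameters in total, shared across all 28 time steps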

4. Visualization: see how the LSTM classifies

After all, LSTM is mostly used for sequence-related problems, whether text or sequence prediction, so it is hard to see what each layer does as intuitively as with CNNs. Here I would like to help you understand, through visualization, how the LSTM correctly classifies the picture step by step.

import matplotlib.pyplot as plt

Below, I pick out an image of the character 3.

print mnist.train.labels[4]
[0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]

Let's take a look at what this character looks like; its upper half looks a bit like a 2.

X3 = mnist.train.images[4]
img3 = X3.reshape([28, 28])
plt.imshow(img3, cmap='gray')
plt.show()

Now let's look at how, as the image is fed in row by row, the probability assigned to each class evolves during classification.

X3.shape = [-1, 784]
y_batch = mnist.train.labels[0]
y_batch.shape = [-1, class_num]

X3_outputs = np.array(sess.run(outputs, feed_dict={
    _X: X3, y: y_batch, keep_prob: 1.0, batch_size: 1}))
print X3_outputs.shape
X3_outputs.shape = [28, hidden_size]
print X3_outputs.shape
(28, 1, 256)
(28, 256)
h_W = sess.run(W, feed_dict={
    _X: X3, y: y_batch, keep_prob: 1.0, batch_size: 1})
h_bias = sess.run(bias, feed_dict={
    _X: X3, y: y_batch, keep_prob: 1.0, batch_size: 1})
h_bias.shape = [-1, 10]

bar_index = range(class_num)  # the x-axis positions of the bars
for i in xrange(X3_outputs.shape[0]):
    plt.subplot(7, 4, i+1)
    X3_h_state = X3_outputs[i, :].reshape([-1, hidden_size])
    pro = sess.run(tf.nn.softmax(tf.matmul(X3_h_state, h_W) + h_bias))
    plt.bar(bar_index, pro[0], width=0.2, align='center')
    plt.axis('off')
plt.show()

In the figure above, to make the row-by-row changes easier to see, I removed the axes. Each row shows 4 subplots, 7 rows in total, representing what the model recognizes as it reads the image one row at a time. As you can see, when the model has only seen the first few rows of pixels it cannot recognize the character at all, and as it sees more and more pixels it becomes basically certain that the character is a 3.

OK, that's it for now. If I get the chance, I'll write a more elegant example, haha. Honestly, learning LSTM this way was not easy: when I wrote the multi-layer CNN it took only about half a day, but this took me a full three or four days, and that was with the theory already well understood (or so I think...). So now that I've finally learned it, I'm still a little bit happy ~

17-04-19, a few supplementary materials:
- recurrent_network.py: a simple TensorFlow LSTM example.
- TensorFlow: building an LSTM model for sequence labeling: a very well-introduced NLP open-source project. (Some functions in the example may have been updated in newer versions of TensorFlow, but this does not affect understanding.)
5. Appendix: BasicLSTMCell.__call__()

"' Code:https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn_cell_
      impl.py "Def __call__ (self, inputs, state, Scope=none):" "" Long Short-term Memory cell (LSTM). ""  With Vs.variable_scope (scope or "Basic_lstm_cell"): # Parameters of Gates is concatenated into one multiply for
          Efficiency. If self._state_is_tuple:c, h = state Else:c, h = array_ops.split (Value=state, Num_ or_size_splits=2, Axis=1) concat = _linear ([inputs, H], 4 * self._num_units, True, Scope=scope) # * *  The following four tensor, respectively, are four gate corresponding weights matrix # i = input_gate, j = new_input, F = forget_gate, o = output_gate I, J, F, o = Array_ops.split (Value=concat, num_or_size_splits=4, Axis=1) # * * Update cell Status: # * * c * SIGMO
          ID (f + self._forget_bias) is part of the old information that holds the previous Timestep # * * * sigmoid (i) * Self._activation (j) is the new information brought by current Timestep New_c = (c *Sigmoid (f + self._forget_bias) + sigmoid (i) * Self._activation (j)) # * * New Output New_h = s Elf._activation (New_c) * sigmoid (o) If self._state_is_tuple:new_state = Lstmstatetuple (New_c, NE W_h) else:new_state = Array_ops.concat ([New_c, New_h], 1) # * * in (generally) state_is_tuple=t Rue case, New_h=new_state[1] # * * In the above blog, there is Cell_output = state[1] return new_h, new_state
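To see the same update outside of TensorFlow, here is a minimal single-step NumPy sketch (the concatenated weight matrix W and bias b are made-up stand-ins for the variables that _linear creates inside the cell), mirroring the new_c / new_h computation above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def basic_lstm_step(x, c, h, W, b, forget_bias=1.0):
    # W has shape [input_dim + hidden, 4 * hidden], b has shape [4 * hidden]
    concat = np.concatenate([x, h], axis=-1).dot(W) + b
    i, j, f, o = np.split(concat, 4, axis=-1)           # the four gate pre-activations
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)                 # tanh is the default activation
    return new_c, new_h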
