Sesame HTTP: TensorFlow LSTM MNIST Classification

This section describes how to use an LSTM, a type of RNN, for MNIST classification. An RNN may train more slowly than a CNN, but it can use less memory.

Initialization

First, we initialize some hyperparameters, such as the learning rate, the number of hidden units, and the number of RNN layers:

import tensorflow as tf

learning_rate = 1e-3
num_units = 256
num_layer = 3
input_size = 28
time_step = 28
total_steps = 2000
category_num = 10
steps_per_validate = 100
steps_per_test = 500
batch_size = tf.placeholder(tf.int32, [])
keep_prob = tf.placeholder(tf.float32, [])

Then you need to declare the MNIST data generator:

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

Next, we declare the input data. The input is represented by x, and the labels are represented by y_label:

x = tf.placeholder(tf.float32, [None, 784])
y_label = tf.placeholder(tf.float32, [None, 10])

The shape of x is [None, 784]: the first dimension is None because the batch size is not fixed in advance, and 784 is the flattened size of a 28 x 28 image. y_label has shape [None, 10] for the same reason, with 10 one-hot classes.

Next, we need to reshape the input x. To construct an RNN sequence, each image is split into multiple time steps: with time_step set to 28, each step receives one row of 28 pixels, so input_size becomes 28 while batch_size stays unchanged. The reshaped result is therefore a three-dimensional tensor:

x_shape = tf.reshape(x, [-1, time_step, input_size])

RNN Layer

Next, we need to build the RNN model. Here we use an LSTM cell as the RNN cell, and since we need a three-layer RNN, we use MultiRNNCell, whose input parameter is a list of LSTM cells.

Therefore, we can declare a method to create LSTMCell as follows:

def cell(num_units):
    # A single LSTM cell wrapped with dropout on its outputs
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=num_units)
    return tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)

Dropout is added to reduce overfitting during training.
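Because keep_prob is a placeholder, dropout can be turned on for training and off for evaluation purely through the feed_dict, as the training loop later in this section does:

# During training, keep only 50% of the LSTM outputs (dropout active)
# sess.run(train, feed_dict={..., keep_prob: 0.5, ...})
# During evaluation, keep everything (dropout disabled)
# sess.run(accuracy, feed_dict={..., keep_prob: 1, ...})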

Next we will use it to build a multi-layer RNN:

cells = tf.nn.rnn_cell.MultiRNNCell([cell(num_units) for _ in range(num_layer)])

Note that a for loop is used here so that each iteration creates a new LSTMCell. Simply extending the list by multiplication would put the same LSTMCell object in every layer, which causes a dimension-mismatch problem once the MultiRNNCell is built.
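For contrast, a minimal sketch of the pattern this avoids (shown commented out, since it would break the model above; shared_cell is just an illustrative name):

# Don't do this: list multiplication puts the same LSTMCell object in every slot,
# so all three layers share one cell whose weights are built for an input of size 28,
# and stacking then fails with a dimension mismatch at the second layer.
# shared_cell = cell(num_units)
# cells = tf.nn.rnn_cell.MultiRNNCell([shared_cell] * num_layer)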

Next, we need to declare an initial state:

h0 = cells.zero_state(batch_size, dtype=tf.float32)

Next, call the dynamic_rnn() method to build the model:

output, hs = tf.nn.dynamic_rnn(cells, inputs=x_shape, initial_state=h0)

Here, inputs receives the reshaped x, and the initial state is passed in through initial_state. Two results are returned. The first, output, contains the outputs of every time step; it is three-dimensional, with the first dimension equal to batch_size, the second equal to time_step, and the third equal to num_units. The second, hs, is the final hidden state, returned as a tuple whose length equals the number of RNN layers (3 here); each element is an LSTM state containing the two components c and h.
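To make these shapes concrete, here is a quick check (a sketch; with the values used above, the static shapes come out as in the comments, where ? is the unknown batch size):

print(output.get_shape())    # (?, 28, 256)  -> [batch_size, time_step, num_units]
print(len(hs))               # 3             -> one state per RNN layer
print(hs[-1].h.get_shape())  # (?, 256)      -> h of the top layer
print(hs[-1].c.get_shape())  # (?, 256)      -> c of the top layer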

In this case, the final output can simply be taken from the last time step:

output = output[:, -1, :]

Alternatively, the h of the last layer of the hidden state gives the same result:

h = hs[-1].h

In this model, the two are equivalent. However, note that in text processing, where sequences have different lengths and are padded, the two may differ.
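For illustration only, here is a hedged sketch of that variable-length case; the seq_len placeholder is hypothetical and not part of this MNIST model:

# Hypothetical: the true (unpadded) length of each sequence in the batch.
# seq_len = tf.placeholder(tf.int32, [None])
# output, hs = tf.nn.dynamic_rnn(cells, inputs=x_shape,
#                                initial_state=h0, sequence_length=seq_len)
# With sequence_length given, hs[-1].h holds the state at each example's last
# valid step, while output[:, -1, :] reads the padded final position (which
# dynamic_rnn fills with zeros past seq_len), so the two can differ.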

Output Layer

Next we will perform a linear transformation and Softmax output:

# Output Layer
w = tf.Variable(tf.truncated_normal([num_units, category_num], stddev=0.1), dtype=tf.float32)
b = tf.Variable(tf.constant(0.1, shape=[category_num]), dtype=tf.float32)
y = tf.matmul(output, w) + b

# Loss
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_label, logits=y)

Here, the loss uses softmax_cross_entropy_with_logits, which applies Softmax to the logits and then computes the cross entropy in a single step.
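For intuition, a sketch of the (less numerically stable) manual equivalent; it is not used in the model above:

# Manual version: apply Softmax to the logits, then take the per-example cross entropy.
# probs = tf.nn.softmax(y)
# cross_entropy_manual = -tf.reduce_sum(y_label * tf.log(probs), axis=1)
# softmax_cross_entropy_with_logits fuses these two steps and avoids log(0) issues.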

Training and Evaluation

Finally, we define the training and evaluation process. During training, the Train Accuracy and Test Accuracy are printed every fixed number of steps:

# Train
train = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

# Prediction
correction_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_label, axis=1))
accuracy = tf.reduce_mean(tf.cast(correction_prediction, tf.float32))

# Train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(total_steps + 1):
        batch_x, batch_y = mnist.train.next_batch(100)
        sess.run(train, feed_dict={x: batch_x, y_label: batch_y, keep_prob: 0.5,
                                   batch_size: batch_x.shape[0]})
        # Train Accuracy
        if step % steps_per_validate == 0:
            print('Train', step, sess.run(accuracy, feed_dict={x: batch_x, y_label: batch_y, keep_prob: 0.5,
                                                               batch_size: batch_x.shape[0]}))
        # Test Accuracy
        if step % steps_per_test == 0:
            test_x, test_y = mnist.test.images, mnist.test.labels
            print('Test', step,
                  sess.run(accuracy, feed_dict={x: test_x, y_label: test_y, keep_prob: 1,
                                                batch_size: test_x.shape[0]}))

Run

Running the program directly, the accuracy reaches around 98% after a few hundred training steps:

Train 0 0.27
Test 0 0.2223
Train 100 0.87
Train 200 0.91
Train 300 0.94
Train 400 0.94
Train 500 0.99
Test 500 0.9595
Train 600 0.95
Train 700 0.97
Train 800 0.98

It can be seen that LSTM is effective for the MNIST handwritten digit classification task.
