Sesame HTTP: TensorFlow LSTM MNIST Classification

This section describes how to use an LSTM, a type of RNN, for MNIST classification. An RNN may train more slowly than a CNN, but it can use less memory.

Initialization

First, we initialize some hyperparameters, such as the learning rate, the number of hidden units, and the number of RNN layers:

import tensorflow as tf

learning_rate = 1e-3
num_units = 256
num_layer = 3
input_size = 28
time_step = 28
total_steps = 2000
category_num = 10
steps_per_validate = 100
steps_per_test = 500
batch_size = tf.placeholder(tf.int32, [])
keep_prob = tf.placeholder(tf.float32, [])

Then you need to declare the MNIST data generator:

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

Next, we declare the input data. The input is represented by x, and the labels are represented by y_label:

x = tf.placeholder(tf.float32, [None, 784])
y_label = tf.placeholder(tf.float32, [None, 10])

The shape of x is [None, 784]: the first dimension is None because the batch size is not fixed in advance, and 784 is the flattened size of a 28 x 28 image. y_label has shape [None, 10] for the same reason, with 10 one-hot classes.

Next, we need to reshape the input x. To construct an RNN sequence, each image is split into multiple time steps: with time_step set to 28, each step receives one row of 28 pixels, so input_size becomes 28 while batch_size stays unchanged. The reshaped result is therefore a three-dimensional tensor:

x_shape = tf.reshape(x, [-1, time_step, input_size])

RNN Layer

Next, we need to build the RNN model. Here we use an LSTM cell as the RNN cell, and since we need a three-layer RNN, we use MultiRNNCell, whose input parameter is a list of LSTM cells.

Therefore, we can declare a method to create LSTMCell as follows:

def cell(num_units):
    # A single LSTM cell wrapped with dropout on its outputs
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=num_units)
    return tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)

Dropout is added to reduce overfitting during training.
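Because keep_prob is a placeholder, dropout can be turned on for training and off for evaluation purely through the feed_dict, as the training loop later in this section does:

# During training, keep only 50% of the LSTM outputs (dropout active)
# sess.run(train, feed_dict={..., keep_prob: 0.5, ...})
# During evaluation, keep everything (dropout disabled)
# sess.run(accuracy, feed_dict={..., keep_prob: 1, ...})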

Next we will use it to build a multi-layer RNN:

cells = tf.nn.rnn_cell.MultiRNNCell([cell(num_units) for _ in range(num_layer)])

Note that a for loop is used here so that each iteration creates a new LSTMCell. Simply extending the list by multiplication would put the same LSTMCell object in every layer, which causes a dimension-mismatch problem once the MultiRNNCell is built.
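For contrast, a minimal sketch of the pattern this avoids (shown commented out, since it would break the model above; shared_cell is just an illustrative name):

# Don't do this: list multiplication puts the same LSTMCell object in every slot,
# so all three layers share one cell whose weights are built for an input of size 28,
# and stacking then fails with a dimension mismatch at the second layer.
# shared_cell = cell(num_units)
# cells = tf.nn.rnn_cell.MultiRNNCell([shared_cell] * num_layer)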

Next, we need to declare an initial state:

h0 = cells.zero_state(batch_size, dtype=tf.float32)

Next, call the dynamic_rnn() method to build the model:

output, hs = tf.nn.dynamic_rnn(cells, inputs=x_shape, initial_state=h0)

Here, inputs receives the reshaped x, and the initial state is passed in through initial_state. Two results are returned. The first, output, contains the outputs of every time step; it is three-dimensional, with the first dimension equal to batch_size, the second equal to time_step, and the third equal to num_units. The second, hs, is the final hidden state, returned as a tuple whose length equals the number of RNN layers (3 here); each element is an LSTM state containing the two components c and h.
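To make these shapes concrete, here is a quick check (a sketch; with the values used above, the static shapes come out as in the comments, where ? is the unknown batch size):

print(output.get_shape())    # (?, 28, 256)  -> [batch_size, time_step, num_units]
print(len(hs))               # 3             -> one state per RNN layer
print(hs[-1].h.get_shape())  # (?, 256)      -> h of the top layer
print(hs[-1].c.get_shape())  # (?, 256)      -> c of the top layer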

In this case, the final output can simply be taken from the last time step:

output = output[:, -1, :]

Alternatively, the h of the last layer of the hidden state gives the same result:

h = hs[-1].h

In this model, the two are equivalent. However, note that in text processing, where sequences have different lengths and are padded, the two may differ.
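For illustration only, here is a hedged sketch of that variable-length case; the seq_len placeholder is hypothetical and not part of this MNIST model:

# Hypothetical: the true (unpadded) length of each sequence in the batch.
# seq_len = tf.placeholder(tf.int32, [None])
# output, hs = tf.nn.dynamic_rnn(cells, inputs=x_shape,
#                                initial_state=h0, sequence_length=seq_len)
# With sequence_length given, hs[-1].h holds the state at each example's last
# valid step, while output[:, -1, :] reads the padded final position (which
# dynamic_rnn fills with zeros past seq_len), so the two can differ.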

Output Layer

Next we will perform a linear transformation and Softmax output:

# Output Layer
w = tf.Variable(tf.truncated_normal([num_units, category_num], stddev=0.1), dtype=tf.float32)
b = tf.Variable(tf.constant(0.1, shape=[category_num]), dtype=tf.float32)
y = tf.matmul(output, w) + b

# Loss
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_label, logits=y)

Here, the loss uses softmax_cross_entropy_with_logits, which applies Softmax to the logits and then computes the cross entropy in a single step.
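For intuition, a sketch of the (less numerically stable) manual equivalent; it is not used in the model above:

# Manual version: apply Softmax to the logits, then take the per-example cross entropy.
# probs = tf.nn.softmax(y)
# cross_entropy_manual = -tf.reduce_sum(y_label * tf.log(probs), axis=1)
# softmax_cross_entropy_with_logits fuses these two steps and avoids log(0) issues.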

Training and Evaluation

Finally, we define the training and evaluation process. During training, the Train Accuracy and Test Accuracy are printed every fixed number of steps:

# Train
train = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

# Prediction
correction_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_label, axis=1))
accuracy = tf.reduce_mean(tf.cast(correction_prediction, tf.float32))

# Train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(total_steps + 1):
        batch_x, batch_y = mnist.train.next_batch(100)
        sess.run(train, feed_dict={x: batch_x, y_label: batch_y, keep_prob: 0.5,
                                   batch_size: batch_x.shape[0]})
        # Train Accuracy
        if step % steps_per_validate == 0:
            print('Train', step, sess.run(accuracy, feed_dict={x: batch_x, y_label: batch_y, keep_prob: 0.5,
                                                               batch_size: batch_x.shape[0]}))
        # Test Accuracy
        if step % steps_per_test == 0:
            test_x, test_y = mnist.test.images, mnist.test.labels
            print('Test', step,
                  sess.run(accuracy, feed_dict={x: test_x, y_label: test_y, keep_prob: 1,
                                                batch_size: test_x.shape[0]}))

Run

Running the program directly, the accuracy reaches around 98% after a few hundred training steps:

Train 0 0.27
Test 0 0.2223
Train 100 0.87
Train 200 0.91
Train 300 0.94
Train 400 0.94
Train 500 0.99
Test 500 0.9595
Train 600 0.95
Train 700 0.97
Train 800 0.98

It can be seen that LSTM is effective for the MNIST handwritten digit classification task.
