Learning notes TF036: Implement Bidirectional LSTM Classifier


Bidirectional Recurrent Neural Networks (Bi-RNN) were first proposed by Schuster and Paliwal in 1997, the same year as LSTM. Bi-RNN increases the amount of input information available to the network. An ordinary MLP can only handle input of limited length; a standard RNN can process time series of arbitrary length, but it can only use historical input, not future information. Bi-RNN makes use of both the historical and the future data of a time series: two recurrent neural networks, processing the series in opposite time directions, are connected to the same output layer, so the output layer can obtain historical and future information at the same time.

Language modeling is not suitable for Bi-RNN, because its goal is to predict the next word from the preceding text, and information from the following text must not be passed to the model. Classification problems such as handwritten character recognition, machine translation, and protein structure prediction, however, can all use Bi-RNN to improve the model. Baidu's speech recognition uses Bi-RNN so that the full context is considered, improving the accuracy of the model.

The core of the Bi-RNN network structure is that the ordinary unidirectional RNN is split into two directions, one running forward in time and one running in reverse. At each time step, the input is sent to both the forward and the reverse RNN, and each produces an output based on its own state. The two RNNs in different directions do not share state: the forward RNN passes its state only to the forward RNN, the reverse RNN passes its state only to the reverse RNN, and there is no direct connection between the forward and reverse RNNs. The two outputs at each time step are connected together to the Bi-RNN output node to form the final output. During training, each direction's contribution to the loss at the current time step is computed, and the parameters are optimized to appropriate values based on the gradients.
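As a minimal illustration of this structure, here is a NumPy sketch (not the TensorFlow code used later in this article); simple_rnn_step is a hypothetical vanilla-RNN cell, while a real Bi-RNN would typically use LSTM or GRU cells:

    import numpy as np

    def simple_rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
        # One vanilla RNN step: new state from current input and previous state.
        return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

    def bi_rnn(xs, params_fw, params_bw, n_hidden):
        T = len(xs)
        h_fw = np.zeros(n_hidden)      # forward state at t = 1, set manually
        h_bw = np.zeros(n_hidden)      # reverse state at t = T, set manually
        fw_states, bw_states = [], [None] * T
        for t in range(T):             # forward RNN scans 1 -> T
            h_fw = simple_rnn_step(xs[t], h_fw, *params_fw)
            fw_states.append(h_fw)
        for t in reversed(range(T)):   # reverse RNN scans T -> 1
            h_bw = simple_rnn_step(xs[t], h_bw, *params_bw)
            bw_states[t] = h_bw
        # The two directions never share state; their outputs are simply
        # concatenated at every time step to form the Bi-RNN output.
        return [np.concatenate([f, b]) for f, b in zip(fw_states, bw_states)]

    # Tiny usage example with random parameters.
    n_input, n_hidden, T = 28, 4, 5
    rng = np.random.RandomState(0)
    make = lambda: (rng.randn(n_input, n_hidden), rng.randn(n_hidden, n_hidden), np.zeros(n_hidden))
    outputs = bi_rnn([rng.randn(n_input) for _ in range(T)], make(), make(), n_hidden)
    print(outputs[0].shape)            # (8,) = 2 * n_hidden, forward + backward concatenated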

For Bi-RNN training, since the forward and reverse RNNs have no intersection, the network can be unfolded like an ordinary feed-forward network and trained with the BPTT (back-propagation through time) algorithm, but the states and derivatives at the sequence boundaries cannot be obtained in the usual way. The forward state at t = 1 and the reverse state at t = T are unknown, that is, the initial states of the two directions are unknown and must be set manually. Likewise, the derivative of the forward state at t = T and the derivative of the reverse state at t = 1 are unknown, that is, the state derivatives at the ends of the two directions are unknown; they are usually set to 0, indicating that those boundary states are not important for the parameter update.

Training then proceeds in three steps. Step 1: the forward pass (inference) over the input data. Compute the forward RNN states in the 1 -> T direction and the reverse RNN states in the T -> 1 direction, then obtain the outputs. Step 2: the backward pass, taking derivatives of the objective function. Differentiate with respect to the outputs first, then compute the forward RNN state derivatives in the T -> 1 direction and the reverse RNN state derivatives in the 1 -> T direction. Step 3: update the model parameters with the gradients obtained, completing one round of training.

Each RNN unit of a Bi-RNN can be a traditional RNN cell, or an LSTM or GRU unit. You can also stack one Bi-RNN layer on top of another, with the output sequence of one Bi-RNN layer used as the input sequence of the next, which further abstracts and extracts features. For classification tasks, the Bi-RNN output sequence is connected to a fully connected layer, or passed through Global Average Pooling, and then connected to a Softmax layer, just as in a convolutional network.
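A minimal sketch of such stacking, using the same TF 1.x contrib API that the code later in this article relies on (the function name stacked_birnn and the layer sizes are illustrative, not taken from the original code):

    import tensorflow as tf  # TF 1.x, as used in this article

    def stacked_birnn(x_list, n_hidden1, n_hidden2):
        # x_list: a length-n_steps list of (batch_size, n_input) tensors.
        fw1 = tf.contrib.rnn.BasicLSTMCell(n_hidden1, forget_bias=1.0)
        bw1 = tf.contrib.rnn.BasicLSTMCell(n_hidden1, forget_bias=1.0)
        out1, _, _ = tf.contrib.rnn.static_bidirectional_rnn(
            fw1, bw1, x_list, dtype=tf.float32, scope="bi1")
        # out1 is a list of (batch_size, 2 * n_hidden1) tensors; the output
        # sequence of one Bi-RNN layer becomes the input sequence of the next.
        fw2 = tf.contrib.rnn.BasicLSTMCell(n_hidden2, forget_bias=1.0)
        bw2 = tf.contrib.rnn.BasicLSTMCell(n_hidden2, forget_bias=1.0)
        out2, _, _ = tf.contrib.rnn.static_bidirectional_rnn(
            fw2, bw2, out1, dtype=tf.float32, scope="bi2")
        # For classification, connect out2 (or its last element) to a fully
        # connected layer followed by Softmax.
        return out2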

Now implement a Bidirectional LSTM Classifier in TensorFlow and test it on the MNIST dataset. Load TensorFlow, NumPy, and TensorFlow's built-in MNIST data reader. input_data.read_data_sets downloads and reads the MNIST dataset.

Set the training parameters. Set the learning rate to 0.01 and select Adam as the optimizer (hence the relatively low learning rate). The maximum number of training samples is 400,000 and the batch_size is 128; display_step is set to 10, so the training status is shown once every 10 training steps.

MNIST images are 28x28 in size, so n_input, the input dimension, is 28 (the image width), and n_steps, the number of unrolled steps of the LSTM, is also set to 28 (the image height), so that all of the image information is used. One row of 28 pixels is read at each time step, and the next row of pixels is fed at the next time step. Set n_hidden (the number of hidden nodes in the LSTM) to 256 and n_classes (the number of MNIST classes) to 10.
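As a quick illustration of this layout (using a dummy all-zero batch rather than real MNIST data):

    import numpy as np

    batch = np.zeros((128, 784), dtype=np.float32)  # dummy batch, batch_size = 128, flat 28*28 images
    batch = batch.reshape((128, 28, 28))            # (batch_size, n_steps, n_input)
    # At time step t the LSTM reads batch[:, t, :], i.e. row t of every image,
    # so the 28 rows of the image are fed as 28 consecutive time steps.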

Create placeholders for the input x and the learning target y. Each sample of the input x has a two-dimensional structure: the sample is a time series whose first dimension is n_steps (the time axis) and whose second dimension, of size n_input, holds the values at each time point. Create the Softmax layer's weights and biases, with tf.random_normal initializing the parameters. Because the bidirectional LSTM has both forward and backward LSTM cells, the weight matrix has 2 * n_hidden rows, doubling the number of weight parameters.

Define the bidirectional LSTM network generation function. Our data has shape (batch_size, n_steps, n_input), but the bidirectional RNN interface needs a length-n_steps list of inputs whose elements have shape (batch_size, n_input). First transpose the input with tf.transpose(x, [1, 0, 2]), swapping the first dimension batch_size and the second dimension n_steps. Then tf.reshape changes the input x to the shape (n_steps * batch_size, n_input). Then tf.split splits x into a list of n_steps tensors, each of size (batch_size, n_input), which matches the LSTM unit's input format. Use tf.contrib.rnn.BasicLSTMCell to create the forward and backward LSTM cells, with the number of hidden nodes set to n_hidden and forget_bias set to 1. Pass the forward cell lstm_fw_cell and the backward cell lstm_bw_cell, together with the input x, to the Bi-RNN interface tf.contrib.rnn.static_bidirectional_rnn to generate the bidirectional LSTM. Finally, the bidirectional LSTM's output is multiplied by the weight matrix and the bias is added, using the weights and biases defined earlier.
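To make the shape pipeline concrete, here is a small check sketch (the throwaway placeholder demo and the print are only for illustration, not part of the model):

    import tensorflow as tf  # TF 1.x, as used in this article

    demo = tf.placeholder(tf.float32, [None, 28, 28])  # (batch_size, n_steps, n_input)
    t1 = tf.transpose(demo, [1, 0, 2])                 # -> (n_steps, batch_size, n_input)
    t2 = tf.reshape(t1, [-1, 28])                      # -> (n_steps * batch_size, n_input)
    t3 = tf.split(t2, 28)                              # -> list of n_steps tensors, each (batch_size, n_input)
    print(len(t3), t3[0].get_shape().as_list())        # 28 [None, 28]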

Finally, tf.nn.softmax_cross_entropy_with_logits applies Softmax to the output and computes the loss, tf.reduce_mean calculates the average cost, and the optimizer is Adam with the learning_rate set earlier. tf.argmax obtains the model's predicted class, tf.equal checks whether the prediction is correct, and tf.reduce_mean computes the average accuracy.

Perform training and testing. After running the parameter initialization, define a training loop that keeps the total number of trained samples (number of iterations * batch_size) below the limit set earlier. In each training iteration, mnist.train.next_batch fetches a batch of data, and reshape changes its shape. A feed_dict containing the input x and the training target y is passed in, and the training operation is run to update the model parameters. Whenever the iteration count is an integer multiple of display_step, compute the prediction accuracy and loss on the current batch and display them.

After all training iterations are complete, use the trained model to predict on all the test data in mnist.test.images and display the accuracy.

After training on 400,000 samples is complete, the model reaches a prediction accuracy of about 0.983 on the test set.

On the MNIST dataset, the Bidirectional LSTM Classifier is not as good as a convolutional neural network. However, Bi-RNN and bidirectional LSTM networks perform well on time series classification tasks, because they use both the history and the future of the time series, combining the full context for a comprehensive judgment of the result.

    import tensorflow as tf
    import numpy as np

    # Import MNIST data
    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

    # Parameters
    learning_rate = 0.01
    max_samples = 400000
    batch_size = 128
    display_step = 10

    # Network Parameters
    n_input = 28    # MNIST data input (img shape: 28*28)
    n_steps = 28    # timesteps
    n_hidden = 256  # hidden layer num of features
    n_classes = 10  # MNIST total classes (0-9 digits)

    # tf Graph input
    x = tf.placeholder("float", [None, n_steps, n_input])
    y = tf.placeholder("float", [None, n_classes])

    # Define weights
    weights = {
        # Hidden layer weights => 2*n_hidden because of forward + backward cells
        'out': tf.Variable(tf.random_normal([2 * n_hidden, n_classes]))
    }
    biases = {
        'out': tf.Variable(tf.random_normal([n_classes]))
    }

    def BiRNN(x, weights, biases):
        # Prepare data shape to match `bidirectional_rnn` function requirements
        # Current data input shape: (batch_size, n_steps, n_input)
        # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)

        # Permuting batch_size and n_steps
        x = tf.transpose(x, [1, 0, 2])
        # Reshape to (n_steps*batch_size, n_input)
        x = tf.reshape(x, [-1, n_input])
        # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
        x = tf.split(x, n_steps)

        # Define lstm cells with tensorflow
        # Forward direction cell
        lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
        # Backward direction cell
        lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)

        # Get lstm cell output (note: old TensorFlow versions return only the
        # outputs, not the final states)
        outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                                                dtype=tf.float32)

        # Linear activation, using rnn inner loop last output
        return tf.matmul(outputs[-1], weights['out']) + biases['out']

    pred = BiRNN(x, weights, biases)

    # Define loss and optimizer
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # Evaluate model
    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # Initializing the variables
    init = tf.global_variables_initializer()

    # Launch the graph
    with tf.Session() as sess:
        sess.run(init)
        step = 1
        # Keep training until reach max iterations
        while step * batch_size < max_samples:
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Reshape data to get 28 seq of 28 elements
            batch_x = batch_x.reshape((batch_size, n_steps, n_input))
            # Run optimization op (backprop)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
            if step % display_step == 0:
                # Calculate batch accuracy
                acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
                # Calculate batch loss
                loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
                print("Iter " + str(step * batch_size) + ", Minibatch Loss= " +
                      "{:.6f}".format(loss) + ", Training Accuracy= " +
                      "{:.5f}".format(acc))
            step += 1
        print("Optimization Finished!")

        # Calculate accuracy for the 10000 mnist test images
        test_len = 10000
        test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
        test_label = mnist.test.labels[:test_len]
        print("Testing Accuracy:",
              sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

 

References:
TensorFlow practice

Paid consultation is welcome (150 RMB per hour). Contact: qingxingfengzi
