Learning Notes TF020: Sequence Labeling, Handwritten Lowercase Letter OCR Dataset, Bidirectional RNN


Sequence labeling predicts a class for every frame of the input sequence. The example task here is OCR (Optical Character Recognition).

The OCR dataset was collected by Rob Kassel of the MIT Spoken Language Systems Research Group and preprocessed by Ben Taskar of the Stanford AI Lab (http://ai.stanford.edu/~btaskar/ocr/). It contains a large number of handwritten lowercase letters; each sample is a 16x8 pixel binary image. The letters are chained into sequences that correspond to words: about 6,800 words with a length of at most 14 letters. The file is gzip-compressed and tab-separated, so Python's csv module can read it directly. Each row describes one normalized letter with attributes such as its ID, its label (the letter), the pixel values, and the ID of the next letter in the word.
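As a sketch of the file format (assuming letter.data.gz has already been downloaded locally; the field indices follow the parsing code in the listing later in these notes), each tab-separated row can be inspected like this:

    import gzip
    import csv

    # Inspect the first record of the tab-separated OCR file.
    with gzip.open('letter.data.gz', 'rt') as file_:
        for line in csv.reader(file_, delimiter='\t'):
            letter_id = int(line[0])   # ID of this letter
            letter = line[1]           # lowercase letter label
            next_id = int(line[2])     # ID of the next letter in the word, -1 at word end
            pixels = line[6:134]       # 16 * 8 = 128 binary pixel values
            print(letter_id, letter, next_id, len(pixels))
            break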

Sort the rows by ID and read the letters in the correct order, collecting letters into one sequence until the next-ID field is not set, then start a new sequence. After all target letters and pixel data have been read, pad the sequences with zero images so that they fit into two big NumPy arrays holding the target letters and the pixel data respectively.
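A minimal, self-contained sketch of the zero-padding step (toy data; the real implementation is OcrDataset._pad in the listing below):

    import numpy as np

    # Two toy "words" of different lengths, each letter a 16x8 image.
    words = [[np.ones((16, 8))] * 3, [np.ones((16, 8))] * 5]
    max_length = max(len(word) for word in words)
    padding = np.zeros((16, 8))
    padded = [word + [padding] * (max_length - len(word)) for word in words]
    batch = np.array(padded)
    print(batch.shape)  # (2, 5, 16, 8): every sequence now has the maximum length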

All time steps share the softmax layer. The data and target arrays contain sequences, with one target letter per image frame. This is the simplest RNN extension: add a softmax classifier to every letter output, so the classifier scores the prediction for every frame instead of the whole sequence. The sequence length is computed as before. There are two ways to attach a softmax layer to all frames: add a separate classifier for each frame, or share the same classifier across all frames. With a shared classifier the weights are adjusted more often, once for every letter rather than once for every word. A fully connected layer has a weight matrix of shape in_size x out_size, but the input now has two leading dimensions, batch_size and sequence_steps. To keep a single weight matrix, flatten the input (the RNN output activations) from batch_size x sequence_steps x in_size to (batch_size * sequence_steps) x in_size, apply the weight matrix as one large batch, and then unflatten the result back to batch_size x sequence_steps x out_size (see the sketch below).
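A sketch of the flatten/unflatten trick with made-up sizes (it mirrors the shared softmax code in the full listing below):

    import tensorflow as tf

    batch_size, sequence_steps, in_size, out_size = 10, 14, 300, 26
    rnn_output = tf.placeholder(tf.float32, [batch_size, sequence_steps, in_size])

    weight = tf.Variable(tf.truncated_normal([in_size, out_size], stddev=0.01))
    bias = tf.Variable(tf.constant(0.1, shape=[out_size]))

    # Flatten batch and time dimensions so one weight matrix serves every frame.
    flat = tf.reshape(rnn_output, [-1, in_size])             # (batch * steps, in_size)
    prediction = tf.nn.softmax(tf.matmul(flat, weight) + bias)
    # Unflatten back to one prediction per frame.
    prediction = tf.reshape(prediction, [-1, sequence_steps, out_size])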

The cost function. Every frame of the sequence has a prediction-target pair, which must be averaged over the corresponding dimension. tf.reduce_mean cannot be used here, because it normalizes by the tensor length (the maximum sequence length); instead, normalize by the actual sequence length manually by calling tf.reduce_sum and dividing by the length.
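A sketch of this normalization (the error metric later reuses the same masking pattern); prediction and target here are placeholders standing in for the model's tensors:

    import tensorflow as tf

    # Stand-ins for the model's tensors: batch x max_length x num_classes.
    prediction = tf.placeholder(tf.float32, [None, 14, 26])
    target = tf.placeholder(tf.float32, [None, 14, 26])

    # Frame-wise cross entropy; padded frames have all-zero targets and are masked out.
    cross_entropy = -tf.reduce_sum(target * tf.log(prediction), reduction_indices=2)
    mask = tf.sign(tf.reduce_max(tf.abs(target), reduction_indices=2))
    cross_entropy *= mask

    # Divide by the actual sequence length rather than the padded maximum length.
    length = tf.reduce_sum(mask, reduction_indices=1)
    cross_entropy = tf.reduce_sum(cross_entropy, reduction_indices=1) / length
    cost = tf.reduce_mean(cross_entropy)  # finally average over the batch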

The error function. tf.argmax is applied along axis 2 rather than axis 1, giving one predicted letter per frame; the padded frames are masked out and the mistakes are averaged over the actual length of each sequence. tf.reduce_mean then averages over all words in the batch.

TensorFlow performs the derivative computation automatically, so the same optimization as for sequence classification can be reused; only the new cost function is plugged in. All RNN gradients are clipped to prevent training divergence, without negative side effects.
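A self-contained sketch of the gradient clipping (a toy cost stands in for the model's cost; the clipping threshold is just an example):

    import tensorflow as tf

    # Toy cost so the snippet is self-contained; in the notes this is model.cost.
    weight = tf.Variable(tf.ones([3]))
    cost = tf.reduce_sum(tf.square(weight))

    optimizer = tf.train.RMSPropOptimizer(0.002)
    limit = 5.0  # example clipping threshold

    # Clip every gradient element-wise before applying; leave None gradients alone.
    gradients = optimizer.compute_gradients(cost)
    gradients = [(tf.clip_by_value(g, -limit, limit), v) if g is not None else (None, v)
                 for g, v in gradients]
    train_op = optimizer.apply_gradients(gradients)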

Training the model: get_dataset downloads the handwritten letter images, preprocesses them, and one-hot encodes the lowercase letters. The order of examples is shuffled randomly before the data is split into a training set and a test set.
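A small sketch of the one-hot encoding step (toy data; padded positions stay all-zero vectors, mirroring get_dataset in the listing below):

    import numpy as np

    letters = np.array([['c', 'a', 't', '', '']])  # one padded word, max length 5
    one_hot = np.zeros(letters.shape + (26,))
    for index, letter in np.ndenumerate(letters):
        if letter:  # padding (empty string) keeps an all-zero row
            one_hot[index][ord(letter) - ord('a')] = 1
    print(one_hot.shape)  # (1, 5, 26)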

Adjacent letters within a word are dependent on each other (they carry mutual information). The RNN stores all input information about the current word in its hidden activation, but when classifying the first few letters it has not yet seen much input and cannot exploit that extra information. A bidirectional RNN overcomes this defect.
Two RNNs observe the input sequence: one reads the word from the left in the usual order, the other reads it from the right in reversed order. Each time step therefore yields two output activations, which are concatenated before being passed to the shared softmax layer. This way the classifier has the complete word information available for every letter. tf.model.rnn.bidirectional_rnn already implements such a model, but here we build it ourselves.

Implementing the bidirectional RNN: split the prediction property into two functions so that each focuses on a small piece of the model. The _shared_softmax function takes the input tensor and infers the input size from it; it reuses the earlier architecture so that all time steps share the same softmax layer. tf.nn.dynamic_rnn is called twice to create the two RNN instances.
Reversing the sequence is easier than implementing a new reversed-pass RNN operation. The tf.reverse_sequence function reverses only the first sequence_lengths frames of each sequence in the batch. Data flow graph nodes need names; the scope parameter is the variable scope name used by dynamic_rnn, which defaults to 'rnn'. Since the two RNNs need separate parameters, they must live in different scopes.
The reversed sequence is fed into the backward RNN, and that network's output is reversed again so that it aligns with the forward output. The two tensors are then concatenated along the feature (RNN output) dimension of each frame and returned. The bidirectional RNN model achieves better performance; see the reverse_sequence sketch below.
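A toy demonstration of how tf.reverse_sequence reverses only the valid frames of each padded sequence, which is what allows the backward RNN's output to be reversed back into alignment with the forward output:

    import tensorflow as tf

    # Two padded sequences of frames (scalars here), with actual lengths 3 and 2.
    data = tf.constant([[[1.], [2.], [3.], [0.]],
                        [[4.], [5.], [0.], [0.]]])
    lengths = tf.constant([3, 2], dtype=tf.int64)

    # Only the first lengths[i] frames of each sequence are reversed; padding stays put.
    reversed_ = tf.reverse_sequence(data, lengths, seq_dim=1)

    with tf.Session() as sess:
        print(sess.run(reversed_))
        # [[[3.] [2.] [1.] [0.]]
        #  [[5.] [4.] [0.] [0.]]]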

    # OcrDataset.py -- download, parse, and pad the OCR letter sequences.
    import gzip
    import csv
    import numpy as np
    from helpers import download


    class OcrDataset:

        URL = 'http://ai.stanford.edu/~btaskar/ocr/letter.data.gz'

        def __init__(self, cache_dir):
            path = download(type(self).URL, cache_dir)
            lines = self._read(path)
            data, target = self._parse(lines)
            self.data, self.target = self._pad(data, target)

        @staticmethod
        def _read(filepath):
            with gzip.open(filepath, 'rt') as file_:
                reader = csv.reader(file_, delimiter='\t')
                lines = list(reader)
                return lines

        @staticmethod
        def _parse(lines):
            lines = sorted(lines, key=lambda x: int(x[0]))
            data, target = [], []
            next_ = None
            for line in lines:
                if not next_:
                    data.append([])
                    target.append([])
                else:
                    assert next_ == int(line[0])
                next_ = int(line[2]) if int(line[2]) > -1 else None
                pixels = np.array([int(x) for x in line[6:134]])
                pixels = pixels.reshape((16, 8))
                data[-1].append(pixels)
                target[-1].append(line[1])
            return data, target

        @staticmethod
        def _pad(data, target):
            max_length = max(len(x) for x in target)
            padding = np.zeros((16, 8))
            data = [x + ([padding] * (max_length - len(x))) for x in data]
            target = [x + ([''] * (max_length - len(x))) for x in target]
            return np.array(data), np.array(target)


    # SequenceLabellingModel.py -- unidirectional RNN with a shared softmax layer.
    import tensorflow as tf
    from helpers import lazy_property


    class SequenceLabellingModel:

        def __init__(self, data, target, params):
            self.data = data
            self.target = target
            self.params = params
            self.prediction
            self.cost
            self.error
            self.optimize

        @lazy_property
        def length(self):
            used = tf.sign(tf.reduce_max(tf.abs(self.data), reduction_indices=2))
            length = tf.reduce_sum(used, reduction_indices=1)
            length = tf.cast(length, tf.int32)
            return length

        @lazy_property
        def prediction(self):
            output, _ = tf.nn.dynamic_rnn(
                tf.nn.rnn_cell.GRUCell(self.params.rnn_hidden),
                self.data,
                dtype=tf.float32,
                sequence_length=self.length,
            )
            # Softmax layer.
            max_length = int(self.target.get_shape()[1])
            num_classes = int(self.target.get_shape()[2])
            weight = tf.Variable(tf.truncated_normal(
                [self.params.rnn_hidden, num_classes], stddev=0.01))
            bias = tf.Variable(tf.constant(0.1, shape=[num_classes]))
            # Flatten to apply same weights to all time steps.
            output = tf.reshape(output, [-1, self.params.rnn_hidden])
            prediction = tf.nn.softmax(tf.matmul(output, weight) + bias)
            prediction = tf.reshape(prediction, [-1, max_length, num_classes])
            return prediction

        @lazy_property
        def cost(self):
            # Compute cross entropy for each frame.
            cross_entropy = self.target * tf.log(self.prediction)
            cross_entropy = -tf.reduce_sum(cross_entropy, reduction_indices=2)
            mask = tf.sign(tf.reduce_max(tf.abs(self.target), reduction_indices=2))
            cross_entropy *= mask
            # Average over actual sequence lengths.
            cross_entropy = tf.reduce_sum(cross_entropy, reduction_indices=1)
            cross_entropy /= tf.cast(self.length, tf.float32)
            return tf.reduce_mean(cross_entropy)

        @lazy_property
        def error(self):
            mistakes = tf.not_equal(
                tf.argmax(self.target, 2), tf.argmax(self.prediction, 2))
            mistakes = tf.cast(mistakes, tf.float32)
            mask = tf.sign(tf.reduce_max(tf.abs(self.target), reduction_indices=2))
            mistakes *= mask
            # Average over actual sequence lengths.
            mistakes = tf.reduce_sum(mistakes, reduction_indices=1)
            mistakes /= tf.cast(self.length, tf.float32)
            return tf.reduce_mean(mistakes)

        @lazy_property
        def optimize(self):
            gradient = self.params.optimizer.compute_gradients(self.cost)
            try:
                limit = self.params.gradient_clipping
                gradient = [
                    (tf.clip_by_value(g, -limit, limit), v)
                    if g is not None else (None, v)
                    for g, v in gradient]
            except AttributeError:
                print('No gradient clipping parameter specified.')
            optimize = self.params.optimizer.apply_gradients(gradient)
            return optimize


    # Training script for the unidirectional model.
    import random
    import tensorflow as tf
    import numpy as np
    from helpers import AttrDict
    from OcrDataset import OcrDataset
    from SequenceLabellingModel import SequenceLabellingModel
    from batched import batched

    params = AttrDict(
        rnn_cell=tf.nn.rnn_cell.GRUCell,
        rnn_hidden=300,
        optimizer=tf.train.RMSPropOptimizer(0.002),
        gradient_clipping=5,
        batch_size=10,
        epochs=5,
        epoch_size=50
    )


    def get_dataset():
        dataset = OcrDataset('./ocr')
        # Flatten images into vectors.
        dataset.data = dataset.data.reshape(dataset.data.shape[:2] + (-1,))
        # One-hot encode targets.
        target = np.zeros(dataset.target.shape + (26,))
        for index, letter in np.ndenumerate(dataset.target):
            if letter:
                target[index][ord(letter) - ord('a')] = 1
        dataset.target = target
        # Shuffle order of examples.
        order = np.random.permutation(len(dataset.data))
        dataset.data = dataset.data[order]
        dataset.target = dataset.target[order]
        return dataset


    # Split into training and test data.
    dataset = get_dataset()
    split = int(0.66 * len(dataset.data))
    train_data, test_data = dataset.data[:split], dataset.data[split:]
    train_target, test_target = dataset.target[:split], dataset.target[split:]

    # Compute graph.
    _, length, image_size = train_data.shape
    num_classes = train_target.shape[2]
    data = tf.placeholder(tf.float32, [None, length, image_size])
    target = tf.placeholder(tf.float32, [None, length, num_classes])
    model = SequenceLabellingModel(data, target, params)
    batches = batched(train_data, train_target, params.batch_size)

    sess = tf.Session()
    sess.run(tf.initialize_all_variables())
    for index, batch in enumerate(batches):
        batch_data = batch[0]
        batch_target = batch[1]
        epoch = batch[2]
        if epoch >= params.epochs:
            break
        feed = {data: batch_data, target: batch_target}
        error, _ = sess.run([model.error, model.optimize], feed)
        print('{}: {:3.6f}%'.format(index + 1, 100 * error))

    # Evaluate on the held-out test data (without running the optimizer).
    test_feed = {data: test_data, target: test_target}
    test_error = sess.run(model.error, test_feed)
    print('Test error: {:3.6f}%'.format(100 * test_error))


    # BidirectionalSequenceLabellingModel.py -- forward and backward RNNs with a shared softmax layer.
    import tensorflow as tf
    from helpers import lazy_property


    class BidirectionalSequenceLabellingModel:

        def __init__(self, data, target, params):
            self.data = data
            self.target = target
            self.params = params
            self.prediction
            self.cost
            self.error
            self.optimize

        @lazy_property
        def length(self):
            used = tf.sign(tf.reduce_max(tf.abs(self.data), reduction_indices=2))
            length = tf.reduce_sum(used, reduction_indices=1)
            length = tf.cast(length, tf.int32)
            return length

        @lazy_property
        def prediction(self):
            output = self._bidirectional_rnn(self.data, self.length)
            num_classes = int(self.target.get_shape()[2])
            prediction = self._shared_softmax(output, num_classes)
            return prediction

        def _bidirectional_rnn(self, data, length):
            length_64 = tf.cast(length, tf.int64)
            forward, _ = tf.nn.dynamic_rnn(
                cell=self.params.rnn_cell(self.params.rnn_hidden),
                inputs=data,
                dtype=tf.float32,
                sequence_length=length,
                scope='rnn-forward')
            backward, _ = tf.nn.dynamic_rnn(
                cell=self.params.rnn_cell(self.params.rnn_hidden),
                inputs=tf.reverse_sequence(data, length_64, seq_dim=1),
                dtype=tf.float32,
                sequence_length=self.length,
                scope='rnn-backward')
            # Reverse the backward outputs again so they align with the forward outputs.
            backward = tf.reverse_sequence(backward, length_64, seq_dim=1)
            output = tf.concat(2, [forward, backward])
            return output

        def _shared_softmax(self, data, out_size):
            max_length = int(data.get_shape()[1])
            in_size = int(data.get_shape()[2])
            weight = tf.Variable(tf.truncated_normal(
                [in_size, out_size], stddev=0.01))
            bias = tf.Variable(tf.constant(0.1, shape=[out_size]))
            # Flatten to apply same weights to all time steps.
            flat = tf.reshape(data, [-1, in_size])
            output = tf.nn.softmax(tf.matmul(flat, weight) + bias)
            output = tf.reshape(output, [-1, max_length, out_size])
            return output

        @lazy_property
        def cost(self):
            # Compute cross entropy for each frame.
            cross_entropy = self.target * tf.log(self.prediction)
            cross_entropy = -tf.reduce_sum(cross_entropy, reduction_indices=2)
            mask = tf.sign(tf.reduce_max(tf.abs(self.target), reduction_indices=2))
            cross_entropy *= mask
            # Average over actual sequence lengths.
            cross_entropy = tf.reduce_sum(cross_entropy, reduction_indices=1)
            cross_entropy /= tf.cast(self.length, tf.float32)
            return tf.reduce_mean(cross_entropy)

        @lazy_property
        def error(self):
            mistakes = tf.not_equal(
                tf.argmax(self.target, 2), tf.argmax(self.prediction, 2))
            mistakes = tf.cast(mistakes, tf.float32)
            mask = tf.sign(tf.reduce_max(tf.abs(self.target), reduction_indices=2))
            mistakes *= mask
            # Average over actual sequence lengths.
            mistakes = tf.reduce_sum(mistakes, reduction_indices=1)
            mistakes /= tf.cast(self.length, tf.float32)
            return tf.reduce_mean(mistakes)

        @lazy_property
        def optimize(self):
            gradient = self.params.optimizer.compute_gradients(self.cost)
            try:
                limit = self.params.gradient_clipping
                gradient = [
                    (tf.clip_by_value(g, -limit, limit), v)
                    if g is not None else (None, v)
                    for g, v in gradient]
            except AttributeError:
                print('No gradient clipping parameter specified.')
            optimize = self.params.optimizer.apply_gradients(gradient)
            return optimize

 

References:
TensorFlow for Machine Intelligence

Welcome to join me: qingxingfengzi
My public account: qingxingfengzigz
My wife Zhang Xingqing's Public Account: qingqingfeifangz
