Sequence labelling: the network predicts a class for every frame of the input sequence rather than a single class for the whole sequence. The example task here is OCR (Optical Character Recognition).
The OCR dataset (http://ai.stanford.edu/~btaskar/ocr/) was collected by Rob Kassel at the MIT Spoken Language Systems Group and preprocessed by Ben Taskar at the Stanford AI Lab. It contains a large number of individual handwritten lowercase letters; each sample is a 16x8 binary pixel image. The letters are combined into sequences, and each sequence corresponds to a word, giving about 6,800 words of at most 14 letters. The data ships as a gzip-compressed, tab-delimited text file that Python's csv module can read directly. Each line of the file describes one letter and carries attributes such as an ID number, the label, the pixel values, and the ID number of the next letter in the word.
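As a rough sketch of this layout, a single record can be pulled out of the archive with gzip and the csv module like so (the local file name and the field indices are assumed to match the description above):

    import gzip
    import csv

    # Read one record from the tab-separated letter.data.gz file.
    # Assumes the archive has already been downloaded to the working directory.
    with gzip.open('letter.data.gz', 'rt') as file_:
        reader = csv.reader(file_, delimiter='\t')
        line = next(reader)
    letter_id = int(line[0])                 # unique ID of this letter
    letter = line[1]                         # lowercase character label
    next_id = int(line[2])                   # ID of the next letter, -1 at word end
    pixels = [int(x) for x in line[6:134]]   # 16x8 = 128 binary pixel values
    print(letter_id, letter, next_id, len(pixels))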
The next-letter ID values let us read the letters of each word in the correct order: we keep collecting letters until the field holding the next ID is not set, and then start a new sequence. After reading the target letters and the pixel data, the sequences are padded with all-zero images so that they fit into two large NumPy arrays holding all target letters and all pixel data.
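A minimal sketch of the zero-padding step, with two made-up words standing in for real letter sequences:

    import numpy as np

    # Two toy "words" of different lengths; each letter is a 16x8 image.
    words = [
        [np.ones((16, 8)), np.ones((16, 8))],   # a 2-letter word
        [np.ones((16, 8))] * 5,                 # a 5-letter word
    ]
    max_length = max(len(word) for word in words)
    padding = np.zeros((16, 8))
    padded = [word + [padding] * (max_length - len(word)) for word in words]
    data = np.array(padded)   # shape: (num_words, max_length, 16, 8)
    print(data.shape)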
The softmax layer is shared across time steps. The data and target arrays contain sequences, one image frame per target letter. Extending the RNN, a softmax classifier is attached to every letter output, so the classifier scores predictions for each frame of data rather than for the whole sequence; the sequence length is computed as before. There are two ways to add a softmax layer to all frames: add several different classifiers, one per frame, or let all frames share the same classifier. With a shared classifier the weights are adjusted more often during training, once for every letter of every training word. The weight matrix of a fully connected layer expects an input of shape batch_size x in_size, but now there are two batch-like input dimensions, batch_size and sequence_steps. We therefore flatten the input (the RNN's output activations) to shape (batch_size * sequence_steps) x in_size, so the weight matrix simply sees a larger batch, and afterwards unflatten the result back into sequence form.
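A minimal sketch of the flatten/unflatten trick, using the older TensorFlow 1.x API that the listings below rely on; the sizes are illustrative only:

    import tensorflow as tf

    max_length, in_size, num_classes = 14, 300, 26
    rnn_output = tf.placeholder(tf.float32, [None, max_length, in_size])

    weight = tf.Variable(tf.truncated_normal([in_size, num_classes], stddev=0.01))
    bias = tf.Variable(tf.constant(0.1, shape=[num_classes]))

    # Flatten (batch, steps, in_size) to (batch * steps, in_size), apply the
    # shared weights once, then reshape back to (batch, steps, num_classes).
    flat = tf.reshape(rnn_output, [-1, in_size])
    prediction = tf.nn.softmax(tf.matmul(flat, weight) + bias)
    prediction = tf.reshape(prediction, [-1, max_length, num_classes])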
In the cost function, every frame of a sequence has a prediction-target pair, and we average over the corresponding dimension. However, tf.reduce_mean is not usable here because it would normalize by the tensor length, which is the maximum sequence length. Instead we have to normalize by the actual sequence length manually, calling tf.reduce_sum and dividing by the length.
In the error function, tf.argmax operates on axis 2 rather than axis 1, each padded frame is masked out, and the mean is taken over each sequence's actual length. Finally, tf.reduce_mean averages over all words in the batch.
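A minimal sketch of both the length-normalized cost and the masked error, again in the older TF 1.x API used below; prediction, target, and length stand in for the tensors the model classes define:

    import tensorflow as tf

    max_length, num_classes = 14, 26
    prediction = tf.placeholder(tf.float32, [None, max_length, num_classes])
    target = tf.placeholder(tf.float32, [None, max_length, num_classes])
    length = tf.placeholder(tf.float32, [None])   # actual frames per sequence

    # Padded frames have all-zero targets, so this mask is 0 there and 1 elsewhere.
    mask = tf.sign(tf.reduce_max(tf.abs(target), reduction_indices=2))

    # Per-frame cross entropy, masked and averaged over the true length.
    cross_entropy = -tf.reduce_sum(target * tf.log(prediction), reduction_indices=2)
    cross_entropy = tf.reduce_sum(cross_entropy * mask, reduction_indices=1) / length
    cost = tf.reduce_mean(cross_entropy)          # mean over the batch

    # Per-frame mistakes: argmax over axis 2 (the class dimension), masked likewise.
    mistakes = tf.cast(tf.not_equal(
        tf.argmax(target, 2), tf.argmax(prediction, 2)), tf.float32)
    mistakes = tf.reduce_sum(mistakes * mask, reduction_indices=1) / length
    error = tf.reduce_mean(mistakes)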
Since TensorFlow computes derivatives automatically, the optimization operation from the sequence classification model can be reused; only the new cost function has to be substituted. The gradients of all RNNs are clipped to prevent training from diverging, without negative side effects.
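Gradient clipping can be sketched like this; the variable, the dummy cost, and the clipping limit here are placeholders for the real model's parameters, loss, and gradient_clipping setting:

    import tensorflow as tf

    weights = tf.Variable(tf.zeros([10]))
    cost = tf.reduce_sum(tf.square(weights - 1.0))   # dummy loss

    optimizer = tf.train.RMSPropOptimizer(0.002)
    limit = 5.0
    gradients = optimizer.compute_gradients(cost)
    # Clip every gradient element-wise; leave variables without gradients alone.
    gradients = [(tf.clip_by_value(g, -limit, limit), v) if g is not None
                 else (None, v) for g, v in gradients]
    optimize = optimizer.apply_gradients(gradients)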
To train the model, get_dataset downloads the handwriting images and preprocesses them, encoding the lowercase letters as one-hot vectors. It then randomly shuffles the examples and splits them into a training set and a test set.
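A minimal sketch of the one-hot encoding, shuffling, and splitting, with a tiny made-up target array in place of the real dataset:

    import numpy as np

    # Character targets as stored in the dataset; '' marks padded frames.
    targets = np.array([['c', 'a', 't', ''], ['d', 'o', 'g', 's']])
    one_hot = np.zeros(targets.shape + (26,))
    for index, letter in np.ndenumerate(targets):
        if letter:
            one_hot[index][ord(letter) - ord('a')] = 1

    # Shuffle example order, then split into training and test portions.
    order = np.random.permutation(len(one_hot))
    one_hot = one_hot[order]
    split = int(0.66 * len(one_hot))
    train_target, test_target = one_hot[:split], one_hot[split:]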
There are dependencies (or mutual information) between adjacent letters of a word, and the RNN stores all the input information about the same word in its hidden activations. When classifying the first few letters, however, the network has not yet seen much input from which to infer additional information. A bidirectional RNN overcomes this shortcoming.
Two RNNs observe the input sequence: one reads the word from the left in the usual order, and the other reads it from the right in reverse order. Each time step thus yields two output activations, which are concatenated before being fed into the shared softmax layer. In this way the classifier gets the complete word information for every letter. TensorFlow already provides an implementation, tf.model.rnn.bidirectional_rnn.
Here we implement the bidirectional RNN ourselves. The prediction property is split into two functions so that each one focuses on less. The _shared_softmax function takes the input tensor as a parameter and infers the input dimension from it; reusing the approach from the other architecture functions, the same flattening trick shares one softmax layer across all time steps. tf.nn.dynamic_rnn is then used to create the two RNNs.
Reversing the sequences is easier than implementing a new RNN operation that runs backwards in time. The tf.reverse_sequence function reverses the first sequence_lengths frames of each sequence in the frame data. Nodes in the graph have names; the scope parameter is the name of the variable scope used by rnn_dynamic_cell, and it defaults to RNN. Because the two RNNs have different parameters, they need different scopes.
The reversed sequence is fed into the backward RNN, and the network's output is then reversed again so that it lines up with the forward output. The two tensors are concatenated along the dimension of the RNN neurons' outputs and returned. The bidirectional RNN model performs noticeably better.
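The whole bidirectional pass can be sketched as follows, in the older TF 1.x API (including the old tf.concat argument order and the seq_dim keyword) that the listing below uses; the sizes are illustrative:

    import tensorflow as tf

    max_length, in_size, hidden = 14, 128, 300
    data = tf.placeholder(tf.float32, [None, max_length, in_size])
    length = tf.placeholder(tf.int32, [None])
    length_64 = tf.cast(length, tf.int64)

    # Forward RNN reads the sequence as-is.
    forward, _ = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.GRUCell(hidden), data, dtype=tf.float32,
        sequence_length=length, scope='rnn-forward')
    # Backward RNN reads the reversed sequence under a different variable scope.
    backward, _ = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.GRUCell(hidden),
        tf.reverse_sequence(data, length_64, seq_dim=1), dtype=tf.float32,
        sequence_length=length, scope='rnn-backward')
    # Reverse the backward outputs again so they align with the forward ones.
    backward = tf.reverse_sequence(backward, length_64, seq_dim=1)
    output = tf.concat(2, [forward, backward])   # (batch, steps, 2 * hidden)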
# ocrdataset.py
import gzip
import csv

import numpy as np

from helpers import download


class OcrDataset:

    URL = 'http://ai.stanford.edu/~btaskar/ocr/letter.data.gz'

    def __init__(self, cache_dir):
        path = download(type(self).URL, cache_dir)
        lines = self._read(path)
        data, target = self._parse(lines)
        self.data, self.target = self._pad(data, target)

    @staticmethod
    def _read(filepath):
        with gzip.open(filepath, 'rt') as file_:
            reader = csv.reader(file_, delimiter='\t')
            lines = list(reader)
            return lines

    @staticmethod
    def _parse(lines):
        lines = sorted(lines, key=lambda x: int(x[0]))
        data, target = [], []
        next_ = None
        for line in lines:
            if not next_:
                data.append([])
                target.append([])
            else:
                assert next_ == int(line[0])
            next_ = int(line[2]) if int(line[2]) > -1 else None
            pixels = np.array([int(x) for x in line[6:134]])
            pixels = pixels.reshape((16, 8))
            data[-1].append(pixels)
            target[-1].append(line[1])
        return data, target

    @staticmethod
    def _pad(data, target):
        max_length = max(len(x) for x in target)
        padding = np.zeros((16, 8))
        data = [x + ([padding] * (max_length - len(x))) for x in data]
        target = [x + ([''] * (max_length - len(x))) for x in target]
        return np.array(data), np.array(target)


# sequencelabellingmodel.py
import tensorflow as tf

from helpers import lazy_property


class SequenceLabellingModel:

    def __init__(self, data, target, params):
        self.data = data
        self.target = target
        self.params = params
        self.prediction
        self.cost
        self.error
        self.optimize

    @lazy_property
    def length(self):
        used = tf.sign(tf.reduce_max(tf.abs(self.data), reduction_indices=2))
        length = tf.reduce_sum(used, reduction_indices=1)
        length = tf.cast(length, tf.int32)
        return length

    @lazy_property
    def prediction(self):
        output, _ = tf.nn.dynamic_rnn(
            tf.nn.rnn_cell.GRUCell(self.params.rnn_hidden),
            self.data,
            dtype=tf.float32,
            sequence_length=self.length,
        )
        # Softmax layer.
        max_length = int(self.target.get_shape()[1])
        num_classes = int(self.target.get_shape()[2])
        weight = tf.Variable(tf.truncated_normal(
            [self.params.rnn_hidden, num_classes], stddev=0.01))
        bias = tf.Variable(tf.constant(0.1, shape=[num_classes]))
        # Flatten to apply same weights to all time steps.
        output = tf.reshape(output, [-1, self.params.rnn_hidden])
        prediction = tf.nn.softmax(tf.matmul(output, weight) + bias)
        prediction = tf.reshape(prediction, [-1, max_length, num_classes])
        return prediction

    @lazy_property
    def cost(self):
        # Compute cross entropy for each frame.
        cross_entropy = self.target * tf.log(self.prediction)
        cross_entropy = -tf.reduce_sum(cross_entropy, reduction_indices=2)
        mask = tf.sign(tf.reduce_max(tf.abs(self.target), reduction_indices=2))
        cross_entropy *= mask
        # Average over actual sequence lengths.
        cross_entropy = tf.reduce_sum(cross_entropy, reduction_indices=1)
        cross_entropy /= tf.cast(self.length, tf.float32)
        return tf.reduce_mean(cross_entropy)

    @lazy_property
    def error(self):
        mistakes = tf.not_equal(
            tf.argmax(self.target, 2), tf.argmax(self.prediction, 2))
        mistakes = tf.cast(mistakes, tf.float32)
        mask = tf.sign(tf.reduce_max(tf.abs(self.target), reduction_indices=2))
        mistakes *= mask
        # Average over actual sequence lengths.
        mistakes = tf.reduce_sum(mistakes, reduction_indices=1)
        mistakes /= tf.cast(self.length, tf.float32)
        return tf.reduce_mean(mistakes)

    @lazy_property
    def optimize(self):
        gradient = self.params.optimizer.compute_gradients(self.cost)
        try:
            limit = self.params.gradient_clipping
            gradient = [
                (tf.clip_by_value(g, -limit, limit), v) if g is not None
                else (None, v) for g, v in gradient]
        except AttributeError:
            print('No gradient clipping parameter specified.')
        optimize = self.params.optimizer.apply_gradients(gradient)
        return optimize


# Training script.
import random

import tensorflow as tf
import numpy as np

from helpers import AttrDict
from ocrdataset import OcrDataset
from sequencelabellingmodel import SequenceLabellingModel
from batched import batched


params = AttrDict(
    rnn_cell=tf.nn.rnn_cell.GRUCell,
    rnn_hidden=300,
    optimizer=tf.train.RMSPropOptimizer(0.002),
    gradient_clipping=5,
    batch_size=10,
    epochs=5,
    epoch_size=50)


def get_dataset():
    dataset = OcrDataset('./ocr')
    # Flatten images into vectors.
    dataset.data = dataset.data.reshape(dataset.data.shape[:2] + (-1,))
    # One-hot encode targets.
    target = np.zeros(dataset.target.shape + (26,))
    for index, letter in np.ndenumerate(dataset.target):
        if letter:
            target[index][ord(letter) - ord('a')] = 1
    dataset.target = target
    # Shuffle order of examples.
    order = np.random.permutation(len(dataset.data))
    dataset.data = dataset.data[order]
    dataset.target = dataset.target[order]
    return dataset


# Split into training and test data.
dataset = get_dataset()
split = int(0.66 * len(dataset.data))
train_data, test_data = dataset.data[:split], dataset.data[split:]
train_target, test_target = dataset.target[:split], dataset.target[split:]

# Compute graph.
_, length, image_size = train_data.shape
num_classes = train_target.shape[2]
data = tf.placeholder(tf.float32, [None, length, image_size])
target = tf.placeholder(tf.float32, [None, length, num_classes])
model = SequenceLabellingModel(data, target, params)
batches = batched(train_data, train_target, params.batch_size)

sess = tf.Session()
sess.run(tf.initialize_all_variables())
for index, batch in enumerate(batches):
    batch_data = batch[0]
    batch_target = batch[1]
    epoch = batch[2]
    if epoch >= params.epochs:
        break
    feed = {data: batch_data, target: batch_target}
    error, _ = sess.run([model.error, model.optimize], feed)
    print('{}: {:3.6f}%'.format(index + 1, 100 * error))

test_feed = {data: test_data, target: test_target}
test_error, _ = sess.run([model.error, model.optimize], test_feed)
print('Test error: {:3.6f}%'.format(100 * test_error))


# bidirectionalsequencelabellingmodel.py
import tensorflow as tf

from helpers import lazy_property


class BidirectionalSequenceLabellingModel:

    def __init__(self, data, target, params):
        self.data = data
        self.target = target
        self.params = params
        self.prediction
        self.cost
        self.error
        self.optimize

    @lazy_property
    def length(self):
        used = tf.sign(tf.reduce_max(tf.abs(self.data), reduction_indices=2))
        length = tf.reduce_sum(used, reduction_indices=1)
        length = tf.cast(length, tf.int32)
        return length

    @lazy_property
    def prediction(self):
        output = self._bidirectional_rnn(self.data, self.length)
        num_classes = int(self.target.get_shape()[2])
        prediction = self._shared_softmax(output, num_classes)
        return prediction

    def _bidirectional_rnn(self, data, length):
        length_64 = tf.cast(length, tf.int64)
        forward, _ = tf.nn.dynamic_rnn(
            cell=self.params.rnn_cell(self.params.rnn_hidden),
            inputs=data,
            dtype=tf.float32,
            sequence_length=length,
            scope='rnn-forward')
        backward, _ = tf.nn.dynamic_rnn(
            cell=self.params.rnn_cell(self.params.rnn_hidden),
            inputs=tf.reverse_sequence(data, length_64, seq_dim=1),
            dtype=tf.float32,
            sequence_length=self.length,
            scope='rnn-backward')
        backward = tf.reverse_sequence(backward, length_64, seq_dim=1)
        output = tf.concat(2, [forward, backward])
        return output

    def _shared_softmax(self, data, out_size):
        max_length = int(data.get_shape()[1])
        in_size = int(data.get_shape()[2])
        weight = tf.Variable(tf.truncated_normal(
            [in_size, out_size], stddev=0.01))
        bias = tf.Variable(tf.constant(0.1, shape=[out_size]))
        # Flatten to apply same weights to all time steps.
        flat = tf.reshape(data, [-1, in_size])
        output = tf.nn.softmax(tf.matmul(flat, weight) + bias)
        output = tf.reshape(output, [-1, max_length, out_size])
        return output

    @lazy_property
    def cost(self):
        # Compute cross entropy for each frame.
        cross_entropy = self.target * tf.log(self.prediction)
        cross_entropy = -tf.reduce_sum(cross_entropy, reduction_indices=2)
        mask = tf.sign(tf.reduce_max(tf.abs(self.target), reduction_indices=2))
        cross_entropy *= mask
        # Average over actual sequence lengths.
        cross_entropy = tf.reduce_sum(cross_entropy, reduction_indices=1)
        cross_entropy /= tf.cast(self.length, tf.float32)
        return tf.reduce_mean(cross_entropy)

    @lazy_property
    def error(self):
        mistakes = tf.not_equal(
            tf.argmax(self.target, 2), tf.argmax(self.prediction, 2))
        mistakes = tf.cast(mistakes, tf.float32)
        mask = tf.sign(tf.reduce_max(tf.abs(self.target), reduction_indices=2))
        mistakes *= mask
        # Average over actual sequence lengths.
        mistakes = tf.reduce_sum(mistakes, reduction_indices=1)
        mistakes /= tf.cast(self.length, tf.float32)
        return tf.reduce_mean(mistakes)

    @lazy_property
    def optimize(self):
        gradient = self.params.optimizer.compute_gradients(self.cost)
        try:
            limit = self.params.gradient_clipping
            gradient = [
                (tf.clip_by_value(g, -limit, limit), v) if g is not None
                else (None, v) for g, v in gradient]
        except AttributeError:
            print('No gradient clipping parameter specified.')
        optimize = self.params.optimizer.apply_gradients(gradient)
        return optimize
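To train the bidirectional variant, the earlier training script can be reused as is; only the model construction changes. A minimal sketch (the module name is assumed to match whatever file the class above is saved in):

    from bidirectionalsequencelabellingmodel import BidirectionalSequenceLabellingModel

    model = BidirectionalSequenceLabellingModel(data, target, params)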
Resources:
"TensorFlow Practice for Machine Intelligence"
Feel free to add me on WeChat to discuss: Qingxingfengzi
My WeChat public account: Qingxingfengzigz
My wife Zhang Yuqing's public account: Qingqingfeifangz