TensorFlow Implements an RNN (Recurrent Neural Network)



RNN (Recurrent Neural Network)

It is mainly used in natural language processing (NLP).

An RNN is mainly used to process and predict sequence data.

RNNs are widely used in speech recognition, language modeling, and machine translation.

The RNN originated from the need to characterize the relationship between the current output of a sequence and earlier information: previous information influences the output of subsequent nodes.

An RNN is a network that contains loops, allowing information to persist.

An RNN remembers previous information and uses it to influence the output of subsequent nodes.

The nodes in the hidden layer of an RNN are connected to each other. The input to the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step.

At each time step, the RNN combines the current input with the current model state to produce an output.

An RNN can be regarded as the result of replicating the same neural network structure an unlimited number of times. In practice, a truly infinite loop cannot be implemented, so the network is generally unrolled over the recurrent cell for a fixed number of time steps.

RNN structure diagram:


The problems RNNs solve best are related to time series.

An RNN feeds the data at each time point of a sequence into its input layer in order; the output can be a prediction for the next time point in the sequence, or the processing result of the information at the current time point.

One of the key points of RNNs is that they can connect previous information to the current task.

The unrolled RNN:


The parameters of the recurrent cell are shared across time steps.

The state of an RNN is represented by a vector; the dimension of this vector is also called the size of the RNN's hidden layer.

Suppose this vector is h, the input is x, and the activation function is tanh.


Calculation process of Forward Propagation:
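The original formula figure is not reproduced here. As a sketch, a standard formulation that matches the numpy code below (with the weight matrices and biases named after the variables in that code) is:

h_t = \tanh(h_{t-1} W_{cell\_state} + x_t W_{cell\_input} + b_{cell})
o_t = h_t W_{output} + b_{output}

where h_t is the hidden state at time t, x_t is the input at time t, and o_t is the output at time t.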


In theory, an RNN supports sequences of arbitrary length. However, if a sequence is too long, gradients vanish during optimization, so in practice a maximum length is set and sequences longer than it are truncated.

 Code implementation:

import numpy as np

# Define the RNN parameters.
X = [1, 2]                       # example input sequence
state = [0.0, 0.0]               # initial state; must match the 2x2 state weight matrix
w_cell_state = np.asarray([[0.1, 0.2], [0.3, 0.4]])
w_cell_input = np.asarray([0.5, 0.6])
b_cell = np.asarray([0.1, -0.1])
w_output = np.asarray([[1.0], [2.0]])
b_output = 0.1

# Execute the forward propagation process.
for i in range(len(X)):
    before_activation = np.dot(state, w_cell_state) + X[i] * w_cell_input + b_cell
    state = np.tanh(before_activation)
    final_output = np.dot(state, w_output) + b_output
    print("before activation:", before_activation)
    print("state:", state)
    print("output:", final_output)

LSTM (Long Short-Term Memory) network:

LSTM solves the problem that plain RNNs cannot capture long-term dependencies, which greatly extends how long information can be remembered.

The key to successful RNN applications is the LSTM.

An LSTM is a special network structure with three "gate" structures.


In the LSTM diagram, pink circles represent pointwise operations, such as vector addition, and yellow boxes are learned neural network layers. Merging lines denote vector concatenation, and forking lines denote content being copied and distributed to different locations.

The core idea of LSTM:

The key to LSTM is the cell state, the horizontal line running across the top of the diagram.

The cell state is similar to a conveyor belt: it runs straight down the entire chain with only a few minor linear interactions, so it is easy for information to flow along it unchanged.


LSTM has the ability to remove or add information to the cell state through carefully designed structures called "gates".

"Door" is a method that allows information to be selected, ContainsSigmoidNeural network layer andPointwise(Perform multiplication by bit.

This is called a "Door" becauseWhen Sigmoid is used as the layer for activating the function, a value between 0 and 1 is output.To describe how much information each part can use this structure.

A value of 0 means "let nothing through", and a value of 1 means "let everything through".
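As a minimal numpy sketch of this gating idea (the numbers and names here are illustrative only, not part of the LSTM definition):

import numpy as np

def sigmoid(z):
    # Outputs values in (0, 1): near 0 blocks information, near 1 lets it all through.
    return 1.0 / (1.0 + np.exp(-z))

candidate = np.array([0.8, -0.5, 0.3])       # information that could pass through the gate
gate = sigmoid(np.array([4.0, 0.0, -4.0]))   # roughly [0.98, 0.50, 0.02]
filtered = gate * candidate                  # pointwise multiplication: keep, halve, block
print(filtered)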

LSTM formula:
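The formula image is not reproduced here. The standard LSTM equations, consistent with the three-gate description above (\sigma is the sigmoid function, \odot is pointwise multiplication, and [h_{t-1}, x_t] is vector concatenation), are:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)             (forget gate)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)             (input gate)
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)      (candidate cell state)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    (new cell state)
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)             (output gate)
h_t = o_t \odot \tanh(c_t)                         (new hidden state)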


Code implementation:

import tensorflow as tf

# Define an LSTM structure.
lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)

# Initialize the LSTM state to an all-zero array; one batch of training samples is used each time.
state = lstm.zero_state(batch_size, tf.float32)

# Define the loss function.
loss = 0.0

# num_steps specifies the maximum sequence length.
for i in range(num_steps):
    # Reuse the previously defined variables after the first time step.
    if i > 0:
        tf.get_variable_scope().reuse_variables()
    # Pass the current input and the previous state into the LSTM structure
    # to obtain the output and the updated state.
    lstm_output, state = lstm(current_input, state)
    # Feed the LSTM output at the current time step into a fully connected layer
    # to get the final output.
    final_output = fully_connected(lstm_output)
    # Calculate the loss at the current time step.
    loss += calc_loss(final_output, expected_output)

Bidirectional Recurrent Neural Network

In a classic recurrent neural network, the state is transmitted in one direction, from front to back. In some problems, however, the output at the current time step is related not only to the previous states but also to the later ones.

A bidirectional recurrent neural network solves the problem of one-way state transmission.

A bidirectional recurrent neural network is composed of two recurrent neural networks stacked on top of each other, and the states of the two networks jointly determine the output.

That is, the output at time t depends not only on memories of the past but also on what comes later in the sequence.
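As a sketch, a bidirectional LSTM can be built in TensorFlow 1.x with tf.nn.bidirectional_dynamic_rnn; the sizes and placeholder shape below are assumptions chosen only for illustration:

import tensorflow as tf

hidden_size = 128                                       # assumed hidden layer size
inputs = tf.placeholder(tf.float32, [None, 35, 200])    # [batch, time, features], assumed shape

fw_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)     # reads the sequence front to back
bw_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)     # reads the sequence back to front

(output_fw, output_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, inputs, dtype=tf.float32)

# The two directions jointly determine the final output, e.g. by concatenation.
outputs = tf.concat([output_fw, output_bw], axis=-1)    # [batch, time, 2 * hidden_size]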


Deep (bidirectional) Recurrent Neural Network

A deep recurrent neural network is similar to a bidirectional one, except that it has multiple layers at each time step.

Deep recurrent neural networks have stronger learning ability.

In a deep recurrent neural network, the recurrent cell structure is replicated multiple times at each time step. As in convolutional neural networks, the parameters of the recurrent cell within a layer are shared across time steps, while the parameters of different layers can differ.


In TensorFlow, MultiRNNCell implements forward propagation at each time step of a deep recurrent neural network; the remaining steps are the same as building an ordinary RNN.
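A minimal sketch of building the recurrent cell of a deep RNN with MultiRNNCell in TensorFlow 1.x; the sizes below are assumptions for illustration:

import tensorflow as tf

hidden_size = 200    # assumed hidden layer size
num_layers = 2       # assumed number of stacked layers
batch_size = 20      # assumed batch size

# Stack several LSTM cells. A new cell object is created for each layer so that
# different layers do not share parameters.
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(hidden_size) for _ in range(num_layers)])

# The stacked cell is then used exactly like a single cell.
state = stacked_cell.zero_state(batch_size, tf.float32)
inputs = tf.placeholder(tf.float32, [batch_size, hidden_size])  # input at one time step
output, state = stacked_cell(inputs, state)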

Dropout in RNN

Just as dropout makes a convolutional neural network more robust, using dropout in a recurrent neural network achieves a similar effect.

Just as a convolutional neural network uses dropout only in its final fully connected layers, a recurrent neural network generally uses dropout only between the recurrent cells of different layers, not along the time dimension within the same layer.

That is, at the same time step t, dropout is applied between the stacked recurrent cells of different layers, while the state passed from time t-1 to time t within a layer is not dropped out.

In TensorFlow, the DropoutWrapper class is used to implement dropout for recurrent cells.

Use the input_keep_prob parameter to control the input dropout probability.

Use the output_keep_prob parameter to control the dropout probability of the output.
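A minimal sketch of wrapping LSTM cells with DropoutWrapper in TensorFlow 1.x; the keep probabilities and sizes are assumptions for illustration:

import tensorflow as tf

# input_keep_prob controls dropout applied to the cell input,
# output_keep_prob controls dropout applied to the cell output.
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(200)            # assumed hidden size
dropout_cell = tf.nn.rnn_cell.DropoutWrapper(
    lstm_cell, input_keep_prob=1.0, output_keep_prob=0.5)

# When stacking layers, wrapping each layer's cell applies dropout between layers
# rather than along the time dimension within a layer.
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.DropoutWrapper(
        tf.nn.rnn_cell.BasicLSTMCell(200), output_keep_prob=0.5)
     for _ in range(2)])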

A TensorFlow example implementing an RNN language model

Code:

import numpy as np
import tensorflow as tf
import reader

DATA_PATH = "../datasets/PTB/data"
HIDDEN_SIZE = 200          # size of the hidden layer
NUM_LAYERS = 2             # number of LSTM layers in the deep RNN
VOCAB_SIZE = 10000         # vocabulary size
LEARNING_RATE = 1.0        # learning rate
TRAIN_BATCH_SIZE = 20      # training batch size
TRAIN_NUM_STEP = 35        # training data truncation length

EVAL_BATCH_SIZE = 1        # evaluation batch size
EVAL_NUM_STEP = 1          # evaluation data truncation length
NUM_EPOCH = 2              # number of epochs over the training data
KEEP_PROB = 0.5            # probability that a node is not dropped out
MAX_GRAD_NORM = 5          # parameter used to control gradient explosion


# Define a class to describe the model structure.
class PTBModel(object):
    def __init__(self, is_training, batch_size, num_steps):
        self.batch_size = batch_size
        self.num_steps = num_steps

        # Define the input layer.
        self.input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
        self.targets = tf.placeholder(tf.int32, [batch_size, num_steps])

        # Define the LSTM structure; use dropout during training.
        lstm_cell = tf.contrib.rnn.BasicLSTMCell(HIDDEN_SIZE)
        if is_training:
            lstm_cell = tf.contrib.rnn.DropoutWrapper(
                lstm_cell, output_keep_prob=KEEP_PROB)
        cell = tf.contrib.rnn.MultiRNNCell([lstm_cell] * NUM_LAYERS)

        # Initialize the initial state.
        self.initial_state = cell.zero_state(batch_size, tf.float32)
        embedding = tf.get_variable("embedding", [VOCAB_SIZE, HIDDEN_SIZE])

        # Convert the original word IDs into word vectors.
        inputs = tf.nn.embedding_lookup(embedding, self.input_data)
        if is_training:
            inputs = tf.nn.dropout(inputs, KEEP_PROB)

        # Define the output list.
        outputs = []
        state = self.initial_state
        with tf.variable_scope("RNN"):
            for time_step in range(num_steps):
                # Reuse the variables after the first time step.
                if time_step > 0:
                    tf.get_variable_scope().reuse_variables()
                cell_output, state = cell(inputs[:, time_step, :], state)
                outputs.append(cell_output)
        output = tf.reshape(tf.concat(outputs, 1), [-1, HIDDEN_SIZE])

        weight = tf.get_variable("weight", [HIDDEN_SIZE, VOCAB_SIZE])
        bias = tf.get_variable("bias", [VOCAB_SIZE])
        logits = tf.matmul(output, weight) + bias

        # Define the cross-entropy loss function and the average loss.
        loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
            [logits],
            [tf.reshape(self.targets, [-1])],
            [tf.ones([batch_size * num_steps], dtype=tf.float32)])
        self.cost = tf.reduce_sum(loss) / batch_size
        self.final_state = state

        # Back-propagation operations are defined only during training.
        if not is_training:
            return
        trainable_variables = tf.trainable_variables()

        # Clip the gradients and define the optimization method and training step.
        grads, _ = tf.clip_by_global_norm(
            tf.gradients(self.cost, trainable_variables), MAX_GRAD_NORM)
        optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE)
        self.train_op = optimizer.apply_gradients(zip(grads, trainable_variables))


# Run train_op on the given data with the given model and return the perplexity.
def run_epoch(session, model, data, train_op, output_log, epoch_size):
    total_costs = 0.0
    iters = 0
    state = session.run(model.initial_state)
    # Train one epoch.
    for step in range(epoch_size):
        x, y = session.run(data)
        cost, state, _ = session.run(
            [model.cost, model.final_state, train_op],
            {model.input_data: x, model.targets: y, model.initial_state: state})
        total_costs += cost
        iters += model.num_steps

        if output_log and step % 100 == 0:
            print("After %d steps, perplexity is %.3f" %
                  (step, np.exp(total_costs / iters)))
    return np.exp(total_costs / iters)


# Define the main function.
def main():
    train_data, valid_data, test_data, _ = reader.ptb_raw_data(DATA_PATH)

    # Calculate how many iterations one epoch needs.
    train_data_len = len(train_data)
    train_batch_len = train_data_len // TRAIN_BATCH_SIZE
    train_epoch_size = (train_batch_len - 1) // TRAIN_NUM_STEP

    valid_data_len = len(valid_data)
    valid_batch_len = valid_data_len // EVAL_BATCH_SIZE
    valid_epoch_size = (valid_batch_len - 1) // EVAL_NUM_STEP

    test_data_len = len(test_data)
    test_batch_len = test_data_len // EVAL_BATCH_SIZE
    test_epoch_size = (test_batch_len - 1) // EVAL_NUM_STEP

    initializer = tf.random_uniform_initializer(-0.05, 0.05)
    with tf.variable_scope("language_model", reuse=None, initializer=initializer):
        train_model = PTBModel(True, TRAIN_BATCH_SIZE, TRAIN_NUM_STEP)

    with tf.variable_scope("language_model", reuse=True, initializer=initializer):
        eval_model = PTBModel(False, EVAL_BATCH_SIZE, EVAL_NUM_STEP)

    # Train the model.
    with tf.Session() as session:
        tf.global_variables_initializer().run()

        train_queue = reader.ptb_producer(
            train_data, train_model.batch_size, train_model.num_steps)
        eval_queue = reader.ptb_producer(
            valid_data, eval_model.batch_size, eval_model.num_steps)
        test_queue = reader.ptb_producer(
            test_data, eval_model.batch_size, eval_model.num_steps)

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=session, coord=coord)

        for i in range(NUM_EPOCH):
            print("In iteration: %d" % (i + 1))
            run_epoch(session, train_model, train_queue,
                      train_model.train_op, True, train_epoch_size)

            valid_perplexity = run_epoch(
                session, eval_model, eval_queue, tf.no_op(), False, valid_epoch_size)
            print("Epoch: %d Validation Perplexity: %.3f" % (i + 1, valid_perplexity))

        test_perplexity = run_epoch(
            session, eval_model, test_queue, tf.no_op(), False, test_epoch_size)
        print("Test Perplexity: %.3f" % test_perplexity)

        coord.request_stop()
        coord.join(threads)


if __name__ == "__main__":
    main()

That is all for this article. I hope it is helpful for your learning.
