Using the New TensorFlow seq2seq Interface

Introduction

In TensorFlow 1.0.0, a new seq2seq interface was introduced and the original interface was deprecated.

The old seq2seq interface is the part under tf.contrib.legacy_seq2seq, and the new interface lives under tf.contrib.seq2seq.

The main difference between the new seq2seq interface and the old one is that the new interface is dynamically unrolled, while the old one is statically unrolled.

Static unrolling: the length of the sequence is fixed when the graph that defines the model is created, so every incoming sequence is defined at that specified length. All sentences are therefore padded to the specified length, which wastes storage and is not computationally efficient. To handle variable-length sequences, the old interface lets you specify a series of buckets, such as

[(5,10), (10, 15), (15, 20)]

A sequence is then assigned to one of the buckets according to its length and padded to the length specified by that bucket, and when the graph is created, multiple sub-graphs are created, one per bucket. A minimal sketch of this bucketing idea follows.
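As an illustration only, the bucket assignment and padding could be written in plain Python as below; this is not the legacy seq2seq API itself, and PAD_ID is a hypothetical padding token id.

buckets = [(5, 10), (10, 15), (15, 20)]
PAD_ID = 0  # hypothetical padding token id

def pad_to_bucket(source_ids, target_ids):
    # pick the first bucket large enough for both the source and the target
    for src_len, tgt_len in buckets:
        if len(source_ids) <= src_len and len(target_ids) <= tgt_len:
            padded_src = source_ids + [PAD_ID] * (src_len - len(source_ids))
            padded_tgt = target_ids + [PAD_ID] * (tgt_len - len(target_ids))
            return padded_src, padded_tgt, (src_len, tgt_len)
    raise ValueError("sequence longer than the largest bucket")

# a source of length 3 and a target of length 7 fall into bucket (5, 10)
print(pad_to_bucket([4, 8, 15], [16, 23, 42, 4, 8, 15, 16]))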

Dynamic unrolling: the sequence is processed using control flow ops, so you do not need to specify the sequence length in advance.

However, whether unrolling is static or dynamic, the sequence lengths within each input batch must be the same, as the padding sketch below illustrates.
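For example, a batch can be padded to the length of its longest member and the true lengths passed to tf.nn.dynamic_rnn through its sequence_length argument; the cell size, vocabulary depth and pad_id below are illustrative.

import numpy as np
import tensorflow as tf

batch = [[3, 7, 2], [5, 1], [9, 4, 6, 2]]   # token ids of different lengths
lengths = [len(seq) for seq in batch]
pad_id = 0                                  # hypothetical padding id
max_len = max(lengths)
padded = np.array([seq + [pad_id] * (max_len - len(seq)) for seq in batch])

# every sequence in the batch now shares one common length (max_len);
# dynamic_rnn uses sequence_length to stop updating a sequence's state past its true length
inputs = tf.one_hot(padded, depth=10, dtype=tf.float32)
cell = tf.contrib.rnn.LSTMCell(16)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, sequence_length=lengths, dtype=tf.float32)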

The classes and methods in the new interface are as follows:

_allowed_symbols = [
    "sequence_loss",
    "Decoder",
    "dynamic_decode",
    "BasicDecoder",
    "BasicDecoderOutput",
    "BeamSearchDecoder",
    "BeamSearchDecoderOutput",
    "BeamSearchDecoderState",
    "Helper",
    "CustomHelper",
    "FinalBeamSearchDecoderOutput",
    "gather_tree",
    "GreedyEmbeddingHelper",
    "SampleEmbeddingHelper",
    "ScheduledEmbeddingTrainingHelper",
    "ScheduledOutputTrainingHelper",
    "TrainingHelper",
    "BahdanauAttention",
    "LuongAttention",
    "hardmax",
    "AttentionWrapperState",
    "AttentionWrapper",
    "AttentionMechanism",
    "tile_batch"]

The best way to familiarize yourself with these interfaces is to read the API documentation and then use them.

This article summarizes the use of several of these interfaces to implement a basic encoder-decoder seq2seq model.

Basic Encoder-Decoder Model

Sequence to Sequence Learning with Neural Networks [1] presents the most basic encoder-decoder model, with no attention mechanism. The framework of the model is shown in the following figure:

The input sequence is ['A', 'B', 'C', '<EOS>'] and the output sequence is ['W', 'X', 'Y', 'Z', '<EOS>'].

The encoder encodes the input sequence, and the hidden state it outputs at the last time step (the final state below) serves as the encoding vector of the input sequence.

The decoder takes the terminator <EOS> as its initial input (other symbols such as <SOS> can also be used) and the final state of the encoder as its initial state, and then generates the sequence until the terminator <EOS> is produced.

The structure is very simple: implement the encoder and the decoder and then string them together.

Encoder Implementation

In [1] the encoder is a 4-layer unidirectional LSTM; this part can be built with the plain RNN interface and does not need the seq2seq interface. The model framework in the first diagram illustrates the encoder-decoder architecture, but the implementation does not feed the sequence ['A', 'B', 'C', '<EOS>'] into the encoder directly. The complete architecture of the encoder is shown in the following figure:

Diagram description:

Input: not the original sequence, but each element of the sequence converted to its corresponding ID in the dictionary. In both the train and inference phases, a mini-batch is fed in at a time for efficiency, so an int-typed placeholder of rank 2 needs to be defined for the input.

Embedding: a variable defined with trainable=True, so that even pre-trained word vectors can be fine-tuned while the model is trained.

Multilayer_lstm: receives the word vector of each element in the sequence as input.

The tf.nn.dynamic_rnn method receives the cell instance and the embedded vectors, and outputs outputs, which contains the hidden state at every time step, as well as final_state. If the initial state is all zeros, you do not need to explicitly create a zero_state and pass it in as a parameter; just specify the state's dtype and the initial state is automatically initialized to zero vectors, as the following excerpt from the TensorFlow source shows:

if initial_state is not None:
    state = initial_state
else:
    if not dtype:
        raise ValueError("If there is no initial_state, you must give a dtype.")
    state = cell.zero_state(batch_size, dtype)
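Putting the pieces above together, a minimal encoder along these lines might look as follows; this is only a sketch, and vocab_size, embedding_dim, rnn_size and layer_size are illustrative hyperparameters.

import tensorflow as tf

vocab_size, embedding_dim, rnn_size, layer_size = 10000, 128, 256, 4  # illustrative

# rank-2 int placeholder: [batch_size, max_sequence_length], holding token ids
input_ids = tf.placeholder(tf.int32, shape=[None, None], name='input_ids')

# trainable embedding matrix
embedding = tf.get_variable('encoder_embedding', [vocab_size, embedding_dim], trainable=True)
embedded = tf.nn.embedding_lookup(embedding, input_ids)

# multi-layer unidirectional LSTM
cells = [tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(layer_size)]
encoder_cell = tf.contrib.rnn.MultiRNNCell(cells)

# specifying dtype lets dynamic_rnn build the zero initial state itself
encoder_outputs, encoder_final_state = tf.nn.dynamic_rnn(encoder_cell, embedded, dtype=tf.float32)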
Decoder Implementation

The implementation of the decoder requires the seq2seq module. Similarly, the architecture diagram that expands the decoder portion of the first overall framework diagram is as follows:

Diagram description:

Input: same as the encoder, i.e., the IDs corresponding to the sequence elements.

Embedding: depending on the task, this may or may not differ from the encoder embedding. In translation, for example, the word vector spaces of the source and target languages are different, whereas text summarization works within one language, so the encoder and decoder can share the same embedding matrix.

Dense_layer: unlike the encoder, which only outputs hidden states, the decoder needs to output the probability of each token in the dictionary at each time step. A dense layer is therefore needed to convert the hidden state vector into a vector whose dimension equals vocabulary_size, and the logits output by the dense layer are passed through a softmax layer to obtain the final token probabilities.

The definition of the decoder needs to distinguish between the inference phase and the train phase.

In the inference stage, the output of the decoder is unknown. To generate the sequence ['W', 'X', 'Y', 'Z', '<EOS>'], the decoder first outputs the token 'W'; then 'W' is fed back as input and, combined with the hidden state at that moment, the next token 'X' is inferred, and so on, until <EOS> is output or the maximum sequence length is reached.

In the train phase, the sequence the decoder should output is known, so regardless of what it actually outputs, the tokens of the known sequence are fed in one by one. If the outputs were also used as inputs during training, a single early mistake would amplify the error and make the training process more unstable.

Interface Description

The decoder uses three classes from seq2seq, TrainingHelper, GreedyEmbeddingHelper and BasicDecoder, together with the dynamic_decode method. The Dense class under tensorflow.python.layers.core is also used.

BasicDecoder

The first class to look at when implementing the decoder is BasicDecoder, whose constructor and parameters are defined as follows:

__init__(cell, helper, initial_state, output_layer=None)
- cell: An RNNCell instance.
- helper: A Helper instance.
- initial_state: A (possibly nested tuple of...) tensors and TensorArrays. The initial state of the RNNCell.
- output_layer: (Optional) An instance of tf.layers.Layer, i.e., tf.layers.Dense. Optional layer to apply to the RNN output prior to storing the result or sampling.

cell: here a multi-layer LSTM instance, just as when defining the encoder.
helper: the documentation only says this is a Helper instance; on a first read it is not obvious what this helper does, but the concrete helper classes below make it clear.
initial_state: the final state of the encoder, with matching types; that is, if the final state of the encoder is a tuple (for example, an LSTM contains both a cell state and a hidden state), then what is passed here must also be a tuple. Simply pass the encoder's final_state as this parameter.
output_layer: corresponds to the dense_layer in the framework diagram. The documentation says tf.layers.Dense, but tf.layers only exposes the dense method; a Dense instance has to be imported with from tensorflow.python.layers.core import Dense. A construction sketch follows this list.
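For instance, a BasicDecoder could be put together as in the following sketch; the cell, the TrainingHelper (covered in the next section), the vocabulary size and the zero initial state are illustrative stand-ins, and in the full model the initial state would be the encoder's final_state.

import tensorflow as tf
from tensorflow.contrib.seq2seq import BasicDecoder, TrainingHelper
from tensorflow.python.layers.core import Dense

vocab_size, embedding_dim, rnn_size = 10000, 128, 256  # illustrative

decoder_cell = tf.contrib.rnn.LSTMCell(rnn_size)

# a TrainingHelper is one concrete Helper; see the next section
decoder_inputs = tf.placeholder(tf.float32, [None, None, embedding_dim])  # embedded targets
sequence_lengths = tf.placeholder(tf.int32, [None])
helper = TrainingHelper(decoder_inputs, sequence_lengths)

# the initial state must match the cell's state structure; zeros here for brevity,
# in the full model this would be the encoder's final_state
initial_state = decoder_cell.zero_state(tf.shape(decoder_inputs)[0], tf.float32)

decoder = BasicDecoder(cell=decoder_cell,
                       helper=helper,
                       initial_state=initial_state,
                       output_layer=Dense(vocab_size))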

The role of BasicDecoder is to define an instance that encapsulates the functionality a decoder should have. Depending on the helper instance, this decoder can behave differently, for example not feeding the output back as input in the train phase, but feeding it back in the inference phase.

TrainingHelper

The constructor and parameters are as follows:

__init__(inputs, sequence_length, time_major=False, name=None)
- inputs: A (structure of) input tensors.
- sequence_length: An int32 vector tensor.
- time_major: Python bool. Whether the tensors in inputs are time major. If False (default), they are assumed to be batch major.
- name: Name scope for any created operations.

inputs: corresponds to embedded_input in the decoder framework diagram. With time_major=False, the shape of inputs is [batch_size, sequence_length, embedding_size]; with time_major=True, the shape of inputs is [sequence_length, batch_size, embedding_size].
sequence_length: the documentation is too brief here, but from the source you can see it is the length of each sequence in the current batch (self._batch_size = array_ops.size(sequence_length)).
time_major: determines the meaning of the first two dimensions of the inputs tensor.
name: as described in the documentation. A small example follows this list.
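For example, a TrainingHelper for the train phase could be built from the embedded target ids as in the following sketch; decoder_vocab_size and embedding_dim are illustrative.

import tensorflow as tf
from tensorflow.contrib.seq2seq import TrainingHelper

decoder_vocab_size, embedding_dim = 10000, 128  # illustrative

target_ids = tf.placeholder(tf.int32, shape=[None, None], name='target_ids')          # [batch, time]
decoder_seq_length = tf.placeholder(tf.int32, shape=[None], name='batch_seq_length')  # true lengths

decoder_embedding = tf.get_variable('decoder_embedding', [decoder_vocab_size, embedding_dim])
target_embedded = tf.nn.embedding_lookup(decoder_embedding, target_ids)  # [batch, time, embedding_dim]

# batch-major inputs; time_major=False is the default
helper = TrainingHelper(inputs=target_embedded, sequence_length=decoder_seq_length)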

TrainingHelper is used in the train phase. Its next_inputs method also receives outputs and sample_ids, but it simply returns the next time step's slice of the inputs it was initialized with.

GreedyEmbeddingHelper

__init__(embedding, start_tokens, end_token)
- embedding: A callable that takes a vector tensor of ids (argmax ids), or the params argument for embedding_lookup. The returned tensor will be passed to the decoder input.
- start_tokens: int32 vector shaped [batch_size], the start tokens.
- end_token: int32 scalar, the token that marks end of decoding.

A helper for use during inference.
Uses the argmax of the output (treated as logits) and passes the result through an embedding layer to get the next input.

The official documentation already states that this is a helper for the inference phase: it uses argmax on the output logits to obtain an id and then passes it through the embedding layer to get the next time step's input.

embedding: the params argument for embedding_lookup, i.e., the embedding variable defined earlier, is passed in.
start_tokens: the token_id with which each sequence in the batch starts.
end_token: the token_id that marks the termination of a sequence. A small sketch follows this list.
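For instance, an inference-phase helper could be created as in the following sketch; decoder_vocab_size and embedding_dim are illustrative, and the scalar shape on end_token just makes the placeholder explicitly a scalar.

import tensorflow as tf
from tensorflow.contrib.seq2seq import GreedyEmbeddingHelper

decoder_vocab_size, embedding_dim = 10000, 128  # illustrative
decoder_embedding = tf.get_variable('decoder_embedding', [decoder_vocab_size, embedding_dim])

# one start token id per sequence in the batch, and a single scalar end token id
start_tokens = tf.placeholder(tf.int32, shape=[None], name='start_tokens')
end_token = tf.placeholder(tf.int32, shape=[], name='end_token')

helper = GreedyEmbeddingHelper(decoder_embedding, start_tokens, end_token)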

dynamic_decode

dynamic_decode(decoder, output_time_major=False, impute_finished=False, maximum_iterations=None, parallel_iterations=32, swap_memory=False, scope=None)

This method is very intuitive: pass in the decoder instance defined above; the documentation for the other parameters is clear enough. It is worth studying how control flow ops are used inside it to implement the dynamic decoding loop. A hypothetical call is sketched below.
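Assuming decoder is a BasicDecoder instance as in the earlier sketch, a call might look like this; the maximum_iterations value is illustrative.

from tensorflow.contrib.seq2seq import dynamic_decode

# caps decoding at 50 steps during inference; training normally stops at the helper's sequence_length
final_outputs, final_state, final_sequence_lengths = dynamic_decode(
    decoder, output_time_major=False, impute_finished=True, maximum_iterations=50)

# final_outputs.rnn_output holds the logits, final_outputs.sample_id the sampled token ids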

Code

The code implementing the basic encoder-decoder model using the interfaces above is as follows:

# -*- coding: utf-8 -*-
import tensorflow as tf
from tensorflow.contrib.seq2seq import *
from tensorflow.python.layers.core import Dense


class Seq2SeqModel(object):
    def __init__(self, rnn_size, layer_size, encoder_vocab_size,
                 decoder_vocab_size, embedding_dim, grad_clip, is_inference=False):
        # define inputs
        self.input_x = tf.placeholder(tf.int32, shape=[None, None], name='input_ids')

        # define embedding layer
        with tf.variable_scope('embedding'):
            encoder_embedding = tf.Variable(
                tf.truncated_normal(shape=[encoder_vocab_size, embedding_dim], stddev=0.1),
                name='encoder_embedding')
            decoder_embedding = tf.Variable(
                tf.truncated_normal(shape=[decoder_vocab_size, embedding_dim], stddev=0.1),
                name='decoder_embedding')

        # define encoder
        with tf.variable_scope('encoder'):
            encoder = self._get_simple_lstm(rnn_size, layer_size)

        with tf.device('/cpu:0'):
            input_x_embedded = tf.nn.embedding_lookup(encoder_embedding, self.input_x)

        encoder_outputs, encoder_state = tf.nn.dynamic_rnn(encoder, input_x_embedded, dtype=tf.float32)

        # define helper for decoder
        if is_inference:
            self.start_tokens = tf.placeholder(tf.int32, shape=[None], name='start_tokens')
            self.end_token = tf.placeholder(tf.int32, name='end_token')
            helper = GreedyEmbeddingHelper(decoder_embedding, self.start_tokens, self.end_token)
        else:
            self.target_ids = tf.placeholder(tf.int32, shape=[None, None], name='target_ids')
            self.decoder_seq_length = tf.placeholder(tf.int32, shape=[None], name='batch_seq_length')
            with tf.device('/cpu:0'):
                target_embeddeds = tf.nn.embedding_lookup(decoder_embedding, self.target_ids)
            helper = TrainingHelper(target_embeddeds, self.decoder_seq_length)

        with tf.variable_scope('decoder'):
            fc_layer = Dense(decoder_vocab_size)
            decoder_cell = self._get_simple_lstm(rnn_size, layer_size)
            decoder = BasicDecoder(decoder_cell, helper, encoder_state, fc_layer)

        logits, final_state, final_sequence_lengths = dynamic_decode(decoder)

        if not is_inference:
            targets = tf.reshape(self.target_ids, [-1])
            logits_flat = tf.reshape(logits.rnn_output, [-1, decoder_vocab_size])
            print 'shape logits_flat:{}'.format(logits_flat.shape)
            print 'shape logits:{}'.format(logits.rnn_output.shape)

            self.cost = tf.losses.sparse_softmax_cross_entropy(targets, logits_flat)

            # define train op
            tvars = tf.trainable_variables()
            grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars), grad_clip)

            optimizer = tf.train.AdamOptimizer(1e-3)
            self.train_op = optimizer.apply_gradients(zip(grads, tvars))
        else:
            # the decoder output is a namedtuple; the logits live in rnn_output
            self.prob = tf.nn.softmax(logits.rnn_output)

    def _get_simple_lstm(self, rnn_size, layer_size):
        lstm_layers = [tf.contrib.rnn.LSTMCell(rnn_size) for _ in xrange(layer_size)]
        return tf.contrib.rnn.MultiRNNCell(lstm_layers)
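As a hypothetical usage sketch (the hyperparameters and the toy batches below are illustrative, and real batch preparation is omitted), a training step could be run like this:

import numpy as np
import tensorflow as tf

model = Seq2SeqModel(rnn_size=256, layer_size=4, encoder_vocab_size=10000,
                     decoder_vocab_size=10000, embedding_dim=128,
                     grad_clip=5.0, is_inference=False)

encoder_batch = np.array([[4, 8, 15, 2], [16, 23, 42, 2]])   # toy source id batch
decoder_batch = np.array([[1, 7, 9, 2], [1, 5, 3, 2]])       # toy target id batch
decoder_lengths = np.array([4, 4])                           # true target lengths

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {model.input_x: encoder_batch,
            model.target_ids: decoder_batch,
            model.decoder_seq_length: decoder_lengths}
    _, loss = sess.run([model.train_op, model.cost], feed_dict=feed)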
