Talking about the structure of RNN
Depending on how many inputs and outputs it deals with, an RNN can take on several different structures, and each structure naturally suits different situations. In the figure below, the one-to-one structure simply maps a single input to a single output and does not reflect any sequential characteristics; image classification is a typical scene. The one-to-many structure produces a series of outputs from a single input and can be used to generate image captions. The many-to-one structure produces a single output from a series of inputs and can be used for text sentiment analysis, classifying a sequence of text input as negative or positive. The many-to-many structure produces a series of outputs from a series of inputs and can be used for translation or chat dialogue, converting one piece of text into another. The sync many-to-many structure is the classic RNN structure: the state from the previous input is carried to the next state, and every input corresponds to an output. Character prediction is the example we are most familiar with, and it can also be used for video classification, tagging every video frame.
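As a rough illustration, here is a minimal sketch in TensorFlow/Keras (the shapes and layer sizes are made up for the example, not taken from the figure) showing how the same recurrent layer gives a many-to-one output or a sync many-to-many output, depending on whether it returns every time step:

```python
import tensorflow as tf

# A hypothetical batch of 4 sequences, each 10 steps long with 8 features per step.
x = tf.random.normal([4, 10, 8])

rnn_last = tf.keras.layers.SimpleRNN(16, return_sequences=False)  # many to one
rnn_all = tf.keras.layers.SimpleRNN(16, return_sequences=True)    # sync many to many

print(rnn_last(x).shape)  # (4, 16): one output per sequence, e.g. a sentiment label
print(rnn_all(x).shape)   # (4, 10, 16): one output per time step, e.g. a tag per video frame
```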
Seq2seq
Looking at the two many-to-many models in the figure above, the fourth and fifth structures are different. In the classic RNN structure the input and output sequences must be of equal length, so its application scenarios are relatively limited. The fourth structure allows the input and output sequences to have different lengths; this is the seq2seq model, i.e. Sequence to Sequence. It implements the conversion from one sequence to another. For example, Google combined the seq2seq model with the attention model to implement translation, and a similar model can implement the dialogue of a chat robot. The classic RNN model fixes the sizes of the input and output sequences, while the seq2seq model breaks that restriction.
In fact, for the seq2seq decoder, the way the RNN output is handled in the training phase and in the prediction phase may differ. For example, in the training phase the RNN output can be left unprocessed and the target sequence used directly as the input for the next moment, as shown in the first figure above. In the prediction phase the RNN output is taken as the input for the next moment, because no target sequence is available to use as input, as shown in the second figure above.
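The difference can be sketched with two tiny decoding loops; decoder_step, target_seq and start_token below are hypothetical placeholders standing in for whatever decoder cell and data are actually used:

```python
def decode_train(decoder_step, state, target_seq):
    # Training phase: the RNN output is not processed further; the ground-truth
    # target token is fed directly as the input for the next moment.
    outputs = []
    inp = target_seq[0]
    for t in range(1, len(target_seq)):
        out, state = decoder_step(inp, state)
        outputs.append(out)
        inp = target_seq[t]          # ignore `out`, use the target as the next input
    return outputs

def decode_predict(decoder_step, state, start_token, steps):
    # Prediction phase: no target sequence is available, so the output of the RNN
    # at this moment becomes the input at the next moment.
    outputs = []
    inp = start_token
    for _ in range(steps):
        out, state = decoder_step(inp, state)
        outputs.append(out)
        inp = out
    return outputs
```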
Encoder-decoder Structure
Seq2seq belongs to the encoder-decoder structure, so let us look at the common encoder-decoder structure. The basic idea is to use two RNNs, one as the encoder and the other as the decoder. The encoder is responsible for compressing the input sequence into a vector of a specified length, which can be seen as the semantics of the sequence; this process is called encoding. The simplest way to obtain the semantic vector is to use the hidden state of the last input directly as the semantic vector c. You can also apply a transformation to the last hidden state to get the semantic vector, or transform all the hidden states of the input sequence into the semantic vector.
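A minimal encoder sketch, assuming TensorFlow/Keras and made-up shapes, that takes the hidden state of the last input directly as the semantic vector c:

```python
import tensorflow as tf

# Hypothetical batch of 2 input sequences, 7 steps, 8 features per step.
inputs = tf.random.normal([2, 7, 8])

# return_state=True also returns the final hidden state, which is used directly
# as the semantic vector c (the simplest of the options described above).
encoder = tf.keras.layers.SimpleRNN(32, return_state=True)
_, c = encoder(inputs)   # c has shape (2, 32)
```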
The decoder is responsible for generating the specified sequence according to the semantic vector; this process is called decoding. The simplest way is to feed the semantic vector obtained by the encoder into the decoder RNN as its initial state and obtain the output sequence. The output at the previous moment is taken as the input for the current moment, and the semantic vector c participates only in the initial state; the subsequent operations are independent of the semantic vector c.
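The corresponding decoder, again only a sketch with assumed shapes, uses c purely as the initial state of its RNN; the later steps never see c again:

```python
import tensorflow as tf

c = tf.random.normal([2, 32])                 # semantic vector from the encoder (placeholder)
decoder_inputs = tf.random.normal([2, 5, 8])  # e.g. the target sequence shifted by one step

# c only initialises the decoder RNN; it does not appear in any later computation.
decoder = tf.keras.layers.SimpleRNN(32, return_sequences=True)
outputs = decoder(decoder_inputs, initial_state=c)   # shape (2, 5, 32)
```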
Another way to handle the decoder is to let the semantic vector c participate in the computation at every moment of the sequence, as shown in the figure below: the output of the previous moment is still the input of the current moment, but the semantic vector c is involved in the computation at all time steps.
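One common way to realise this variant, sketched below under the same assumptions as above, is to repeat c along the time axis and concatenate it with the input at every moment:

```python
import tensorflow as tf

c = tf.random.normal([2, 32])                 # semantic vector from the encoder (placeholder)
decoder_inputs = tf.random.normal([2, 5, 8])  # previous outputs / shifted targets, 5 steps

# Tile c across the 5 time steps and attach it to the input of every step,
# so the semantic vector takes part in all time operations.
c_repeated = tf.repeat(c[:, tf.newaxis, :], repeats=5, axis=1)   # (2, 5, 32)
step_inputs = tf.concat([decoder_inputs, c_repeated], axis=-1)   # (2, 5, 40)

decoder = tf.keras.layers.SimpleRNN(32, return_sequences=True)
outputs = decoder(step_inputs, initial_state=c)   # c also sets the initial state
```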
The encoder-decoder model places no requirement on the lengths of the input and output sequences, so its application scenarios are much broader.
How to train
The simple encoder-decoder models were introduced above; the following diagram of a slightly more complex model illustrates the training idea. Different encoder-decoder models have different structures, but the core ideas of training are very similar.
We know that an RNN can learn a probability distribution and then make predictions. For example, we input the data up to time t to predict the data at time t+1; the most classic example is character prediction, explained in more detail in the earlier posts "circular neural network" and "TensorFlow Build circular neural network". To obtain the probability distribution, the Softmax activation function is used in the RNN output layer, which gives the probability of each class.
For an RNN, given a sequence, the output probability at time t is P(x_t | x_1, ..., x_{t-1}), and each neuron of the Softmax layer is then computed as follows:
P(x_{t,j} | x_1, ..., x_{t-1}) = exp(w_j h_t) / Σ_{j'=1}^{K} exp(w_{j'} h_t)

where h_t is the hidden state of the RNN at time t, w_j is the weight vector of the j-th output neuron, and K is the number of output classes.
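To make the formula concrete, here is a tiny numerical sketch in plain NumPy with made-up sizes (hidden size 4, K = 3 classes):

```python
import numpy as np

h_t = np.random.randn(4)     # RNN hidden state at time t
W = np.random.randn(3, 4)    # output weights; row w_j belongs to output neuron j

logits = W @ h_t                               # w_j · h_t for every class j
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: P(x_{t,j} | x_1, ..., x_{t-1})

print(probs, probs.sum())    # a distribution over the K classes, summing to 1
```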