The reason for writing this article is that people keep asking: after calling outputs, last_state = tf.nn.static_rnn(cell, inputs), are last_state and outputs[-1] equal? And if they are not equal, why not?
In fact, this touches on a major difficulty in learning RNNs. Personally, I think the hardest part of learning RNNs is figuring out exactly what their inputs and outputs are. A simple answer is that an RNN unit takes the current input x_t and the previous hidden state s_{t-1}, and produces a new hidden state s_t, namely: s_t = f(x_t, s_{t-1}), where f is the function corresponding to the RNN's internal computation.
This answer is not wrong, but it is not good enough, because in many cases things become much clearer if you keep the RNN's output and state separate. In other words, we should think of an RNN as a unit that computes: y_t, s_t = f(x_t, s_{t-1}). Drawn as a diagram:
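In pseudocode, the difference between the two views looks like this (a minimal sketch; f, s_0 and inputs are illustrative names, not from any particular library):

# View 1: only the state loops, s_t = f(x_t, s_{t-1})
s = s_0
for x in inputs:              # inputs = [x_1, x_2, ..., x_T]
    s = f(x, s)

# View 2: each step produces an output and a new state, y_t, s_t = f(x_t, s_{t-1})
s, outputs = s_0, []
for x in inputs:
    y, s = f(x, s)
    outputs.append(y)
# outputs[-1] is y_T and s is s_T; in general these are not the same thing.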
This is exactly the point this article emphasizes: be sure to distinguish between the output and the state of an RNN. What is this distinction good for? Let's look at some examples.
First, look at the most basic example: the vanilla RNN / GRU cell (the vanilla RNN is the plainest, most basic RNN; in TensorFlow it corresponds to BasicRNNCell). Its working process is as follows:
Here s_t = y_t = h_t, so distinguishing between the two really is useless.
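You can check this directly in TensorFlow (a minimal TF 1.x sketch; the sizes 32, 128 and 64 are made up):

import tensorflow as tf  # assumes the TF 1.x API

cell = tf.nn.rnn_cell.GRUCell(num_units=64)
x_t = tf.placeholder(tf.float32, [32, 128])   # current input x_t
s_prev = cell.zero_state(32, tf.float32)      # previous state s_{t-1}
y_t, s_t = cell(x_t, s_prev)                  # RNNCell.__call__
# For a vanilla RNN / GRU cell, y_t and s_t are the same tensor: both are h_t.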
But what if it's an LSTM? For an LSTM, the recurrent part actually has two components: one is the internal cell value c_t, and the other is the hidden state h_t computed from the cell and the output gate; the output uses only the hidden state, not the cell value directly. As a result, the LSTM's working process is:
So the state s_t that is actually carried through the loop is the tuple (c_t, h_t) (TensorFlow's LSTMStateTuple; of course, if you choose not to represent the LSTM's internal state as a tuple, you can also concatenate c_t and h_t into a single Tensor, so that the state is one Tensor, which may make other computations more convenient. This is what TensorFlow's state_is_tuple switch controls.), while the output y_t is only h_t (for example, if the network is followed by a fully-connected layer and then a softmax classifier, the input to the fully-connected layer is only h_t, not c_t). Now you can see the point of distinguishing an RNN's output from its state.
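The same kind of sketch for an LSTM cell makes the difference visible (TF 1.x assumed, sizes made up):

import tensorflow as tf  # assumes the TF 1.x API

lstm = tf.nn.rnn_cell.BasicLSTMCell(num_units=64, state_is_tuple=True)
x_t = tf.placeholder(tf.float32, [32, 128])
s_prev = lstm.zero_state(32, tf.float32)      # an LSTMStateTuple(c, h)
y_t, s_t = lstm(x_t, s_prev)
# s_t is an LSTMStateTuple (c_t, h_t), while the output y_t is only h_t.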
Of course, this abstraction means more than that. For a multi-layer vanilla RNN / GRU, a simple abstraction is to treat the multi-layer cell as a whole, as one big cell, and regard the connections between the original layers as this big cell's internal computation / data flow. Seen from the outside, a multi-layer RNN has exactly the same interface as a single-layer RNN: externally, a multi-layer RNN is just a single-layer RNN whose internal computation is more complex. This is illustrated below:
The large box means treating the multi-layer RNN as one big cell, and the small boxes inside correspond to the original RNN of each layer. If we view the large box as a whole, then the state the whole loop needs is the collection of the states of all layers, i.e. the per-layer states grouped together into a tuple. (Why use a tuple here? Can't we just pack them into one Tensor? With a tuple you have to go through the layers one by one, which is a hassle. The answer is no, because the layers of a multi-layer RNN do not need to be the same size; for example, the bottom layer may have a higher input dimension and more hidden units, while the upper layers have smaller hidden dimensions. In that case the per-layer states have different dimensions and cannot be concatenated into a single Tensor.) The output of this big RNN unit is only the output of the original topmost layer.
In this example, the output and the state of the big RNN unit are obviously different. From this perspective, multi-layer RNNs and single-layer RNNs can be viewed in a unified way, and the world becomes a lot clearer. In fact, in TensorFlow, MultiRNNCell is a subclass of RNNCell: it treats a multi-layer RNN as a whole, as a single-layer RNN with a more complex internal structure. The function f mentioned at the beginning of this article, which represents the RNN's internal computation, is in fact RNNCell's __call__ method.
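For example (a hedged TF 1.x sketch; the layer sizes 256/128/64 are made up precisely to show that layers may differ in size):

import tensorflow as tf  # assumes the TF 1.x API

layers = [tf.nn.rnn_cell.GRUCell(n) for n in (256, 128, 64)]  # differently sized layers
multi = tf.nn.rnn_cell.MultiRNNCell(layers)
x_t = tf.placeholder(tf.float32, [32, 100])
s_prev = multi.zero_state(32, tf.float32)  # a tuple of 3 states: (32, 256), (32, 128), (32, 64)
y_t, s_t = multi(x_t, s_prev)
# y_t comes only from the top layer (shape (32, 64)); s_t is the tuple of all layers' states.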
Finally, take a look at the last example, a multi-layer LSTM:
Similar to the previous example, we treat the multi-layer LSTM as a whole. The output of the whole is the output of the topmost LSTM layer, and the state that the whole loop carries is the states of all layers combined into a tuple, where each layer's state is itself a (c, h) tuple. So the final result is a tuple of tuples, as shown in the figure.
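This tuple-of-tuples structure can be inspected directly (TF 1.x assumed, sizes made up):

import tensorflow as tf  # assumes the TF 1.x API

layers = [tf.nn.rnn_cell.BasicLSTMCell(64) for _ in range(3)]
multi_lstm = tf.nn.rnn_cell.MultiRNNCell(layers)
s_0 = multi_lstm.zero_state(32, tf.float32)
# s_0 is a tuple of tuples, one (c, h) pair per layer:
# (LSTMStateTuple(c, h), LSTMStateTuple(c, h), LSTMStateTuple(c, h))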
In this way, you can answer two questions:
The first: after outputs, last_state = tf.nn.static_rnn(cell, inputs), are last_state and outputs[-1] equal?
outputs is a list of the RNN cell's outputs at each time step. Assuming there are T time steps in total, outputs = [y_1, y_2, ..., y_T], so outputs[-1] = y_T; last_state is the hidden state of the last step, i.e. s_T.
So, is outputs[-1] equal to last_state or not? In other words, is y_T equal to s_T? Looking back at the four diagrams above, you can see that they are equal only for a single-layer vanilla RNN / GRU.
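A hedged sketch that makes the comparison concrete (TF 1.x; all sizes and scope names are illustrative):

import tensorflow as tf  # assumes the TF 1.x API

num_steps, batch_size, input_dim, hidden = 10, 32, 50, 64
inputs = [tf.placeholder(tf.float32, [batch_size, input_dim]) for _ in range(num_steps)]

with tf.variable_scope("gru"):
    outputs, last_state = tf.nn.static_rnn(
        tf.nn.rnn_cell.GRUCell(hidden), inputs, dtype=tf.float32)
    # single-layer GRU: outputs[-1] and last_state are both h_T, so they are equal

with tf.variable_scope("lstm"):
    outputs, last_state = tf.nn.static_rnn(
        tf.nn.rnn_cell.BasicLSTMCell(hidden), inputs, dtype=tf.float32)
    # LSTM: last_state is LSTMStateTuple(c_T, h_T), while outputs[-1] is only h_T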
The second: precisely because LSTM's state and output are not the same thing, we often need to manipulate the (c, h) tuple, or convert between the tuple and a Tensor. One of the most common conversion snippets looks like this:
lstm_state_as_tensor_shape = [num_layers, 2, batch_size, hidden_size]
initial_state = tf.zeros(lstm_state_as_tensor_shape)
# Split into num_layers pieces of shape [2, batch_size, hidden_size], then into (c, h) tuples:
unstack_state = tf.unstack(initial_state, axis=0)
tuple_state = tuple(
    tf.contrib.rnn.LSTMStateTuple(unstack_state[idx][0], unstack_state[idx][1])
    for idx in range(num_layers))
# static_rnn expects a list of per-step inputs:
inputs = tf.unstack(inputs, num=num_steps, axis=1)
outputs, state_out = tf.contrib.rnn.static_rnn(cell, inputs, initial_state=tuple_state)
Thinking back to the last diagram of the multi-layer LSTM, it is easy to see how these lines of code convert a Tensor into the state of a multi-layer LSTM.
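The reverse direction, packing state_out back into one Tensor of shape [num_layers, 2, batch_size, hidden_size], can be sketched in the same spirit:

# state_out is a tuple with one LSTMStateTuple(c, h) per layer
state_as_tensor = tf.stack(
    [tf.stack([layer.c, layer.h], axis=0) for layer in state_out], axis=0)
# state_as_tensor has shape [num_layers, 2, batch_size, hidden_size]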
From: https://zhuanlan.zhihu.com/p/28919765