Note: This article is largely based on my understanding of http://colah.github.io/posts/2015-08-Understanding-LSTMs/, so it could also be called "an understanding of Understanding LSTM Networks". Thanks to the author for the selfless sharing and the accessible, accurate explanation.
I. RNN
When talking about LSTM, it is inevitable to start with the simplest and most primitive RNN. In this part, my goal is simply to understand the word "recurrent" in "Recurrent Neural Network" without throwing out any formulas, and along the way to mention the Keras input data format that once puzzled me.
We often read that LSTM is suited to sequential data and variable-length sequences, especially in natural language processing. So what gives it the ability to handle variable-length sequences? With a little careful study, I believe everyone can arrive at an intuitive answer.
Looking at the left side of the figure, the RNN has two inputs: one is x_t, the input at the current time step t, and the other is an input that appears to come from "itself".
If that is not clear, look at the right side of the figure: it is simply the left-hand diagram unrolled along the time axis, with the output of the previous time step fed in as an input to the current one. It is important to note that all the cells on the right are actually the same neuron as the one on the left, sharing the same weights; it merely receives a different input at each time step and passes its output on to the next step as input. This is how information from the past is stored.
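To make the weight sharing concrete, here is a minimal NumPy sketch (not code from the original article; the dimensions and the tanh activation are illustrative assumptions): the same weight matrices are applied at every time step, and each step's output is fed back in as part of the next step's input.

```python
import numpy as np

input_dim, hidden_dim, timesteps = 80, 32, 20
Wx = np.random.randn(input_dim, hidden_dim) * 0.01   # shared across all time steps
Wh = np.random.randn(hidden_dim, hidden_dim) * 0.01  # shared across all time steps
b = np.zeros(hidden_dim)

x = np.random.randn(timesteps, input_dim)   # one sentence: 20 words, 80-dim each
h = np.zeros(hidden_dim)                    # the "previous output" at t = 0
for t in range(timesteps):
    h = np.tanh(x[t] @ Wx + h @ Wh + b)     # h_t depends on x_t and h_{t-1}
print(h.shape)                              # (32,): final state summarizing the sequence
```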
Understanding the meaning of "recurrent" is the purpose of this section; the formulas and details are described in depth in the LSTM part.
Keras Chinese documentation: http://keras-cn.readthedocs.io/en/latest/layers/recurrent_layer/ (the Chinese documentation is really good: besides translating the original content, it adds extra material, such as the concepts of tensors and batch size, which helps deep learning beginners understand).
For all RNN layers, including SimpleRNN, LSTM, GRU, and so on, the input and output data formats are as follows:
The input is a three-dimensional tensor of shape (samples, timesteps, input_dim). samples is the number of data items. timesteps and input_dim are the harder parts to grasp: input_dim is the dimensionality of the representation of each element, and timesteps is the total number of time steps. For example, suppose we have 100 sentences, each 20 words long, with every word represented by an 80-dimensional vector. In an RNN, the input at each time step is one word (not necessarily; you can also adjust it to two words or something else). Looking back at the first RNN figure, time t0 is the first time step and x0 is the 80-dimensional vector representing the first word of a sentence; t1 is the second time step and x1 is the 80-dimensional vector for the second word, and so on. So the input data should have shape (100, 20, 80).
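As a small sketch of what that shape looks like in Keras (assuming TensorFlow's bundled Keras; the layer width of 32 is an arbitrary choice), a (100, 20, 80) array feeds directly into a recurrent layer:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN

x = np.random.random((100, 20, 80))   # (samples, timesteps, input_dim)

model = Sequential([
    # input_shape omits the samples axis: (timesteps, input_dim)
    SimpleRNN(32, input_shape=(20, 80)),
])
print(model.predict(x).shape)          # (100, 32): one output vector per sentence
```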
Note: in practice, sentences are not all the same length, but from the RNN workflow above it is clear that variable-length sequences can be handled. In Keras, you can first pad every sentence to a maximum length, filling sentences shorter than that length with 0, and then add an Embedding layer or a Masking layer before the RNN layer to filter out the padded characters; a sketch is given after the link below. Details are in my blog post:
Http://www.cnblogs.com/leeshum/p/6089286.html
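Here is a minimal sketch of that padding-plus-masking idea (assuming TensorFlow's bundled Keras; the vocabulary size, word indices, and layer sizes are made up for illustration): sequences are padded with 0 to a fixed length, and mask_zero=True in the Embedding layer tells the downstream LSTM to skip the padded positions.

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

sentences = [[12, 7, 3], [5, 9], [44, 2, 18, 6]]              # word-index sequences of different lengths
padded = pad_sequences(sentences, maxlen=20, padding='post')  # shape (3, 20), padded with 0

model = Sequential([
    Embedding(input_dim=10000, output_dim=80, mask_zero=True),  # index 0 is treated as padding
    LSTM(32),
    Dense(1, activation='sigmoid'),
])
print(model.predict(padded).shape)    # (3, 1): padded positions are ignored by the LSTM
```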
To be continued. (Off to work for now.)
LSTM: Sorting Out, Understanding, and Implementing in Keras (Part 1)