Main reference: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN (recurrent neural network)
In an ordinary neural network, earlier information has no influence on how the current input is interpreted. When we read an article, however, we rely on the words we have already read, and an ordinary network cannot do this. Recurrent neural networks were introduced to address this; their main advantage is that they retain information from earlier steps.
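x_t is the input; it passes through the module A, which outputs h_t. A loop then feeds the result of this step back into A, so that the next step has access to information from the current input. This can be understood as a model with a loop, in which the same module A is applied at every time step.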
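The recurrence described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the names rnn_step, run_rnn, W_xh, W_hh and b_h are hypothetical, the module A is assumed to be a single tanh layer, and the weights are random rather than trained.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a plain RNN: mix the current input with the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def run_rnn(xs, hidden_size, seed=0):
    """Apply the same module to every time step of xs (shape: steps x input_size)."""
    rng = np.random.default_rng(seed)
    input_size = xs.shape[1]
    # Assumption: small random weights stand in for learned parameters.
    W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    b_h = np.zeros(hidden_size)

    h = np.zeros(hidden_size)  # initial hidden state
    hs = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # h carries information forward
        hs.append(h)
    return np.stack(hs)
```

Even when every x_t is identical, the outputs differ from step to step, because each h_t also depends on h_{t-1}.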
LSTM:
Background: RNNs give neural networks a form of memory, and they handle short-term memory well. To infer the next word of a sentence, for example, the network only needs recent information, and this takes little learning. But when the sentence grows into a paragraph and the relevant information lies far back, an RNN may no longer infer it accurately. In theory, an RNN can capture such long-range connections, but in practice it has difficulty learning information from long before.
LSTM (long short-term memory) is a special kind of RNN that can learn long-term dependencies. It was proposed in 1997 and has been widely used since. It was designed to solve the long-term dependency problem.
The difference between an RNN and an LSTM can be seen in the two images above: the repeating module of an RNN contains only a single tanh layer, whereas that of an LSTM contains four interacting layers (three sigmoid gates and one tanh layer). Together they update the cell state at each step.
The following is an analysis of the specific effects of these four layers and their expressions:
1. The first step of the LSTM algorithm is to decide what information to throw away. This is done by the forget gate layer. It takes h_{t-1} and x_t as input and outputs, for each number in the cell state C_{t-1}, a value between 0 and 1 indicating how much to keep: 0 means forget completely, and 1 means keep everything.
2. The second step of the LSTM algorithm is to decide what new information to store. This has two parts: first, the input gate layer decides which values to update; then a tanh layer generates a vector of new candidate values, C~_t. Together, the two parts determine the update to the state.
3. Update the old state C_{t-1} to the new state C_t. Multiplying f_t by C_{t-1} forgets what was decided to be forgotten, and multiplying i_t by C~_t gives the new information to add. Adding the two gives the state for the next moment: C_t = f_t * C_{t-1} + i_t * C~_t.
4. Determine the output. The output is based on the cell state, but filtered. First, a sigmoid layer decides which parts of the state to output. Then the state is passed through tanh, so that its values lie between -1 and 1, and the result is multiplied by the output of the sigmoid gate, so that only the parts we decided on are output.
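The four steps above can be put together as one LSTM step. The sketch below follows the formulation in the main reference, where each layer acts on the concatenation [h_{t-1}, x_t]. The function names, the params dictionary and its keys (W_f, b_f, and so on) are hypothetical, chosen here for illustration, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_size, hidden_size, seed=0):
    """Assumption: small random weights stand in for learned parameters."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("f", "i", "c", "o"):
        params["W_" + name] = rng.normal(
            scale=0.1, size=(hidden_size, hidden_size + input_size))
        params["b_" + name] = np.zeros(hidden_size)
    return params

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns the new hidden state h_t and cell state c_t."""
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]

    # Step 1: forget gate, values in (0, 1); 0 = forget, 1 = keep.
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])

    # Step 2: input gate and candidate values.
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])

    # Step 3: update the cell state.
    c_t = f_t * c_prev + i_t * c_tilde

    # Step 4: output gate filters the squashed cell state.
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```

A quick way to check the role of the gates: with a large positive forget bias and a large negative input bias, the forget gate saturates near 1 and the input gate near 0, so the cell state is carried over almost unchanged.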
Some variants of the LSTM algorithm:
1. Add peephole connections, that is, let each gate layer look at the cell state. Some variants add peepholes to all gates, others only to some.
2. Use coupled forget and input gates. Previously, what to forget and what new information to add were decided separately; the improvement is to make the two decisions together. Old information is forgotten only when something new is input in its place, and new values are input only when something older is forgotten.
3. Use the gated recurrent unit (GRU), which combines the forget gate and the input gate into a single update gate, and also merges the cell state and the hidden state.
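Variants 2 and 3 can be sketched as follows, using the formulation from the main reference. The function names and the weight names W_z, W_r and W_h are hypothetical, and biases are omitted for brevity; treat this as an illustration rather than a reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coupled_cell_update(f_t, c_prev, c_tilde):
    """Variant 2: the input gate is tied to the forget gate as (1 - f_t),
    so new values enter only where old ones are forgotten."""
    return f_t * c_prev + (1.0 - f_t) * c_tilde

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """Variant 3: one GRU step; there is no separate cell state."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)   # update gate (merged forget/input gate)
    r_t = sigmoid(W_r @ concat)   # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    # Blend the old state and the candidate according to the update gate.
    return (1.0 - z_t) * h_prev + z_t * h_tilde
```

In both functions a single gate value decides how much old information is kept and how much new information replaces it, which is what distinguishes these variants from the standard LSTM update.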