The content and figures of this article are mainly based on: Understanding LSTM Networks.
The Core Idea of LSTM
LSTM was first proposed by Hochreiter & Schmidhuber in 1997. It was designed to address the long-term dependency problem in neural networks: remembering information over long intervals is the default behavior of an LSTM, rather than something the network must struggle to learn.
The LSTM Memory Unit
The following walks through the main parts of an LSTM unit:
The key to the LSTM is the cell state, the horizontal line running from left to right along the top of the LSTM unit in the diagram. Like a conveyor belt, it carries information from the previous unit to the next, with only a few minor linear interactions with the other parts.
The LSTM adds or discards information through gates, which is how it remembers or forgets. A gate is a structure that lets information pass selectively; it consists of a sigmoid function followed by a pointwise multiplication. The sigmoid outputs values in the interval [0, 1], where 0 means discard completely and 1 means pass through entirely. An LSTM unit has three such gates: the forget gate, the input gate, and the output gate.
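As a rough illustration of the gating idea, the sketch below (with made-up numbers, not taken from the referenced article) shows a sigmoid output filtering a vector by pointwise multiplication:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

v = np.array([2.0, -1.0, 0.5])                 # some information flowing through the cell
gate = sigmoid(np.array([-10.0, 0.0, 10.0]))   # roughly [0, 0.5, 1]

filtered = gate * v   # pointwise multiplication: 0 discards an entry, 1 passes it through
print(filtered)       # roughly [0, -0.5, 0.5]
```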
Forget gate: the forget gate feeds the previous unit's output h_{t-1} and the current input x_t into a sigmoid function, which produces a value in [0, 1] for each entry of the previous cell state c_{t-1}, controlling how much of the previous cell state is forgotten.
Input gate: the input gate works together with a tanh function to control what new information is added. The tanh function produces a candidate vector c~_t, and the input gate produces a value in [0, 1] for each entry of c~_t, controlling how much of the new information is added. At this point we have the forget gate's output f_t, which controls how much of the previous cell state is forgotten, and the input gate's output i_t, which controls how much new information is added, so we can update the cell state: c_t = f_t * c_{t-1} + i_t * c~_t.
Output gate: the output gate controls how much of the current cell state is exposed as output. The cell state is first passed through an activation (tanh), and the output gate produces a value in [0, 1] for each entry, controlling how much of the activated cell state is filtered out.
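Pulling the three gates from the preceding paragraphs together, here is a minimal sketch of one LSTM step in NumPy. The function name, the weight names (W_f, W_i, W_c, W_o, and the biases), and the shapes are the common textbook formulation, not code from the referenced article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One step of a standard LSTM cell; each W_* maps [h_{t-1}, x_t] to the hidden size."""
    z = np.concatenate([h_prev, x_t])

    f_t = sigmoid(W_f @ z + b_f)          # forget gate: how much of c_{t-1} to keep
    i_t = sigmoid(W_i @ z + b_i)          # input gate: how much new information to add
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde    # new cell state

    o_t = sigmoid(W_o @ z + b_o)          # output gate: how much of the cell state to expose
    h_t = o_t * np.tanh(c_t)              # new hidden state / output
    return h_t, c_t

# Tiny usage example with random weights (hidden size 4, input size 3):
hidden, inp = 4, 3
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((hidden, hidden + inp)) for _ in range(4)]
bs = [np.zeros(hidden) for _ in range(4)]
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.standard_normal(inp), h, c, *Ws, *bs)
```

Note that the same concatenated vector [h_{t-1}, x_t] feeds all three gates; only the weights differ.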
LSTM Variants
The LSTM described above is the standard version, but not every LSTM looks exactly like this. In fact, nearly every paper uses a slightly different variant.
One popular LSTM variant, shown in the following figure, was first proposed by Gers & Schmidhuber in 2000. It adds "peephole connections", which let each gate look at ("peep into") the cell state. Here, the forget gate and the input gate are connected to the previous cell state, while the output gate is connected to the current cell state.
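A sketch of how peephole connections would change the gate computations, reusing the illustrative names from the sketch above; the elementwise peephole weights P_f, P_i, P_o are hypothetical names introduced for this example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x_t, h_prev, c_prev,
                       W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o,
                       P_f, P_i, P_o):
    """Peephole variant: the gates also look at the cell state (elementwise P_* weights)."""
    z = np.concatenate([h_prev, x_t])

    # The forget and input gates peek at the *previous* cell state c_{t-1}.
    f_t = sigmoid(W_f @ z + P_f * c_prev + b_f)
    i_t = sigmoid(W_i @ z + P_i * c_prev + b_i)

    c_tilde = np.tanh(W_c @ z + b_c)
    c_t = f_t * c_prev + i_t * c_tilde

    # The output gate peeks at the *current* cell state c_t.
    o_t = sigmoid(W_o @ z + P_o * c_t + b_o)
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```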
Another variant couples the forget and input gates: instead of an independent input gate, the amount of new information added is set to the complement of the forget gate (the two sum to 1). That is, we only forget when we are about to add new information, and we only add new information where something old is forgotten.
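Under the same illustrative names as above, the coupled variant replaces the separate input gate with the complement of the forget gate, roughly like this:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Coupled forget/input gates: new information is written exactly where old information
# is forgotten (illustrative shapes: hidden size 4, concatenated [h_{t-1}, x_t] size 7).
z = np.random.randn(7)
c_prev = np.random.randn(4)
W_f, b_f = np.random.randn(4, 7), np.zeros(4)
W_c, b_c = np.random.randn(4, 7), np.zeros(4)

f_t = sigmoid(W_f @ z + b_f)
c_tilde = np.tanh(W_c @ z + b_c)
c_t = f_t * c_prev + (1.0 - f_t) * c_tilde    # the input gate is tied to 1 - f_t
```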
Another noteworthy variant is the Gated Recurrent Unit (GRU), first proposed by Cho, et al. in 2014. It merges the forget gate and the input gate into a single "update gate", and also merges the hidden state h_t and the cell state c_t; the resulting model is simpler than the standard LSTM.
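A minimal sketch of one GRU step under the same assumptions as before (illustrative names, standard textbook formulation rather than code from the referenced article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One step of a standard GRU cell: update gate z_t, reset gate r_t, single state h_t."""
    zx = np.concatenate([h_prev, x_t])

    z_t = sigmoid(W_z @ zx + b_z)      # update gate (plays the role of forget + input gate)
    r_t = sigmoid(W_r @ zx + b_r)      # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)  # candidate state

    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde   # hidden state and cell state are merged into h_t
    return h_t
```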
Of course, there are many more variants that are not listed here. Some works have specifically compared the LSTM variants; see Greff, et al. (2015) and Jozefowicz, et al. (2015), whose results show that these variants perform roughly the same.