Deep Understanding of LSTM Neural Networks

Source: Internet
Author: User

The content and figures of this article mainly reference: Understanding LSTM Networks

LSTM Core Idea

The LSTM was first proposed by Hochreiter & Schmidhuber in 1997 to address the long-term dependency problem in neural networks: remembering information over long time spans should be the default behavior of the network, not something it must struggle to learn.

LSTM Memory Unit

The following explains each part of the LSTM unit:

The key to the LSTM is the cell state, drawn in the diagram as the horizontal line running left to right across the top of the LSTM unit. Like a conveyor belt, it carries information from one unit to the next, with only minor linear interactions along the way.

The LSTM controls whether information is discarded or added through gates, which implement forgetting and remembering. A gate is a structure that lets information pass through selectively; it consists of a sigmoid function followed by a pointwise multiplication. The sigmoid output lies in [0, 1]: 0 means discard completely, 1 means let everything through. An LSTM unit has three such gates: the forget gate, the input gate, and the output gate.

Forget gate: the forget gate feeds the previous unit's output h_{t-1} and the current input x_t through a sigmoid, producing a value in [0, 1] for each entry of c_{t-1} that controls how much of the previous cell state is forgotten.
Input gate: the input gate works together with a tanh function to control what new information is added. The tanh produces a candidate vector ~c_t, and the input gate produces a value in [0, 1] for each entry of ~c_t, controlling how much of the new information is written in. With the forget gate output f_t controlling how much of the previous cell state is kept, and the input gate output i_t controlling how much new information is added, the cell state is updated as c_t = f_t ∗ c_{t-1} + i_t ∗ ~c_t.
Output gate: the output gate controls how much of the current cell state is exposed. The cell state is first activated with tanh, and the output gate produces a value in [0, 1] for each entry to filter it, giving the new hidden state h_t.
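The three gates and the state update above can be sketched as a single forward step in NumPy. The stacked weight matrix W (one block per gate) and the function names are implementation choices for illustration, not part of the original text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev; x_t] to the four stacked gate pre-activations."""
    n = h_prev.shape[0]
    z = np.concatenate([h_prev, x_t])
    a = W @ z + b                       # shape (4n,): f, i, candidate, o blocks
    f_t = sigmoid(a[0:n])               # forget gate: how much of c_prev to keep
    i_t = sigmoid(a[n:2*n])             # input gate: how much new info to add
    c_hat = np.tanh(a[2*n:3*n])         # candidate cell state ~c_t
    o_t = sigmoid(a[3*n:4*n])           # output gate: how much of the state to emit
    c_t = f_t * c_prev + i_t * c_hat    # c_t = f_t * c_{t-1} + i_t * ~c_t
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t, c_t
```

Since o_t lies in (0, 1) and tanh(c_t) lies in (-1, 1), every entry of h_t is strictly inside (-1, 1).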
LSTM Variants

The LSTM described above is the standard version, but not every LSTM is exactly like this. In fact, almost every paper uses a slightly different variant.

A popular LSTM variant, shown in the figure below, was first proposed by Gers & Schmidhuber in 2000. It adds "peephole connections", letting each gate "peek" at the cell state. Here, the forget gate and the input gate are connected to the previous cell state, while the output gate is connected to the current cell state.
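A sketch of the peephole idea, assuming diagonal (elementwise) peephole weights, which is a common choice; the names and weight layout here are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def peephole_lstm_step(x_t, h_prev, c_prev, W, P, b):
    """LSTM step with peephole connections; P holds three elementwise peephole vectors."""
    n = h_prev.shape[0]
    z = np.concatenate([h_prev, x_t])
    a = W @ z + b
    # forget and input gates peek at the *previous* cell state c_prev
    f_t = sigmoid(a[0:n] + P[0] * c_prev)
    i_t = sigmoid(a[n:2*n] + P[1] * c_prev)
    c_hat = np.tanh(a[2*n:3*n])
    c_t = f_t * c_prev + i_t * c_hat
    # the output gate peeks at the *current* cell state c_t
    o_t = sigmoid(a[3*n:4*n] + P[2] * c_t)
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```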

Another variant couples the forget and input gates: instead of deciding separately what to forget and how much new information to add, the two values are made complementary (they sum to 1), so the update becomes c_t = f_t ∗ c_{t-1} + (1 − f_t) ∗ ~c_t. We only forget when we are about to write new information, and we only write new information when we forget something old.

Another noteworthy variant is the Gated Recurrent Unit (GRU), first presented by Cho, et al. in 2014. It merges the forget gate and the input gate into a single "update gate", and also merges the hidden state h_t and the cell state c_t; the result is simpler than the standard LSTM.
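The GRU step can be sketched as follows: the update gate z_t plays the coupled forget/input role, a reset gate r_t controls how much of the old state enters the candidate, and there is only one state vector h_t. Weight names here are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU step: update gate z_t, reset gate r_t, single state h_t."""
    z = np.concatenate([h_prev, x_t])
    z_t = sigmoid(Wz @ z + bz)               # update gate (merged forget/input)
    r_t = sigmoid(Wr @ z + br)               # reset gate
    h_hat = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]) + bh)
    h_t = (1 - z_t) * h_prev + z_t * h_hat   # interpolate old state and candidate
    return h_t
```

Because h_t is an elementwise convex combination of h_prev and a tanh output, the state stays bounded in (-1, 1) once initialized there.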

Of course, many more variants exist than are listed here. Greff, et al. (2015) and Jozefowicz, et al. (2015) systematically compared LSTM variants and found that they all perform about the same.
