A Deeper Understanding of LSTM

Source: Internet
Author: User

The key to understanding LSTM is understanding the cell. I first read the classic blog post on the subject; afterwards I felt I had followed each individual part but could not put the whole picture together. Then I came across a summary post written by an expert, which ties the entire LSTM structure together.

1. The most common structure diagram of an LSTM cell:

Note: An LSTM can be understood as having three gates and one cell. The input gate controls how much of the new input (the new memory) is let in, the forget gate controls how much of the previous memory state is kept, and the output gate controls how much of the final memory is output.

2. The picture that helped me most in understanding the whole process is the following:

Inputs to the cell at time t:
1. The current input x_t
2. The cell output at the previous time step, h_{t-1}
3. The cell state at the previous time step, c_{t-1} (this can be understood as an intermediate value from the computation of h_{t-1})

The three gates of the cell at time t, each with values in the range [0, 1] (the later GRU cell merges the input and forget gates into a single update gate):
1. Input gate i_t
2. Forget gate f_t
3. Output gate o_t
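To make the [0, 1] range concrete, here is a minimal sketch of what a gate does, assuming the usual sigmoid gate and element-wise multiplication. The pre-activation values and the signal are made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-activation values for one gate, one per hidden unit.
pre_activations = [-4.0, 0.0, 4.0]
gate = [sigmoid(z) for z in pre_activations]

# A gate scales a signal element-wise:
# values near 0 block the signal, values near 1 let it through.
signal = [2.0, 2.0, 2.0]
gated = [g * s for g, s in zip(gate, signal)]

print(gate)
print(gated)
```

The same mechanism is used by all three gates; they differ only in which signal they scale.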

The computation inside the cell proceeds as follows (compare with the second diagram above):
Step 1.1: The input gate i_t (corresponds to the first formula).
Step 1.2: The new memory c~_t (c_t with a tilde) that the input gate controls:
W is the corresponding weight matrix and b is the bias. The yellow boxes are the different activation functions. In effect, these two operations are equivalent to two neural network layers running in parallel. (Note that the activation function for the new memory is tanh, and tanh is used again later when the hidden-layer output is computed from the cell state and the output gate; the input, forget, and output gates all use sigmoid.)
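Since the formula images are not reproduced here, the sketch below assumes the standard LSTM formulation: i_t = sigmoid(W_i [h_{t-1}, x_t] + b_i) and c~_t = tanh(W_c [h_{t-1}, x_t] + b_c). The weights, biases, and sizes are made-up values, and `affine` is a helper introduced only for this illustration.

```python
import math

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

# Made-up example: hidden size 2, input size 1.
h_prev = [0.5, -0.5]   # h_{t-1}
x_t = [1.0]            # current input
z = h_prev + x_t       # concatenation [h_{t-1}, x_t]

W_i, b_i = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]], [0.0, 0.0]
W_c, b_c = [[0.5, 0.5, 0.5], [-0.3, 0.2, 0.1]], [0.0, 0.1]

# Step 1.1: input gate, sigmoid activation.
i_t = [sigmoid(a) for a in affine(W_i, z, b_i)]
# Step 1.2: new (candidate) memory, tanh activation.
c_tilde = [math.tanh(a) for a in affine(W_c, z, b_c)]

print(i_t, c_tilde)
```

Both branches read the same concatenated vector, which is why they can be seen as two layers side by side.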

The forget gate computation:
Step 1.3: The forget gate f_t (controls how much of the previous memory c_{t-1} is forgotten) (pictured below).
Steps 1.1, 1.2, and 1.3 can be computed in parallel, since each of them takes only the current input x_t and the previous cell output h_{t-1} as input.
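One way to see why these steps are independent is to stack their parameters and compute all pre-activations in a single pass. This is a sketch under the standard assumption f_t = sigmoid(W_f [h_{t-1}, x_t] + b_f); all numbers are made up, and `affine` is a helper defined only for this example.

```python
import math

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

n_hidden = 2
z = [0.5, -0.5, 1.0]  # [h_{t-1}, x_t] concatenated (made-up values)

# Made-up parameters for the three branches.
W_i, b_i = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]], [0.0, 0.0]
W_c, b_c = [[0.5, 0.5, 0.5], [-0.3, 0.2, 0.1]], [0.0, 0.1]
W_f, b_f = [[0.2, 0.0, -0.2], [0.3, 0.3, 0.3]], [1.0, 1.0]

# Step 1.3 on its own: the forget gate.
f_t = [sigmoid(a) for a in affine(W_f, z, b_f)]

# The same three branches in one pass over the stacked parameters.
pre = affine(W_i + W_c + W_f, z, b_i + b_c + b_f)
i_all = [sigmoid(a) for a in pre[:n_hidden]]
c_tilde_all = [math.tanh(a) for a in pre[n_hidden:2 * n_hidden]]
f_all = [sigmoid(a) for a in pre[2 * n_hidden:]]

print(f_t, f_all)
```

None of the three branches reads the output of another, so the stacked pass gives the same forget gate as the separate computation.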

After steps 1.1, 1.2, and 1.3, the next step, step 2, computes the current cell state c_t.

Step 2: Computing the updated cell state c_t:

The updated cell state is the sum of two parts. The first part "forgets old things": it is the product of the forget gate f_t and the previous cell state c_{t-1}. The second part "adds new things": it is the product of the input gate i_t and the new memory c~_t that it controls.
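Written out, the update is c_t = f_t * c_{t-1} + i_t * c~_t, with element-wise products. A minimal sketch with hand-picked gate values (chosen to show the extreme cases, not taken from a trained network):

```python
# Hand-picked gate values, one per hidden unit.
f_t = [1.0, 0.0, 0.5]        # forget gate: keep all, keep nothing, keep half
i_t = [0.0, 1.0, 0.5]        # input gate: admit nothing, admit all, admit half
c_prev = [2.0, 2.0, 2.0]     # previous cell state c_{t-1}
c_tilde = [-1.0, -1.0, -1.0] # new (candidate) memory

# "Forget old things" + "add new things", element by element.
c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]

print(c_t)  # [2.0, -1.0, 0.5]
```

The first unit keeps its old memory untouched, the second replaces it entirely with the new memory, and the third blends the two.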

Step 3: The output gate o_t and the hidden-layer output h_t of the cell at time t, which it controls.
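Assuming the standard formulation, o_t = sigmoid(W_o [h_{t-1}, x_t] + b_o) and h_t = o_t * tanh(c_t). The sketch below uses made-up weights and a made-up cell state.

```python
import math

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

z = [0.5, -0.5, 1.0]   # [h_{t-1}, x_t] concatenated (made-up values)
c_t = [2.0, -1.0]      # current cell state from step 2 (made-up values)

# Made-up output-gate parameters.
W_o = [[0.2, -0.1, 0.3], [0.4, 0.0, -0.2]]
b_o = [0.0, 0.0]

# Output gate: sigmoid of an affine function of [h_{t-1}, x_t].
o_t = [sigmoid(sum(w * v for w, v in zip(row, z)) + b)
       for row, b in zip(W_o, b_o)]

# Hidden-layer output: the squashed cell state, scaled by the output gate.
h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]

print(o_t, h_t)
```

Because tanh keeps the cell state in (-1, 1) and the gate lies in (0, 1), every entry of h_t also stays within (-1, 1).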

Step 4: The final output, obtained from the input x_t by way of h_t:

From this I infer that the value of the output gate is not itself the final output of the LSTM layer. The final output of the layer is obtained by multiplying h_t (the hidden-layer output) by a weight matrix and passing the result through a sigmoid layer.
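Putting the steps together, a single time step might be sketched as below. This assumes the standard LSTM equations and reads the final layer as y_t = sigmoid(W_y h_t + b_y); the names `lstm_step`, `zero_params`, `W_y`, and `b_y` are hypothetical and introduced only for this illustration.

```python
import math

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step followed by a sigmoid output layer (a sketch)."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]        # step 1.1
    c_tilde = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]  # step 1.2
    f_t = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]        # step 1.3
    # Step 2: forget old things, add new things.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # Step 3: output gate and hidden-layer output.
    o_t = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    # Step 4: final output from h_t (assumed form: sigmoid of an affine map).
    y_t = [sigmoid(a) for a in affine(p["W_y"], h_t, p["b_y"])]
    return h_t, c_t, y_t

def zero_params(n_hidden, n_in, n_out):
    """All-zero parameters, handy for checking the data flow by hand."""
    p = {}
    for name in ("i", "c", "f", "o"):
        p["W_" + name] = [[0.0] * (n_hidden + n_in) for _ in range(n_hidden)]
        p["b_" + name] = [0.0] * n_hidden
    p["W_y"] = [[0.0] * n_hidden for _ in range(n_out)]
    p["b_y"] = [0.0] * n_out
    return p

params = zero_params(n_hidden=2, n_in=1, n_out=1)
h_t, c_t, y_t = lstm_step([1.0], [0.0, 0.0], [1.0, -1.0], params)
print(h_t, c_t, y_t)
```

With all-zero parameters every gate is exactly 0.5 and the new memory is 0, so the cell state is simply halved. That makes the step easy to verify by hand before trying real weights.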

Reference Address: https://blog.csdn.net/songhk0209/article/details/71134698
