Recurrent Neural Network Study Notes (Two): RNN-LSTM

In theory, an RNN that is large enough can generate arbitrarily complex sequence structures.

In practice, however, a standard RNN does not preserve information over long time spans. Because of its chain structure (much like an HMM), the state passes through the same transformation at every step, so any signal either explodes or decays exponentially, and the information is soon lost. This "forgetful" character also makes the sequences such an RNN generates unstable: if the network can only rely on the results of the last few steps to predict the next one, and then feeds that prediction back in to predict the step after, then once an error occurs the system keeps heading in the wrong direction, with little chance of using earlier information to correct itself.
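To see why applying the same transformation at every step destroys information, it is enough to iterate a fixed linear map and watch the norm of the state. Here is a minimal sketch in plain NumPy (the matrices and scale factors are made-up illustrations, and the nonlinearity is ignored):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=8)  # initial hidden state

# Two fixed recurrent matrices, one slightly "too small", one slightly
# "too big". The scale controls the eigenvalues, which govern growth/decay.
W_decay = 0.9 * np.eye(8)    # eigenvalues < 1 -> exponential decay
W_explode = 1.1 * np.eye(8)  # eigenvalues > 1 -> exponential explosion

h_d, h_e = h.copy(), h.copy()
for t in range(1, 51):
    h_d = W_decay @ h_d      # the same transformation at every step
    h_e = W_explode @ h_e
    if t % 10 == 0:
        print(f"t={t:2d}  |h| decay: {np.linalg.norm(h_d):.2e}  "
              f"explode: {np.linalg.norm(h_e):.2e}")
```

After fifty steps the decaying state has shrunk by two orders of magnitude and the exploding one has grown by two; neither regime can carry a signal faithfully over a long sequence.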

From this perspective, if an RNN had a "long-term memory", it would be much more stable: even when it is unsure whether its recent steps are correct, it could still draw some "inspiration" from earlier information to form new predictions.

(One might argue that when training an RNN you can add noise and similar tricks to keep it stable on strange inputs. But introducing a better memory mechanism still seems the more efficient move, and the better one for long-term development.)

LSTM

LSTM stands for Long Short-Term Memory, a structure first proposed in 1997 (by Hochreiter and Schmidhuber).

The design of this structure is very delicate: it contains an input gate, a forget gate, and an output gate. These three gates are controlled by the data themselves and take values in {0, 1} (in practice the values are approximated with sigmoid or tanh functions, since discrete 0s and 1s are not differentiable). The input gate, output gate, and forget gate are all such (approximately) {0, 1} vectors of the same size as h_t, and the cell state is a vector of that size as well.

For example (a toy numeric sketch of these three modes follows below):

When the input gate is 0, the forget gate is 1, and the output gate is 1, the LSTM unit refuses the new input and simply emits the data it last recorded (similar to a read-only access).

When the input gate is 1, the forget gate is 0, and the output gate is 1, the unit wipes its previous "memory", keeps only the information coming from x_t, records it, and passes it on to h_t (similar to a refresh).

When the input gate is 1, the forget gate is 1, and the output gate is 0, the unit adds the new input to its memory but does not pass anything on (similar to a store).

And so on.
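These modes are easy to verify with a toy cell. The sketch below assumes a scalar memory and hard {0, 1} gates in place of the sigmoid approximations (and it omits the tanh that a real LSTM applies when exposing the cell state):

```python
def lstm_toy_step(c_prev, x, i_gate, f_gate, o_gate):
    """One step of a toy scalar LSTM cell with hard {0, 1} gates.

    c_prev: previous cell memory, x: new input,
    i/f/o_gate: input, forget, and output gates (0 or 1).
    Returns (new cell memory, output h)."""
    c = f_gate * c_prev + i_gate * x   # keep old memory and/or add the input
    h = o_gate * c                     # expose memory only if the output gate is open
    return c, h

c = 5.0  # previously stored memory
print(lstm_toy_step(c, x=3.0, i_gate=0, f_gate=1, o_gate=1))  # (5.0, 5.0) read-only
print(lstm_toy_step(c, x=3.0, i_gate=1, f_gate=0, o_gate=1))  # (3.0, 3.0) refresh
print(lstm_toy_step(c, x=3.0, i_gate=1, f_gate=1, o_gate=0))  # (8.0, 0.0) store
```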

If that is still not clear, it helps to look at the update formulas between them:

$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$

$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$

$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$

$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$

$h_t = o_t \odot \tanh(c_t)$

(where $\sigma(x)$ denotes the sigmoid function and $\odot$ is elementwise multiplication)

The cell-to-gate weight matrices $W_{ci}$, $W_{cf}$, $W_{co}$ are diagonal, which means each element of a gate is driven only by the corresponding dimension of the cell state, with no interference across dimensions!
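Here is a minimal NumPy sketch of a single forward step under these formulas. The parameter names, shapes, and random initialization are my own illustrative assumptions; the diagonal peephole matrices are stored as vectors w_ci, w_cf, w_co and applied elementwise:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM forward step with diagonal cell-to-gate (peephole) weights.

    p is a dict of parameters: W_x* (n_hidden x n_in), W_h* (n_hidden x
    n_hidden), w_c* (length-n_hidden vectors, the diagonals), b_* (biases).
    """
    i = sigmoid(p["W_xi"] @ x + p["W_hi"] @ h_prev + p["w_ci"] * c_prev + p["b_i"])
    f = sigmoid(p["W_xf"] @ x + p["W_hf"] @ h_prev + p["w_cf"] * c_prev + p["b_f"])
    c = f * c_prev + i * np.tanh(p["W_xc"] @ x + p["W_hc"] @ h_prev + p["b_c"])
    o = sigmoid(p["W_xo"] @ x + p["W_ho"] @ h_prev + p["w_co"] * c + p["b_o"])
    h = o * np.tanh(c)
    return h, c

# Tiny usage example with random parameters.
n_in, n_h = 4, 3
rng = np.random.default_rng(1)
p = {}
for g in "ifco":
    p[f"W_x{g}"] = rng.normal(scale=0.1, size=(n_h, n_in))
    p[f"W_h{g}"] = rng.normal(scale=0.1, size=(n_h, n_h))
    p[f"b_{g}"] = np.zeros(n_h)
for g in ("ci", "cf", "co"):
    p[f"w_{g}"] = rng.normal(scale=0.1, size=n_h)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), p)
print(h, c)
```

Note how the cell update $c_t = f_t \odot c_{t-1} + \dots$ is additive rather than a repeated matrix transformation: with the forget gate near 1, the memory can pass through many steps essentially unchanged, which is exactly the "long-term memory" the standard RNN lacks.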

The original LSTM algorithm used a custom approximate gradient method so that the weights could be updated at every point in time, but the full gradient can instead be computed with backpropagation through time, which is the method used here. One remaining problem is that some derivatives can become very large, making the computation numerically difficult. To prevent this, all the experiments in the paper clip the derivative of the loss with respect to the inputs of the LSTM layer (before the sigmoid and tanh are applied) to a predefined range. (More on this later.)
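As a rough sketch of what such clipping looks like in code (the bound used here is an assumed illustrative value, not one taken from the paper):

```python
import numpy as np

def clip_preactivation_grad(grad, bound=10.0):
    """Clip the derivative w.r.t. an LSTM layer's pre-activation inputs
    (i.e. before the sigmoid/tanh are applied) to [-bound, bound].
    The bound is an illustrative choice."""
    return np.clip(grad, -bound, bound)

# During backpropagation through time, wherever we compute the gradient of
# the loss w.r.t. a gate's pre-activation, we would pass it through this:
g = np.array([0.5, -1e6, 42.0])     # a pathological gradient
print(clip_preactivation_grad(g))   # [  0.5 -10.   10. ]
```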
