Understanding LSTM Networks

Recurrent Neural Networks

Humans don't start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don't throw everything away and start thinking from scratch again. Your thoughts have persistence.

Traditional neural networks can't do this, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It's unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.

Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist. Recurrent neural networks have loops.

In the above diagram, a chunk of neural network, A, looks at some input x_t and outputs a value h_t. A loop allows information to be passed from one step of the network to the next.

These loops make recurrent neural networks seem kind of mysterious. However, if you think about it a bit more, it turns out that they aren't all that different from a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop: an unrolled recurrent neural network.
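To make the unrolling concrete, here is a minimal sketch (my own, not from the original post) of a recurrent network as a loop that reuses the same cell function and weights at every time step; the function and variable names are hypothetical, and a simple tanh cell stands in for the module A:

```python
import numpy as np

def cell(x_t, h_prev, W, U, b):
    """One copy of the repeated module A: combines the current input with the previous state."""
    return np.tanh(W @ x_t + U @ h_prev + b)

def run_unrolled(xs, h0, W, U, b):
    """Unrolling the loop: the same weights are applied at every time step,
    and each step passes its output h to its successor."""
    h, outputs = h0, []
    for x_t in xs:
        h = cell(x_t, h, W, U, b)
        outputs.append(h)
    return outputs

# Example: 5 time steps of 3-dimensional inputs, 4-dimensional hidden state.
rng = np.random.default_rng(0)
xs = [rng.normal(size=3) for _ in range(5)]
hs = run_unrolled(xs, np.zeros(4), rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4))
```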

This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They're the natural architecture of neural network to use for such data.

And they certainly are used! In the last few years, there has been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning... The list goes on. I'll leave discussion of the amazing feats one can achieve with RNNs to Andrej Karpathy's excellent blog post, The Unreasonable Effectiveness of Recurrent Neural Networks. But they really are pretty amazing.

Essential to these successes is the use of "LSTMs," a very special kind of recurrent neural network which works, for many tasks, much much better than the standard version. Almost all exciting results based on recurrent neural networks are achieved with them. It's these LSTMs that this essay will explore.

The Problem of Long-Term Dependencies

One of the appeals of RNNs is the idea that they might be able to connect previous information to the present task, such as using previous video frames to inform the understanding of the present frame. If RNNs could do this, they'd be extremely useful. But can they? It depends.

Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in "the clouds are in the sky," we don't need any further context – it's pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it's needed is small, RNNs can learn to use the past information.

But there are also cases where we need more context. Consider trying to predict the last word in the text "I grew up in France ... I speak fluent French." Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It's entirely possible for the gap between the relevant information and the point where it is needed to become very large.

Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.

In theory, RNNs are absolutely capable of handling such "long-term dependencies." A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don't seem to be able to learn them. The problem was explored in depth by Hochreiter (1991) [German] and Bengio, et al. (1994), who found some pretty fundamental reasons why it might be difficult.
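To see one of the difficulties in miniature, here is a small illustration (my own toy example, not taken from the referenced papers) of how the learning signal in a simple scalar RNN shrinks as it is propagated back across a growing gap:

```python
import numpy as np

# Toy scalar RNN: h_t = tanh(w * h_{t-1}). By the chain rule, the sensitivity of h_T
# to h_0 is a product of T factors w * tanh'(w * h_{t-1}); when those factors are
# below 1 in magnitude, the product decays exponentially with the gap length T.
w, h, grad = 0.9, 0.5, 1.0
for t in range(1, 51):
    pre = w * h
    h = np.tanh(pre)
    grad *= w * (1.0 - np.tanh(pre) ** 2)  # chain-rule factor contributed by step t
    if t % 10 == 0:
        print(f"after {t:2d} steps, d h_t / d h_0 ≈ {grad:.2e}")
```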

Thankfully, LSTMs don't have this problem!

LSTM Networks

Long Short Term Memory networks – usually just called "LSTMs" – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in following work. They work tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer. The repeating module in a standard RNN contains a single layer.
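Written out in the same notation as the diagrams (with W and b the module's learned weights and bias), that single tanh layer simply squashes a concatenation of the previous state and the current input:

h_t = tanh(W · [h_{t−1}, x_t] + b)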

LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way. The repeating module in an LSTM contains four interacting layers.
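As a preview of the walk-through below, here is a minimal sketch (my own, assuming the standard LSTM formulation rather than anything specific to this post's figures) of one repeating module with its four learned layers; the parameter names are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM repeating module: four learned layers acting on [h_{t-1}, x_t].
    p is a dict of weight matrices W* and bias vectors b* (hypothetical names)."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(p["Wf"] @ z + p["bf"])        # layer 1: forget gate
    i = sigmoid(p["Wi"] @ z + p["bi"])        # layer 2: input gate
    c_tilde = np.tanh(p["Wc"] @ z + p["bc"])  # layer 3: candidate values
    o = sigmoid(p["Wo"] @ z + p["bo"])        # layer 4: output gate
    c_t = f * c_prev + i * c_tilde            # update the cell state
    h_t = o * np.tanh(c_t)                    # expose a filtered version as the output
    return h_t, c_t
```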

Don't worry about the details of what's going on. We'll walk through the LSTM diagram step by step later. For now, let's just try to get comfortable with the notation we'll be using.

In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the copies going to different locations.

The Core Idea Behind LSTMs

The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.

The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It's very easy for information to just flow along it unchanged.

The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates.

Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and a pointwise multiplication operation.

The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means "let nothing through," while a value of one means "let everything through!"
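Here is that mechanism in isolation (a toy example of mine, not from the post): a vector of sigmoid outputs pointwise-multiplies a vector of candidate information, letting each component through in proportion to a value between 0 and 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

candidate = np.array([2.0, -1.0, 0.5])        # information that could pass through
gate = sigmoid(np.array([10.0, 0.0, -10.0]))  # ≈ [1.0, 0.5, 0.0]: pass, halve, block
print(gate * candidate)                       # ≈ [ 2.0, -0.5,  0.0 ]
```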

An LSTM has three of these gates, to protect and control the cell state.

Step-by-Step LSTM Walk Through

The "the" the "the" the "the" the "decide" what information we ' re going to throw. This decision was made by a sigmoid layer called the "Forget Gate layer." It looks at ht−1ht−1 and Xtxt, and outputs a number between 0
