A preliminary study of LSTM

To talk about LSTM, you have to start with RNN. An RNN is a tool for modeling sequential data and is used in fields such as speech recognition and machine translation. LSTM can be regarded as an improved version of the RNN: in short, when an RNN processes long sequences, the gradient tends to vanish or explode, so training fails. (Note: the vanishing/exploding gradient problem is not unique to RNNs; it also appears in other neural networks, for example when the sigmoid activation function is used, in which case one remedy is to replace the activation function, e.g. with ReLU.)
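
To make that note concrete, here is a tiny numpy sketch (my own illustration, not code from the original post): the derivative of the sigmoid is at most 0.25, so chaining it over many steps multiplies many small factors together and the gradient shrinks toward zero.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # The sigmoid derivative is at most 0.25, so a chain of many of them
    # multiplies many small factors and the gradient shrinks toward zero.
    rng = np.random.default_rng(0)
    pre_activations = rng.normal(size=100)     # hypothetical pre-activations over 100 steps
    s = sigmoid(pre_activations)
    step_grads = s * (1.0 - s)                 # each factor is <= 0.25

    print("largest single factor :", step_grads.max())
    print("product over 100 steps:", np.prod(step_grads))   # effectively zero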

RNN (Recurrent Neural Network)

Human beings do not think about a problem or understand an article from scratch at every moment. As you read an article, you understand each word based on the words you have already read. You do not forget everything you have read and start over; some of your ideas are preserved. The idea of an RNN is the same: remember what has been seen before, form a memory, and apply that memory when processing the current sample. In other words, the data received by the hidden-layer neurons comes not only from the current input x but also from the memory of earlier steps.

Traditional neural networks cannot deal with this kind of sequential-data problem, which is a serious shortcoming. Imagine, for example, that you want to classify what is happening at every moment of a movie. A traditional neural network has no way of using earlier events in the film to inform its prediction about the current one.

RNN solves this problem: an RNN contains a recurrent cell that is used to persist information.

x is the input, A is the RNN cell, and h is the output of the RNN cell. A carries a loop, which allows information to be passed from one step to the next.

The loop around A may be hard to understand at first glance, but it becomes clear once it is unrolled.

x0, x1, x2, ... is a sequence of inputs; at each step the RNN cell passes its saved state on to the next step. In short, the memory of the previous step is handed forward.
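
As a sketch of what "passing the state forward" means, here is a minimal vanilla RNN cell unrolled over a toy sequence in numpy (my own illustration; the parameter names W_xh, W_hh and b_h are generic labels, not from the original post):

    import numpy as np

    def rnn_forward(xs, W_xh, W_hh, b_h):
        h = np.zeros(W_hh.shape[0])                  # initial hidden state ("memory")
        hs = []
        for x in xs:                                 # x0, x1, x2, ... processed in order
            h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # new state depends on the input AND the previous state
            hs.append(h)
        return hs

    rng = np.random.default_rng(0)
    xs = [rng.normal(size=3) for _ in range(5)]      # a toy sequence of five 3-dimensional inputs
    W_xh = rng.normal(size=(4, 3)) * 0.1
    W_hh = rng.normal(size=(4, 4)) * 0.1
    b_h = np.zeros(4)
    print(rnn_forward(xs, W_xh, W_hh, b_h)[-1])      # hidden state after seeing the whole sequence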

In recent years, RNNs have been successfully applied to speech recognition, language modeling, translation, image captioning, and so on; see the second blog post I recommend at the end.

The concrete success story is LSTM (Long Short-Term Memory), an upgraded version of the RNN. It handles long-range dependencies better and alleviates the vanishing-gradient problem.

The long-term dependency problem

What makes an RNN good at sequence problems is that it connects previous information to the current task, for example using earlier frames of a movie to help understand the current frame. An RNN can do this, and do it well.

Sometimes we only need information close to the current position to handle the current task. For example, suppose we want to predict the next word of the sentence 'the clouds are in the ___'. To predict the word 'sky' we do not need words far away from the blank; the words right next to it are enough. In cases like this, where the needed memory is close to the current task, an RNN can make very good use of the previous information.

But there are also situations where we need more distant information to handle the current problem. For example, to predict the last word of 'I grew up in France, ... I speak fluent ___ (French)', the model needs the far-away context 'France'. The gap between the relevant information and the point where it is needed can become very large.

Unfortunately, as that gap grows, an RNN becomes unable to learn to connect the two pieces of information.

In theory, an RNN is absolutely capable of handling such long-range dependencies. In practice, unfortunately, it does not work. LSTM solves this problem.

LSTM Networks

LSTM is specifically designed to solve the long-range dependency problem. Remembering information over long intervals is its default behavior.

All RNNs have the form of a chain of repeating modules; in a standard RNN this repeating module has a very simple structure, as shown in the following figure:

An LSTM also has this chain structure, but the repeating module is different. Instead of a single neural-network layer, it has four, interacting in a particular way.

The core of LSTM

The core of the LSTM is the cell state, the horizontal line running straight across the diagram, as shown in the following figure.

The LSTM can remove information from or add information to the cell state, and this is controlled by structures called gates.

A gate is a structure that lets information through selectively. It consists of a sigmoid neural-network layer and a pointwise multiplication operation.

The sigmoid layer outputs a number between 0 and 1 for each component, describing how much of that component should be let through. A value of 0 means 'let nothing through', and a value of 1 means 'let everything through'.
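
In code, a gate is nothing more than a sigmoid output multiplied pointwise with the information flowing past it; the numbers below are made up purely for illustration:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # A gate: sigmoid values between 0 and 1, multiplied pointwise with the information.
    information = np.array([2.0, -1.5, 0.7, 3.0])
    gate = sigmoid(np.array([-10.0, 10.0, 0.0, 2.0]))       # roughly 0, 1, 0.5, 0.88

    print("gate values :", gate.round(3))
    print("let through :", (gate * information).round(3))   # 0 blocks, 1 passes fully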

An LSTM has three such gates to control the cell state.

A step-by-step walk through LSTM

The first step in an LSTM is to decide what information to discard from the cell state. This is decided by a sigmoid layer called the forget gate. The forget gate looks at h_{t-1} and x_t and outputs a number between 0 and 1 for each entry of the cell state C_{t-1}: 1 means 'keep this completely', and 0 means 'discard this completely'.
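
As a sketch (the shapes and weights below are illustrative, not taken from the post), the forget gate computes f_t = sigmoid(W_f [h_{t-1}, x_t] + b_f):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Forget gate: f_t = sigmoid(W_f @ [h_{t-1}, x_t] + b_f)
    rng = np.random.default_rng(0)
    hidden, inputs = 4, 3
    h_prev = rng.normal(size=hidden)                        # h_{t-1}
    x_t = rng.normal(size=inputs)                           # current input
    W_f = rng.normal(size=(hidden, hidden + inputs)) * 0.1
    b_f = np.zeros(hidden)

    f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
    print(f_t)   # one value in (0, 1) per entry of the cell state C_{t-1}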

Returning to our language-model example, where we try to predict the next word based on the previous ones: the cell state might store a property of the current subject (for example whether it is singular or plural), so that the correct verb form can be used. When we see a new subject, we want to forget the property of the old one.

The next step is to decide what new information to store in the cell state. This has two parts. First, a sigmoid layer called the input gate decides which values will be updated. Then a tanh layer creates a vector of candidate values that could be added to the cell state. In the following step we combine these two parts to update the cell state.
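
A sketch of these two parts, continuing the illustrative shapes used above (i_t is the input gate, C_tilde the candidate values; this is not code from the original post):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Input gate and candidate values:
    #   i_t     = sigmoid(W_i @ [h_{t-1}, x_t] + b_i)   -> which entries to update
    #   C_tilde = tanh(W_c @ [h_{t-1}, x_t] + b_c)      -> candidate new values
    rng = np.random.default_rng(1)
    hidden, inputs = 4, 3
    h_prev = rng.normal(size=hidden)
    x_t = rng.normal(size=inputs)
    concat = np.concatenate([h_prev, x_t])

    W_i, b_i = rng.normal(size=(hidden, hidden + inputs)) * 0.1, np.zeros(hidden)
    W_c, b_c = rng.normal(size=(hidden, hidden + inputs)) * 0.1, np.zeros(hidden)

    i_t = sigmoid(W_i @ concat + b_i)
    C_tilde = np.tanh(W_c @ concat + b_c)
    print(i_t, C_tilde)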

In the language-model example, this is where we add the property of the new subject, replacing the property of the old subject that we are forgetting.

Now the old cell state C_{t-1} is updated into the new cell state C_t. We multiply the old state by f_t, forgetting the things we decided to forget, and then add i_t * C_tilde, the candidate information scaled by how much we decided to update each entry.
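
In one line, the update is C_t = f_t * C_{t-1} + i_t * C_tilde; the vectors below are made-up numbers just to show the arithmetic:

    import numpy as np

    # Cell-state update: C_t = f_t * C_{t-1} + i_t * C_tilde
    f_t = np.array([0.9, 0.1, 0.5, 1.0])       # forget gate output
    i_t = np.array([0.2, 0.8, 0.5, 0.0])       # input gate output
    C_tilde = np.array([1.0, -1.0, 0.3, 2.0])  # candidate values
    C_prev = np.array([0.5, 0.5, 0.5, 0.5])    # old cell state C_{t-1}

    C_t = f_t * C_prev + i_t * C_tilde
    print(C_t)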

Finally, we need to decide what to output. The output is based on the cell state, but it is a filtered version of it. First, a sigmoid layer decides which parts of the cell state we want to output. Then the cell state is passed through tanh (which squashes the values to between -1 and 1) and multiplied by the output of the sigmoid gate, so that we output only the parts we chose.
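
A sketch of this last step, again with illustrative shapes and weights (o_t is the output gate, h_t the hidden state emitted at this time step):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Output gate and hidden state:
    #   o_t = sigmoid(W_o @ [h_{t-1}, x_t] + b_o)
    #   h_t = o_t * tanh(C_t)
    rng = np.random.default_rng(2)
    hidden, inputs = 4, 3
    h_prev = rng.normal(size=hidden)
    x_t = rng.normal(size=inputs)
    C_t = rng.normal(size=hidden)               # cell state after the update step

    W_o, b_o = rng.normal(size=(hidden, hidden + inputs)) * 0.1, np.zeros(hidden)
    o_t = sigmoid(W_o @ np.concatenate([h_prev, x_t]) + b_o)
    h_t = o_t * np.tanh(C_t)                    # the filtered output of this time step
    print(h_t)
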
Variants of LSTM

Here are some of the better blog posts I've read:
https://r2rt.com/written-memories-understanding-deriving-and-extending-the-lstm.html
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
http://colah.github.io/posts/2015-08-understanding-lstms/
