Recurrent Neural Networks
In a traditional neural network, the model does not pass what it processed at the previous moment on to the next moment; each moment is handled in isolation. For example, suppose we want to classify the event occurring at every moment of a movie: if we could use the information about events that appeared earlier in the movie, classifying the event at the current moment would be much easier. A traditional neural network has no memory, so it cannot use the information that has already appeared when it classifies the event at each moment. So what kind of model lets a neural network remember this information? The answer is the recurrent neural network (RNN).
The structure of a recurrent neural network differs somewhat from that of a traditional neural network: it has a loop pointing back to itself, which indicates that it can pass the information it is currently processing on to the next moment. Its structure is as follows:
Here x_t is the input, A is the processing part of the model, and h_t is the output.
To make recurrent neural networks easier to describe, we unroll the diagram above and get:
Such a chain-like structure shows that a recurrent neural network can be seen as multiple copies of the same network, each copy passing its information on to the next moment. How should we understand this? Suppose we have a language model that predicts the current word based on the words that have already appeared in the sentence; a recurrent neural network works as follows:
Here W denotes the various weight matrices, x the input, y the output, and h the hidden state.
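To make this concrete, here is a minimal sketch of one time step of such a network in NumPy. The parameter names (W_xh, W_hh, W_hy, b_h, b_y) are illustrative placeholders, not taken from any particular library; the point is simply that the hidden state h carries information from one moment to the next.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One time step of a vanilla RNN.

    The new hidden state depends on both the current input x_t and the
    hidden state h_prev carried over from the previous moment.
    """
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # update the hidden state
    y_t = W_hy @ h_t + b_y                           # output for this moment
    return h_t, y_t
```

Calling rnn_step in a loop over the words of a sentence, feeding each h_t back in as h_prev, is exactly the unrolled chain shown above.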
Because of this memory, recurrent neural networks can be used to solve many problems, such as speech recognition, language modeling, and machine translation. However, they do not handle long-term dependencies well.
Long-term dependency issues
A long-term dependency arises when the point to be predicted is far away from the information it depends on, which makes that information hard to learn. For example, in the sentence "I was born in France, ..., I can speak French", to predict the final word "French" we need the context "France" from much earlier. In theory, recurrent neural networks can handle such problems, but in practice conventional RNNs do not solve long-term dependencies well, whereas LSTMs handle them very well.
LSTM Neural Network
The Long Short-Term Memory network (LSTM) is a special kind of RNN that can solve the long-term dependency problem well. So how does it differ from a conventional RNN?
First, let's look at the concrete structure of RNNs:
Every recurrent neural network is a chain of repeating neural network modules. In a standard RNN the processing layer inside each module is very simple, usually a single tanh layer: the current output is computed from the current input and the output of the previous moment. Compared with an ordinary neural network, this is only a small modification that lets the information learned at the previous moment be used for learning at the current moment.
The LSTM has a similar chain structure; the difference is that its repeating module is more complex, with four interacting layers:
The symbols that appear in the processing layers have the following meanings:
The core idea of LSTMs
The key to understanding LSTMs is the rectangular box below, called the memory block, which consists of three gates (forget gate, input gate, output gate) and a memory unit (cell). The horizontal line across the top of the box, called the cell state, acts like a conveyor belt that controls how information flows to the next moment.
This rectangular box can also be expressed as:
From these two diagrams we can see that the cell c_t sits at the center of the lower figure; the line running from the inputs (h_{t-1}, x_t) at the bottom to the output h_t is the cell state; and f_t, i_t, o_t are the forget gate, input gate, and output gate, each implemented as a sigmoid layer. The two tanh layers in the upper figure correspond to the input and output of the cell respectively.
The LSTM adds information to and removes information from the cell state through these gates. A gate can selectively decide whether information passes through: it consists of a sigmoid neural network layer and a pointwise multiplication operation, as follows:
The output of this sigmoid layer is a number between 0 and 1, indicating how much information is allowed through: 0 means nothing is allowed through, and 1 means everything is allowed through.
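A rough sketch of this gating mechanism, continuing the NumPy notation above (W, b, and the function names are placeholders of my own, not terms from the figures):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(h_prev, x_t, W, b, value):
    """A gate: a sigmoid layer followed by a pointwise multiplication.

    The sigmoid output lies between 0 and 1 and decides, element by element,
    how much of `value` is allowed to pass through.
    """
    g = sigmoid(W @ np.concatenate([h_prev, x_t]) + b)  # how much to let through
    return g * value                                    # pointwise multiplication
```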
Parsing the LSTM step by step
The first step of the LSTM is to decide what information may pass along the cell state. This decision is controlled by the forget gate: a sigmoid layer that, based on the output of the previous moment h_{t-1} and the current input x_t, produces a value f_t between 0 and 1 which determines whether the information learned at the previous moment, c_{t-1}, is passed on fully, partially, or not at all. As follows:
For example, much of what we learned from earlier sentences may be useless for the current one and can be selectively filtered out.
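Written out, the forget gate is f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f). As a sketch continuing the snippet above (W_f, b_f, h_prev, and c_prev are assumed names for the forget gate's parameters, the previous output, and the previous cell state):

```python
# Forget gate: decide how much of the previous cell state c_prev to keep.
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
kept = f_t * c_prev  # elementwise: values near 1 are kept, values near 0 are forgotten
```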
The second step is to generate the new information we need to add. This step has two parts: an "input gate" layer, a sigmoid layer that decides which values will be updated, and a tanh layer that generates a new candidate value \tilde{c}_t, which may be added to the cell state. We then combine the values produced by these two parts to perform the update.
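Continuing the same sketch (W_i, b_i, W_c, b_c are again assumed parameter names), the two parts of this step look like:

```python
# Input gate: decide which components of the candidate will be written.
i_t = sigmoid(W_i @ np.concatenate([h_prev, x_t]) + b_i)
# Candidate values: new information proposed for the cell state.
c_tilde = np.tanh(W_c @ np.concatenate([h_prev, x_t]) + b_c)
```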
Now we can update the old cell state: first we multiply the old cell state c_{t-1} by f_t to forget the information we no longer need, and then we add i_t * \tilde{c}_t to obtain the new cell state c_t.
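With the pieces sketched above, this update is a single line:

```python
# New cell state: forget part of the old state, then add the gated candidate.
c_t = f_t * c_prev + i_t * c_tilde
```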
Putting steps one and two together, this is the process of throwing away unneeded information and adding new information:
As an example, in the previous sentence