Recurrent Neural Networks (RNN)

Why use sequence models? A standard fully connected neural network has two problems when processing sequences: 1) its input and output layers have fixed sizes, while different sequences may have inputs and outputs of different lengths; choosing the maximum length and padding shorter sequences (PAD) is not a good solution; 2) a fully connected network has no connections between nodes in the same layer, so it cannot use information from earlier positions in the sequence, and features cannot be shared across different positions of a sequence. The recurrent neural network (recurrent neural network, RNN) solves these problems well.

1. Sequence Data

Before introducing a recurrent neural network, let's look at some sequence data:

Figure 1: Sequence data

The input $x$ and the output $y$ may both be sequences, or only one of them may be a sequence, and the two sequences $x$ and $y$ may or may not have the same length. In the description of the recurrent neural network structure below, the input $x$ and the output $y$ are sequences of the same length. (In this article, the hidden state $h$ and the final output $y$ are treated as different things.)

2. Structure of the recurrent neural network

Figure 2 shows the classic structure of a recurrent neural network (RNN). (This figure can be confusing the first time you see it.)

Figure 2: Classical structure of recurrent neural networks

For an RNN, a very important concept is the time step. At each time step, the RNN combines the current input with the current model state to produce an output. In Figure 2, at time step $t$ the input to the main structure A includes not only the input layer $x_t$ but also a hidden state passed from time step $t-1$ along the loop edge. (It is fine if this is not yet clear; see the unrolled view below.)

An RNN can be thought of as the same neural network structure replicated along the time dimension. Figure 3 shows an RNN unrolled over time. (The unrolled diagram is easier to understand.)

Figure 3: A recurrent neural network unrolled over time

From this structure it is easy to see that the problems an RNN is best at solving are those related to time series; the RNN is also the most natural neural network structure for dealing with such problems.

The main structure A of an RNN is duplicated along the time series, and structure A is also called the loop body. Designing the network structure of loop body A is the key to using an RNN to solve practical problems. Similar to parameter sharing in the filters of a convolutional neural network (CNN), in an RNN the parameters of loop body A are shared across all time steps.

Figure 4 shows the simplest case, an RNN that uses a single fully connected layer as loop body A; the small yellow tanh box in the figure represents a fully connected layer with tanh as the activation function.

Figure 4: RNN structure using a single fully connected layer as the loop body

Figure 5: The meanings of the various symbols represented in Figure 4

(Note: the pointwise operation does not appear in Figure 4.)

At time step $t$, the input to loop body A consists of $x_t$ and the hidden state $h_{t-1}$ passed from time step $t-1$ (following the copy mark in Figure 5, the arrow connecting loop body A at time $t-1$ to loop body A at time $t$ carries the hidden state $h_{t-1}$). How does loop body A handle these two parts of its input? According to Figure 5, $x_t$ and $h_{t-1}$ are simply concatenated into a larger vector $[x_t, h_{t-1}]$. If the shapes of $x_t$ and $h_{t-1}$ are [1, 3] and [1, 4] respectively, then the input vector of the fully connected layer inside loop body A has shape [1, 7]. After concatenation, the result is processed like any other fully connected layer.
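A minimal NumPy sketch of this single step, using the shapes from the text ([1, 3] input, [1, 4] hidden state); the weight names W and b and the random initialization are illustrative assumptions, not taken from the figures:

```python
import numpy as np

rng = np.random.default_rng(0)

x_t = rng.standard_normal((1, 3))      # input at time step t, shape [1, 3]
h_prev = rng.standard_normal((1, 4))   # hidden state h_{t-1}, shape [1, 4]

concat = np.concatenate([x_t, h_prev], axis=1)  # [x_t, h_{t-1}], shape [1, 7]

W = rng.standard_normal((7, 4))        # weights of the fully connected layer
b = np.zeros((1, 4))                   # bias

h_t = np.tanh(concat @ W + b)          # new hidden state, shape [1, 4]
print(h_t.shape)                       # (1, 4)
```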

To convert the hidden state $h_t$ of the current time step into the final output $y_t$, the recurrent neural network needs another fully connected layer, just like the final fully connected layer in a convolutional neural network. (If this extra fully connected layer on the RNN output is omitted, then $h_t$ and $y_t$ have the same value.)
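Written out, the forward step of this simplest loop body plus output layer can be summarized as follows (the weight names $W_h$, $W_y$, $b_h$, $b_y$ are illustrative assumptions, not taken from the figures):

$$h_t = \tanh\bigl(W_h\,[x_t, h_{t-1}] + b_h\bigr), \qquad y_t = W_y\,h_t + b_y$$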

The forward propagation calculation process for RNN is as follows:

Figure 6: The forward propagation calculation process of an RNN

Figure 6 clearly shows the concrete calculation flow inside the RNN, including the process of turning the current hidden state $h_t$ into the final output $y_t$.
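As a rough sketch of this forward propagation over a whole sequence, here is the same single-layer tanh loop body plus an output layer in NumPy; all weight names and dimensions are illustrative assumptions:

```python
import numpy as np

def rnn_forward(xs, W_h, b_h, W_y, b_y, h0):
    """Run the simple tanh loop body plus output layer over a whole sequence."""
    h = h0
    ys = []
    for x_t in xs:
        concat = np.concatenate([x_t, h], axis=1)  # [x_t, h_{t-1}]
        h = np.tanh(concat @ W_h + b_h)            # hidden state h_t
        ys.append(h @ W_y + b_y)                   # output y_t
    return ys, h

# Toy usage: input_dim=3, hidden_dim=4, output_dim=2, sequence length 5.
rng = np.random.default_rng(0)
xs = [rng.standard_normal((1, 3)) for _ in range(5)]
W_h = rng.standard_normal((3 + 4, 4)); b_h = np.zeros((1, 4))
W_y = rng.standard_normal((4, 2));     b_y = np.zeros((1, 2))
ys, h_T = rnn_forward(xs, W_h, b_h, W_y, b_y, h0=np.zeros((1, 4)))
print(len(ys), ys[0].shape)  # 5 (1, 2)
```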

3. Types of recurrent neural networks

Figure 7: Types of RNN

(1) One to one: this is essentially just a fully connected neural network and does not really count as an RNN.

(2) One to many: the input is not a sequence and the output is a sequence.

(3) Many to one: the input is a sequence and the output is not a sequence.

(4) Many to many: both input and output are sequences, but they can be of different lengths.

(5) Many to many: both the input and the output are sequences of the same length.

4. Limitations of the basic recurrent neural network

The figures above all show unidirectional RNNs. A drawback of the unidirectional RNN is that at time step $t$ it cannot use information from time step $t+1$ and later; this is what the bidirectional recurrent neural network (bidirectional RNN) addresses, as sketched below.
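A minimal sketch of the bidirectional idea, reusing the rnn_forward helper from the sketch above; concatenating the per-step outputs of the two directions is one common choice and is an assumption here, not something specified in this article:

```python
import numpy as np

def bidirectional_forward(xs, fwd_params, bwd_params, hidden_dim):
    """Sketch of a bidirectional RNN: one pass left-to-right, one right-to-left.

    Reuses rnn_forward from the sketch above; each *_params is the tuple
    (W_h, b_h, W_y, b_y) for one direction.
    """
    h0 = np.zeros((1, hidden_dim))
    ys_fwd, _ = rnn_forward(xs, *fwd_params, h0=h0)        # sees the past
    ys_bwd, _ = rnn_forward(xs[::-1], *bwd_params, h0=h0)  # sees the future
    ys_bwd = ys_bwd[::-1]                                   # re-align in time
    # At every time step the combined output uses both directions.
    return [np.concatenate([yf, yb], axis=1) for yf, yb in zip(ys_fwd, ys_bwd)]
```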

"It is important to point out that theoretically cyclic neural networks can support sequences of any length, but in practice, if the sequence is too long it can lead to gradient dissipation in optimization (the Vanishing gradient problem), so in practice a maximum length is generally specified, The sequence is truncated when the sequence length exceeds the specified length. ”

One of the technical challenges RNNs face is the long-term dependency (long-term dependencies) problem: the current time step cannot obtain the information it needs from a much earlier position in the sequence. In theory an RNN can handle long-term dependencies, but in practice it does not perform well on them.

GRU and LSTM, however, can cope with the vanishing gradient problem and long-term dependencies.

5. Gated Recurrent Unit (gated recurrent unit, GRU) and Long Short-Term Memory (LSTM) Network

What distinguishes the basic RNN, GRU and LSTM is the network structure of loop body A.

GRU and LSTM both introduce the concept of a gate. The GRU has two gates (the "update gate" and the "reset gate"), while the LSTM has three gates (the "forget gate", the "input gate" and the "output gate").

Figure 8: LSTM

Figure 9: GRU

GRU and LSTM rely on "gate" structures to selectively control how information affects the state of the recurrent neural network at each time step. A so-called "gate" is a fully connected layer with a sigmoid activation followed by an element-wise multiplication; together these two operations form a "gate" structure, as shown in Figure 10.

Figure 9: "Gate" structure

The "gate" structure is called because the fully connected neural network layer using sigmoid as the activation function outputs a value between 0 and 1, describing how much information the current input can pass through this structure. So the function of this structure is similar to a door, when the door opens (the sigmoid full connection layer output is 1 o'clock), all information can pass; when the door closes (the sigmoid neural network layer outputs 0 o'clock), no information is passed.

The LSTM has three gates, namely the "forget gate", the "input gate" and the "output gate". The role of the forget gate is to let the recurrent neural network "forget" information that is no longer needed. The input gate decides what new information enters the state of the current time step. Through the forget gate and the input gate, the LSTM can effectively decide what information to discard and what to keep. After computing the current cell state $C_t$, the LSTM still needs to produce the output of the current time step, and this is done through the output gate.
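A hedged NumPy sketch of one LSTM step in this spirit, following the standard formulation; the packed weight matrix W and its slicing are assumptions for illustration, not taken from the figures:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W has shape [input_dim + hidden_dim, 4 * hidden_dim]."""
    n = h_prev.shape[1]
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    f = sigmoid(z[:, 0:n])            # forget gate: what to discard from C_{t-1}
    i = sigmoid(z[:, n:2 * n])        # input gate: what new information to add
    o = sigmoid(z[:, 2 * n:3 * n])    # output gate: what to expose as h_t
    c_tilde = np.tanh(z[:, 3 * n:])   # candidate cell state
    c_t = f * c_prev + i * c_tilde    # new cell state C_t
    h_t = o * np.tanh(c_t)            # hidden state / output of this step
    return h_t, c_t
```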

The GRU has two gates: the "update gate", which merges the LSTM's forget gate and input gate into a single gate structure, and the "reset gate". "Intuitively, the reset gate determines how the new input is combined with the previous memory, and the update gate defines how much of the previous memory is kept at the current time step. Units that learn to capture short-term dependencies tend to have active reset gates, while units that capture long-term dependencies tend to have active update gates."
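For comparison, a minimal NumPy sketch of one GRU step under the standard formulation; the weight names Wz, Wr, Wh and the exact interpolation are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, bz, Wr, br, Wh, bh):
    """One GRU step with an update gate z and a reset gate r."""
    xh = np.concatenate([x_t, h_prev], axis=1)
    z = sigmoid(xh @ Wz + bz)  # update gate: how much old memory to keep
    r = sigmoid(xh @ Wr + br)  # reset gate: how much old memory enters the candidate
    h_tilde = np.tanh(np.concatenate([x_t, r * h_prev], axis=1) @ Wh + bh)
    return (1 - z) * h_prev + z * h_tilde  # interpolate old state and candidate
```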

For more on LSTM and GRU, refer to Understanding LSTM Networks and the Machine Heart GitHub project "From Recurrence to Convolution: Exploring the Mysteries of Sequence Modeling".

(Note: throughout this article, $h$ denotes the hidden state and $C$ denotes the cell state.)

References

Understanding LSTM Networks

The Unreasonable Effectiveness of Recurrent Neural Networks

Course 5: Sequence Models, by Andrew Ng

Machine Heart GitHub project: From Recurrence to Convolution, Exploring the Mysteries of Sequence Modeling

"TensorFlow Google deep Learning framework"
