A Course on Recurrent Neural Networks (1): RNN Introduction


Source: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

Recurrent Neural Networks (RNNs) are popular models that have shown great promise in many NLP tasks. But despite their recent popularity, there are only a limited number of resources that thoroughly explain how RNNs work and how to implement them. That's what this tutorial is for. It is split into four parts:

1. Introduction to RNNs
2. Implementing an RNN using Python and Theano
3. Understanding the BPTT algorithm and the vanishing gradient problem
4. Implementing GRU and LSTM networks

As part of this tutorial, we will implement an RNN-based language model. The applications of language models are twofold: First, they allow us to score arbitrary sentences based on how likely they are to occur in the real world. This gives us a measure of grammatical and semantic correctness, and such models are typically used as part of machine translation systems. Second, a language model allows us to generate new text (the cooler application). For example, training the model on Shakespeare lets us generate Shakespeare-like text. Andrej Karpathy's blog post shows what character-level language models based on RNNs are capable of.

This tutorial assumes some familiarity with basic neural networks. If you are not familiar with them, you may want to read my earlier post, which covers the principles and implementation of non-recurrent networks.

What are RNNs?

The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that is a very bad idea. If you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a "memory" which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they can only look back a few steps (more on this later). A typical RNN looks like this:


A recurrent neural network and the unfolding in time of the computation involved in its forward computation (Source: Nature)

The diagram above shows an RNN being unrolled (or unfolded) into a full network. By unrolling we simply write out the network for the complete sequence. For example, if the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer neural network, one layer for each word. The quantities in the diagram are computed as follows:

x_t is the input at time step t. For example, x_1 could be a one-hot word vector corresponding to the second word of a sentence.
s_t is the hidden state at time step t. It is the "memory" of the network. s_t is computed from the previous hidden state and the input at the current step: s_t = f(U x_t + W s_{t-1}). The function f is usually a nonlinearity such as tanh or ReLU. s_{-1}, which is needed to compute the first hidden state, is typically initialized to all zeros.
o_t is the output at step t. For example, if we want to predict the next word in a sentence, it would be a vector of probabilities across our vocabulary: o_t = softmax(V s_t).
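To make the forward computation concrete, here is a minimal NumPy sketch of the equations above. The vocabulary size, hidden-layer size, random initialization, and function names are illustrative assumptions, not part of the original post (the actual Theano implementation comes in part 2 of the tutorial):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical sizes: a vocabulary of 8000 words and 100 hidden units.
vocab_size, hidden_size = 8000, 100

# Shared parameters U, V, W (randomly initialized here just for illustration).
U = np.random.randn(hidden_size, vocab_size) * 0.01   # input -> hidden
W = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden
V = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden -> output

def rnn_forward(x_indices):
    """x_indices: list of word indices (the one-hot input vectors are implicit).
    Returns the hidden states s and the output probabilities o for each step."""
    T = len(x_indices)
    s = np.zeros((T + 1, hidden_size))   # s[-1] is the initial state s_{-1}, all zeros
    o = np.zeros((T, vocab_size))
    for t in range(T):
        # s_t = tanh(U x_t + W s_{t-1}); with a one-hot x_t, U x_t is just one column of U
        s[t] = np.tanh(U[:, x_indices[t]] + W.dot(s[t - 1]))
        # o_t = softmax(V s_t)
        o[t] = softmax(V.dot(s[t]))
    return s, o

s, o = rnn_forward([0, 42, 7])   # a made-up 3-word input sequence
```

Note that because the inputs are one-hot vectors, multiplying U by x_t simply selects a column of U, which is why the code indexes into U directly instead of doing a full matrix-vector product.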

There are a few things to note here: You can think of the hidden state s_t as the memory of the network; it captures information about what happened in all the previous time steps. The output o_t is calculated solely based on the memory at time step t. As briefly mentioned above, it is a bit more complicated in practice, because s_t typically cannot capture information from too many time steps ago. Unlike a traditional deep neural network, which uses different parameters at each layer, an RNN shares the same parameters (U, V, W) across all time steps. This reflects the fact that we are performing the same task at each step, just with different inputs, and it greatly reduces the total number of parameters we need to learn. The diagram above has an output at each time step, but depending on the task this may not be necessary. For example, when predicting the sentiment of a sentence we may only care about the final output, not the output after each word. Similarly, we may not need an input at each time step. The main feature of an RNN is its hidden state, which captures some information about a sequence.

What can RNNs do?

RNNs have shown great success in many NLP tasks. At this point I should mention that the most commonly used type of RNN is the LSTM, which is much better at capturing long-term dependencies than the basic RNN. But don't worry, LSTMs are essentially the same thing as RNNs; they just use a different way of computing the hidden state, and we will cover them in detail later in the tutorial. Here are some example applications of RNNs in NLP (by no means an exhaustive list).

Language Modeling and Generating Text

Given a sequence of words, we want to predict the probability of each word given the ones before it. Language models allow us to measure how likely a sentence is, which is an important input for machine translation (since high-probability sentences are typically correct). A side effect of being able to predict the next word is that we get a generative model, which allows us to generate new text by sampling from the output probabilities. Depending on the training data, we can generate all kinds of models. In language modeling the input is typically a sequence of words (for example, encoded as one-hot vectors), and the output is the sequence of predicted words. When training the network we set o_t = x_{t+1}, since we want the output at step t to be the actual next word.
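As a concrete illustration of the o_t = x_{t+1} convention, here is a tiny sketch of how training pairs could be built from one tokenized sentence. The special start/end tokens and the word indices below are made up for the example:

```python
# Hypothetical word indices; 0 and 1 stand for sentence-start and sentence-end tokens.
SENTENCE_START, SENTENCE_END = 0, 1
sentence = [SENTENCE_START, 41, 27, 512, 8, SENTENCE_END]

x = sentence[:-1]  # inputs:  every word except the last
y = sentence[1:]   # targets: the same words shifted left by one, so y[t] == x[t + 1]
```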

Research papers about language modeling and generating text:
Recurrent neural network based language model
Extensions of recurrent neural network based language model
Generating Text with Recurrent Neural Networks

Machine Translation

Machine translation is similar to language modeling in that our input is a sequence of words in a source language (e.g. German). We want to output a sequence of words in a target language (e.g. English). A key difference is that our output only starts after we have seen the complete input, because the first word of the translated sentence may require information captured from the entire input sequence.


Recurrent neural network for machine translation (Source: http://cs224d.stanford.edu/lectures/CS224d-Lecture8.pdf)
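The "read the whole input first, then start producing output" behavior can be sketched with a toy encoder-decoder built from two simple RNNs like the one above. Every size, weight matrix, and token index below is an illustrative assumption, and a real translation system would be trained rather than use random weights; this is not the exact architecture of the papers listed below:

```python
import numpy as np

# Made-up sizes and randomly initialized parameters, purely for illustration.
src_vocab, tgt_vocab, hidden = 50, 60, 16
Ue = np.random.randn(hidden, src_vocab) * 0.1   # encoder input weights
We = np.random.randn(hidden, hidden) * 0.1      # encoder recurrent weights
Ud = np.random.randn(hidden, tgt_vocab) * 0.1   # decoder input weights
Wd = np.random.randn(hidden, hidden) * 0.1      # decoder recurrent weights
Vd = np.random.randn(tgt_vocab, hidden) * 0.1   # decoder output weights

def translate(source_indices, max_len=20, start_token=0, end_token=1):
    # Encoder: read the entire source sentence before emitting anything.
    h = np.zeros(hidden)
    for x in source_indices:
        h = np.tanh(Ue[:, x] + We.dot(h))
    # Decoder: only now start producing target words, seeded with the encoder's final state.
    output, prev = [], start_token
    for _ in range(max_len):
        h = np.tanh(Ud[:, prev] + Wd.dot(h))
        prev = int(np.argmax(Vd.dot(h)))   # greedy choice of the next target word
        if prev == end_token:
            break
        output.append(prev)
    return output

print(translate([3, 7, 12]))  # with random weights the "translation" is meaningless
```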

Research papers about machine translation:
A Recursive Recurrent Neural Network for Statistical Machine Translation
Sequence to Sequence Learning with Neural Networks
Joint Language and Translation Modeling with Recurrent Neural Networks

Speech Recognition

Given an input sequence of acoustic signals from a sound wave, we can predict a sequence of phonetic segments together with their probabilities.

Research papers about speech recognition:
Towards End-to-End Speech Recognition with Recurrent Neural Networks

Generating Image Descriptions

Together with convolutional neural networks, RNNs have been used as part of models that generate descriptions for unlabeled images. It is quite amazing how well this seems to work. The combined model even aligns the generated words with features found in the images.


Deep visual-semantic alignments for generating image descriptions (Source: http://cs.stanford.edu/people/karpathy/deepimagesent/)

Training RNNs

Training an RNN is similar to training a traditional neural network. We also use the backpropagation (BP) algorithm, but with a little twist. Because the parameters are shared across all time steps in the network, the gradient at each output depends not only on the calculations of the current time step, but also on the previous time steps. For example, in order to calculate the gradient at t = 4 we would need to backpropagate through the 3 previous time steps and sum up the gradients. This is called Backpropagation Through Time (BPTT). If this doesn't make much sense yet, don't worry, there will be a whole post on the details. For now, just be aware that vanilla RNNs trained with BPTT have difficulty learning long-term dependencies (e.g. dependencies between steps that are far apart) due to the vanishing/exploding gradient problem. There are ways to deal with these problems, and certain types of RNNs (like LSTMs) were specifically designed to get around them.
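To show what "backpropagate and sum the gradients over earlier time steps" looks like in code, here is a minimal BPTT sketch for the toy NumPy RNN given earlier. It assumes the same U, V, W and rnn_forward from that sketch, uses a summed cross-entropy loss, and omits the truncation that a practical implementation would add:

```python
def bptt(x_indices, y_indices):
    """Backpropagation Through Time for the toy RNN above (cross-entropy loss).
    Returns gradients for U, V, W, summed over all time steps."""
    T = len(y_indices)
    s, o = rnn_forward(x_indices)
    dU, dV, dW = np.zeros_like(U), np.zeros_like(V), np.zeros_like(W)
    # Gradient of the loss w.r.t. the softmax inputs: o_t minus the one-hot target.
    delta_o = o.copy()
    delta_o[np.arange(T), y_indices] -= 1.0
    for t in reversed(range(T)):
        dV += np.outer(delta_o[t], s[t])
        # Gradient flowing into the hidden state at step t (tanh derivative is 1 - s^2).
        delta_t = V.T.dot(delta_o[t]) * (1 - s[t] ** 2)
        # The "through time" part: keep propagating back to earlier steps and summing.
        for step in reversed(range(t + 1)):
            dW += np.outer(delta_t, s[step - 1])
            dU[:, x_indices[step]] += delta_t
            delta_t = W.T.dot(delta_t) * (1 - s[step - 1] ** 2)
    return dU, dV, dW
```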

RNN Extensions

Over the years, researchers have developed more sophisticated types of RNNs to deal with some of the shortcomings of the vanilla RNN model. We will cover them in more detail in later posts; this section is meant as a brief overview so that you are familiar with the taxonomy of models.

Bidirectional RNNs are based on the idea that the output at time t may depend not only on the previous elements in the sequence, but also on future elements. For example, to predict a missing word in a sequence you want to look at both the left and the right context. Bidirectional RNNs are quite simple: they are just two RNNs stacked on top of each other. The output is then computed based on the hidden states of both RNNs.
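Here is a minimal NumPy sketch of the "two RNNs stacked on top of each other" idea. The weight matrices passed in and the choice to concatenate the two hidden states are illustrative assumptions; other ways of combining them (e.g. summing) are also common:

```python
import numpy as np

def birnn_hidden_states(x_seq, Uf, Wf, Ub, Wb):
    """Return the combined hidden states of a bidirectional RNN layer.
    x_seq: array of shape (T, input_dim); Uf/Wf and Ub/Wb are the forward and
    backward RNN parameters (biases omitted for brevity)."""
    T, hidden = len(x_seq), Wf.shape[0]
    h_fwd, h_bwd = np.zeros((T, hidden)), np.zeros((T, hidden))
    h = np.zeros(hidden)
    for t in range(T):                      # left-to-right pass
        h = np.tanh(Uf.dot(x_seq[t]) + Wf.dot(h))
        h_fwd[t] = h
    h = np.zeros(hidden)
    for t in reversed(range(T)):            # right-to-left pass
        h = np.tanh(Ub.dot(x_seq[t]) + Wb.dot(h))
        h_bwd[t] = h
    # The output at step t can then be computed from both states, here concatenated.
    return np.concatenate([h_fwd, h_bwd], axis=1)

# Example usage with made-up sizes:
# T, d, h = 6, 10, 4
# x_seq = np.random.randn(T, d)
# Uf, Wf = np.random.randn(h, d), np.random.randn(h, h)
# Ub, Wb = np.random.randn(h, d), np.random.randn(h, h)
# states = birnn_hidden_states(x_seq, Uf, Wf, Ub, Wb)   # shape (T, 2*h)
```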

Deep (bidirectional) RNNs are similar to bidirectional RNNs, but with multiple layers per time step. In practice this gives the network a higher learning capacity (but it also requires a lot more training data).

LSTM networks are quite popular these days, and we briefly talked about them above. LSTMs don't have a fundamentally different architecture from RNNs, but they use a different function to compute the hidden state. The memory in an LSTM is called a cell, and you can think of a cell as a black box that takes as input the previous state h_{t-1} and the current input x_t. Internally, the cell decides which parts of the memory to keep and which to erase. It then combines the previous state, the current memory, and the input. It turns out that these types of units are very efficient at capturing long-term dependencies. LSTMs can be quite confusing at first, but if you are interested you can learn more from this post.
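The "black box" can be opened a little by writing out the standard LSTM gate equations. The sketch below is one common formulation (biases omitted and parameter names made up for the example), not necessarily the exact variant used in any particular paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step. p is a dict of weight matrices Wi, Ui, Wf, Uf, Wo, Uo, Wg, Ug."""
    i = sigmoid(p['Wi'].dot(x_t) + p['Ui'].dot(h_prev))   # input gate: what to write
    f = sigmoid(p['Wf'].dot(x_t) + p['Uf'].dot(h_prev))   # forget gate: what to keep
    o = sigmoid(p['Wo'].dot(x_t) + p['Uo'].dot(h_prev))   # output gate: what to expose
    g = np.tanh(p['Wg'].dot(x_t) + p['Ug'].dot(h_prev))   # candidate memory content
    c = f * c_prev + i * g          # new cell state: mix of old memory and new input
    h = o * np.tanh(c)              # new hidden state seen by the rest of the network
    return h, c
```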

Conclusion

So far so good. I hope you have gotten a basic understanding of what RNNs are and what they can do. In the next post we will implement a first version of our RNN language model using Python and Theano. Please leave questions in the comments!
