# Recurrent Neural Network Tutorial, Part 1 – Introduction to RNNs


A recurrent neural network (RNN) is a popular model that has shown great promise in many NLP tasks. Despite its popularity, there are few articles that explain in detail how RNNs work and how to implement them. This tutorial aims to fill that gap. It is divided into four parts:
1. Introduction to RNNs (this tutorial)
2. Implementing an RNN with TensorFlow
3. Understanding BPTT and the vanishing/exploding gradient problem
4. Implementing a GRU/LSTM
As part of the tutorial we will implement an RNN-based language model. Language models are useful in two ways. First, they let us score arbitrary sentences based on how likely they are to occur in the real world, which gives us a measure of grammatical and semantic correctness (roughly: the more natural the sentence, the higher the score). Such models are typically part of machine translation systems. Second, a language model lets us generate new text (which I think is the much cooler application). Training a language model on Shakespeare's writings allows us to generate Shakespeare-like text. This fun blog post by Andrej Karpathy demonstrates what RNN-based character-level language models are capable of.

If you are unfamiliar with basic neural networks, you may want to start with Implementing a Neural Network from Scratch, a post that introduces the concepts behind basic (non-recurrent) networks and their implementation.

## What are RNNs?

The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other, but for many tasks that is a poor assumption. If you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a "memory" which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps (more on this later). A typical RNN looks like this:

The diagram above shows an RNN being unrolled into a full network. Unrolling simply means that we write out the network for the complete sequence. For example, if the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer network, one layer per word. The formulas that govern the computation are as follows: x_t is the input at time step t; for example, x_1 could be the one-hot vector corresponding to a word in a sentence. s_t is the hidden state at time step t; it is the "memory" of the network. s_t is calculated based on the previous hidden state and the input at the current step: s_t = f(U x_t + W s_{t-1}). The function f is usually a nonlinearity such as tanh or ReLU. s_{-1}, which is required to calculate the first hidden state, is typically initialized to zero. o_t is the output at step t. For example, if we wanted to predict the next word in a sentence, it would be a vector of probabilities, one for each word in the vocabulary: o_t = softmax(V s_t).
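The formulas above can be sketched directly in NumPy. This is an illustrative toy, not a trained model: the vocabulary and hidden sizes are made up, and the weights are random placeholders.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: turns scores into probabilities."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Illustrative sizes: a vocabulary of 8 words, a hidden state of 4 units.
vocab, hid = 8, 4
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hid, vocab))   # input -> hidden
W = rng.normal(scale=0.1, size=(hid, hid))     # hidden -> hidden
V = rng.normal(scale=0.1, size=(vocab, hid))   # hidden -> output

def forward(word_ids):
    """Unroll the RNN: s_t = tanh(U x_t + W s_{t-1}), o_t = softmax(V s_t).
    Note that the SAME U, W, V are reused at every time step."""
    s = np.zeros(hid)                       # s_{-1} initialized to zero
    states, outputs = [], []
    for wid in word_ids:
        x = np.zeros(vocab)
        x[wid] = 1.0                        # one-hot encoding of the word
        s = np.tanh(U @ x + W @ s)
        states.append(s)
        outputs.append(softmax(V @ s))
    return states, outputs

states, outputs = forward([3, 1, 5, 2, 7])  # a 5-word "sentence"
print(len(states))        # one hidden state per word
print(outputs[0].sum())   # each output is a distribution over the vocabulary
```

Unrolling a 5-word sentence produces 5 hidden states and 5 output distributions, each output summing to 1 over the vocabulary.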

A few things to note: You can think of the hidden state s_t as the memory of the network; s_t captures information about what happened in all the previous time steps. The output o_t is calculated solely based on the memory at time t. As briefly mentioned above, it is a bit more complicated in practice because s_t typically cannot capture information from too many steps back. Unlike a traditional deep neural network, which uses different parameters at each layer, an RNN shares the same parameters (U, V, W) across all steps. This reflects the fact that we are performing the same task at each step, just with different inputs, and it greatly reduces the total number of parameters we need to learn. The diagram above has an output at each time step, but depending on the task this may not be necessary. For example, when predicting the sentiment of a sentence we only care about the final output, not the sentiment after each word. Similarly, we may not need inputs at every time step. The main feature of an RNN is its hidden state, which captures some information about a sequence.

## What can RNNs do?

RNNs have shown great success in many NLP tasks. At this point I should mention that the most commonly used type of RNN is the LSTM, which is much better at capturing long-term dependencies than the vanilla RNN. But don't worry: LSTMs are essentially the same as the RNN described here, they just use a different way of computing the hidden state. We will cover them in more detail in a later part of this tutorial. Here are some example applications of RNNs in NLP (by no means an exhaustive list).

### Language modeling and generating text

Given a sequence of words, we want to predict the probability of each word given the previous words. Language models allow us to measure how likely a sentence is, which is an important input for machine translation (since high-probability sentences are typically correct). A side effect of being able to predict the next word is that we get a generative model, which allows us to generate new text by sampling from the output probabilities. Depending on our training data we can generate all kinds of stuff. In language modeling our input is typically a sequence of words (encoded as one-hot vectors, for example), and our output is the sequence of predicted words. When training the network we set o_t = x_{t+1}, since we want the output at step t to be the actual next word.
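To make the training setup concrete, here is a toy sketch of how inputs and targets are shifted by one step (so that the target at step t is x_{t+1}), and how a language model scores a sentence as a product of conditional probabilities. The word ids and per-step probabilities below are made up for illustration.

```python
import math

# A hypothetical sentence encoded as word indices.
sentence = [0, 4, 7, 2, 9]
inputs, targets = sentence[:-1], sentence[1:]   # o_t should predict x_{t+1}
print(list(zip(inputs, targets)))               # [(0, 4), (4, 7), (7, 2), (2, 9)]

# A language model scores a sentence as the product of the probabilities it
# assigns to each next word. These per-step probabilities are made-up values;
# in practice they come from the softmax output o_t at each step.
step_probs = [0.20, 0.05, 0.40, 0.10]
log_prob = sum(math.log(p) for p in step_probs)  # log-space avoids underflow
print(log_prob)
```

Summing log-probabilities instead of multiplying raw probabilities is the usual trick to avoid numerical underflow on long sentences.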

Research papers about language modeling and generating text:
- Recurrent Neural Network Based Language Model
- Extensions of Recurrent Neural Network Based Language Model
- Generating Text with Recurrent Neural Networks
### Machine translation

Machine translation is similar to language modeling in that our input is a sequence of words in a source language (e.g. German), and we want to output a sequence of words in a target language (e.g. English). A key difference is that the output only starts after the complete input has been seen, because the first word of a translated sentence may require information captured from the entire input sequence.

Research papers about machine translation:
- A Recursive Recurrent Neural Network for Statistical Machine Translation
- Sequence to Sequence Learning with Neural Networks
- Joint Language and Translation Modeling with Recurrent Neural Networks

### Speech recognition

Given an input sequence of acoustic signals from a sound wave, we can predict a sequence of phonetic segments together with their probabilities.

Research papers about speech recognition:
- Towards End-to-End Speech Recognition with Recurrent Neural Networks

### Generating image descriptions

Together with convolutional neural networks, RNNs have been used as part of models that generate descriptions for unlabeled images. It is quite amazing how well this works. The combined model can even align the generated words with features found in the images.

## Training RNNs

Training an RNN is similar to training a traditional neural network: we also use the backpropagation algorithm, but with a twist. Because the parameters are shared by all time steps in the network, the gradient at each output depends not only on the calculations of the current time step, but also on the previous time steps. For example, in order to calculate the gradient at t = 4 we need to backpropagate through the 3 previous time steps and sum up the gradients. This is called Backpropagation Through Time (BPTT). Don't worry, a later post will describe it in detail. For now, just be aware that vanilla RNNs trained with BPTT have difficulty learning long-term dependencies (e.g. dependencies between time steps that are far apart). There are mechanisms to deal with these problems, and certain types of RNNs (like LSTMs) were specifically designed to get around them.
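The "sum up the gradients" idea can be shown on a deliberately tiny scalar RNN with a squared-error loss (a simplification of the softmax model above, chosen so the backward pass fits in a few lines). The inputs, targets, and weights below are made up; the point is that the gradient of the shared weight w accumulates one contribution per time step, propagated backwards through the recurrence.

```python
import numpy as np

# Scalar RNN: s_t = tanh(u * x_t + w * s_{t-1}); loss L = 0.5 * sum_t (s_t - y_t)^2.
u, w = 0.5, 0.8
xs = [1.0, -0.5, 0.25, 1.5]   # made-up inputs
ys = [0.4, 0.1, 0.3, 0.6]     # made-up targets

# Forward pass, storing all states for the backward sweep.
ss = [0.0]                    # s_{-1} = 0
for x in xs:
    ss.append(np.tanh(u * x + w * ss[-1]))

# Backward pass (BPTT): the gradient w.r.t. the SHARED weight w is the sum
# of one contribution per time step, each propagated back through tanh.
dw, ds_next = 0.0, 0.0
for t in reversed(range(len(xs))):
    ds = (ss[t + 1] - ys[t]) + ds_next      # dL/ds_t: local term + flow from t+1
    dpre = ds * (1.0 - ss[t + 1] ** 2)      # through tanh'
    dw += dpre * ss[t]                      # step t's contribution to dL/dw
    ds_next = dpre * w                      # gradient passed back to step t-1
print(dw)
```

The repeated multiplication by w and by tanh' in the backward loop is exactly where the vanishing/exploding gradient problem comes from: those factors compound as the gradient travels further back in time.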

## RNN extensions

Over the years researchers have developed more sophisticated types of RNNs to deal with some of the shortcomings of the vanilla RNN model. We will cover them in more detail in a later post; this section is meant as a brief overview so that you become familiar with the taxonomy of models.

Bidirectional RNNs are based on the idea that the output at time t may depend not only on the previous elements in the sequence, but also on future elements. For example, to predict a missing word in a sequence you want to look at both the left and the right context. Bidirectional RNNs are quite simple: they are just two RNNs stacked on top of each other, and the output is then computed based on the hidden states of both RNNs.
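A minimal sketch of the bidirectional idea, assuming random untrained weights and illustrative sizes: one RNN reads the sequence left-to-right, a second reads it right-to-left, and the representation at each step combines both hidden states.

```python
import numpy as np

rng = np.random.default_rng(1)
in_dim, hid = 3, 4  # illustrative sizes
# Separate weights for the forward and backward RNNs (random placeholders).
Wf, Uf = rng.normal(scale=0.1, size=(hid, hid)), rng.normal(scale=0.1, size=(hid, in_dim))
Wb, Ub = rng.normal(scale=0.1, size=(hid, hid)), rng.normal(scale=0.1, size=(hid, in_dim))

def run(seq, W, U):
    """Plain RNN sweep over seq, returning the hidden state at every step."""
    s, states = np.zeros(hid), []
    for x in seq:
        s = np.tanh(U @ x + W @ s)
        states.append(s)
    return states

xs = [rng.normal(size=in_dim) for _ in range(5)]   # a 5-step input sequence
fwd = run(xs, Wf, Uf)                  # forward states, aligned to x_1 ... x_5
bwd = run(xs[::-1], Wb, Ub)[::-1]      # backward states, re-aligned to x_1 ... x_5
# At each step the output can now see both past and future context.
combined = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(combined[0].shape)               # both directions' states at step 1
```

Reversing the backward RNN's state list re-aligns it with the input order, so `combined[t]` holds a summary of everything before t (forward half) and everything after t (backward half).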

Deep (bidirectional) RNNs are similar to bidirectional RNNs, but we now have multiple layers per time step. In practice this gives us a higher learning capacity (but we also need a lot of training data).

LSTM networks are quite popular these days, and we briefly talked about them above. LSTMs don't have a fundamentally different architecture from RNNs, but they use a different function to compute the hidden state. The memory in an LSTM is called a cell, and you can think of it as a black box that takes as input the previous state h_{t-1} and the current input x_t. Internally, these cells decide what to keep in (and what to erase from) memory. They then combine the previous state, the current memory, and the input. It turns out that these types of units are very effective at capturing long-term dependencies.
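To peek inside the black box, here is a sketch of one common formulation of the LSTM cell (gated updates of an internal memory c_t); the weights are random placeholders with illustrative sizes, not a trained model. The details are covered in a later part of this tutorial.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
in_dim, hid = 3, 4  # illustrative sizes

def rand(shape):
    return rng.normal(scale=0.1, size=shape)

# One (W, U, b) triple per gate, plus one for the candidate memory content.
Wi, Ui, bi = rand((hid, in_dim)), rand((hid, hid)), np.zeros(hid)  # input gate
Wf, Uf, bf = rand((hid, in_dim)), rand((hid, hid)), np.zeros(hid)  # forget gate
Wo, Uo, bo = rand((hid, in_dim)), rand((hid, hid)), np.zeros(hid)  # output gate
Wc, Uc, bc = rand((hid, in_dim)), rand((hid, hid)), np.zeros(hid)  # candidate

def lstm_step(x_t, h_prev, c_prev):
    i = sigmoid(Wi @ x_t + Ui @ h_prev + bi)   # how much new content to write
    f = sigmoid(Wf @ x_t + Uf @ h_prev + bf)   # how much old memory to keep
    o = sigmoid(Wo @ x_t + Uo @ h_prev + bo)   # how much memory to expose
    g = np.tanh(Wc @ x_t + Uc @ h_prev + bc)   # candidate memory content
    c_t = f * c_prev + i * g                   # updated cell memory
    h_t = o * np.tanh(c_t)                     # new hidden state
    return h_t, c_t

h, c = np.zeros(hid), np.zeros(hid)
h, c = lstm_step(rng.normal(size=in_dim), h, c)
print(h.shape, c.shape)
```

The key design choice is the additive memory update `c_t = f * c_prev + i * g`: because old memory is carried forward by a gate rather than squashed through a nonlinearity at every step, gradients can flow over many time steps.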

## Summary

By now I hope you have a basic understanding of what RNNs are and what they can do. In the next post we will implement a first version of our language model RNN using Python and Theano.

