RNNs Study Summary

Recurrent Neural Networks (RNNs)

There are two main applications of RNNs: (1) scoring arbitrary sentences by how likely they are to occur in the real world, which provides a measure of grammatical and semantic correctness and can be used in machine translation systems; (2) predicting the next word in a text, an application of comparatively lower value.

I. Introduction to RNNs

In traditional neural networks it is often assumed that the inputs (and outputs) are independent of one another, but this is unrealistic in many applications. For example, when predicting the next word in a text, the role of context cannot be ignored. The idea behind RNNs is to exploit the sequential information in the context; the "recurrent" in RNNs means that the network performs the same task for every element of the sequence. A typical RNN and its unfolded form are shown in the figure below.

[Figure: a typical RNN (left) and the same network unfolded in time (right)]

In the figure, the right side is the unfolded form of the network on the left. x is the input sentence and x_t is the input at time step t; assume the sentence is encoded as one-hot vectors.

Note: a one-hot vector is the simplest word representation in NLP (natural language processing). Each word is expressed as a vector in which only the position corresponding to that word is 1 and every other position is 0. The disadvantages of this method are obvious: the vector length equals the number of words to be represented, every vector must be adjusted when a new word arrives, the whole matrix is very large, and, most importantly, it has no way to capture relationships between words. There are of course other representations, such as word2vec, and a good representation can also improve the efficiency of RNNs.
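
As a concrete illustration, here is a minimal one-hot encoding sketch in Python/NumPy; the toy vocabulary is made up for this example:

    import numpy as np

    # Hypothetical toy vocabulary; a real system builds this from the corpus.
    vocab = ["the", "cat", "sat", "on", "mat"]
    word_to_index = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        # A vector as long as the vocabulary: 1 at the word's index, 0 elsewhere.
        v = np.zeros(len(vocab))
        v[word_to_index[word]] = 1.0
        return v

    print(one_hot("cat"))  # [0. 1. 0. 0. 0.]

Note how the vector length equals the vocabulary size, which is exactly the scaling drawback described above.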

s_t is the state of the hidden layer at time step t, computed as

    s_t = f(U x_t + W s_{t-1})

where f is a nonlinear function such as tanh or ReLU.

o_t is the output at time step t. To predict the next word with an RNN, a softmax classifier over the vocabulary can be used, in which case

    o_t = softmax(V s_t)
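
These two formulas correspond directly to one forward step. Below is a minimal sketch in Python/NumPy (the function and variable names are this sketch's own convention; bias terms are omitted):

    import numpy as np

    def softmax(v):
        e = np.exp(v - np.max(v))  # subtract the max for numerical stability
        return e / e.sum()

    def rnn_step(x_t, s_prev, U, V, W):
        # s_t = tanh(U x_t + W s_{t-1}); tanh plays the role of f here.
        s_t = np.tanh(U @ x_t + W @ s_prev)
        # o_t = softmax(V s_t): a probability distribution over the vocabulary.
        o_t = softmax(V @ s_t)
        return s_t, o_t
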
II. The operation flow of RNNs

(1) Encode the training sample set in some coded form, for example one-hot vectors, to obtain the input and output matrices. If the purpose of the algorithm is to predict the next word, then for each word in a sample its output is the word that immediately follows it in the sentence. (This leaves one problem: the first word of a sentence has no corresponding input and the last word has no corresponding output. This can be handled by reserving two special values to mark the beginning and end of a sentence; for example, the input for the first word of a sentence can be taken to be a special value 0 and the output for the last word a special value 1. A sketch follows the note below.)

"Note: Processing a collection of text samples is a very complex process, including segmenting samples, matching words with numbers, and removing rare words, and so on. 】

(2) Initialize U, W, and V, and run the RNN's forward propagation for training. The formulas are the ones given above:

    s_t = f(U x_t + W s_{t-1})
    o_t = softmax(V s_t)
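
A sketch of forward propagation over a whole sentence, assuming one-hot inputs given as word indices (multiplying U by a one-hot vector just selects a column of U):

    import numpy as np

    def softmax(v):
        e = np.exp(v - np.max(v))
        return e / e.sum()

    def forward(x, U, V, W):
        # x: list of word indices for one sentence.
        # U: hidden x vocab, W: hidden x hidden, V: vocab x hidden.
        T = len(x)
        s = np.zeros((T + 1, U.shape[0]))  # extra all-zero row: s[-1] is the initial state
        o = np.zeros((T, V.shape[0]))
        for t in range(T):
            s[t] = np.tanh(U[:, x[t]] + W @ s[t - 1])  # s_t = f(U x_t + W s_{t-1})
            o[t] = softmax(V @ s[t])                   # o_t = softmax(V s_t)
        return s, o
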
(3) Compute the loss from the predicted values and the true values. A commonly used loss function is the cross-entropy loss: assuming the true output is y, the prediction is o, and there are N training samples in total, the loss is

    L(y, o) = -(1/N) * sum over n of y_n log(o_n)
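
When the true outputs are one-hot (word indices), the loss reduces to the average negative log probability assigned to each correct word. A minimal sketch:

    import numpy as np

    def cross_entropy_loss(o, y):
        # o: (N, vocab) array of softmax outputs, one row per prediction.
        # y: list of N correct word indices.
        # Implements L(y, o) = -(1/N) * sum over n of log o[n, y[n]].
        N = len(y)
        return -np.sum(np.log(o[np.arange(N), y])) / N
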
(4) Based on the loss function, train the network using SGD and backpropagation through time (BPTT). Note that this is not simply the ordinary BP algorithm, because the gradient at each output step is related not only to the computation at the current step but also to the steps before it. BPTT plays a very important role in RNNs.
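
Here is a minimal, untruncated BPTT sketch for the vanilla RNN above (in practice the backward pass is truncated to a fixed number of steps and gradients are averaged over mini-batches):

    import numpy as np

    def softmax(v):
        e = np.exp(v - np.max(v))
        return e / e.sum()

    def forward(x, U, V, W):
        # Forward pass from step (2): returns all hidden states and outputs.
        T = len(x)
        s = np.zeros((T + 1, U.shape[0]))  # s[-1] stays zero: the initial state
        o = np.zeros((T, V.shape[0]))
        for t in range(T):
            s[t] = np.tanh(U[:, x[t]] + W @ s[t - 1])
            o[t] = softmax(V @ s[t])
        return s, o

    def bptt(x, y, U, V, W):
        # x, y: lists of word indices (inputs and next-word targets).
        T = len(y)
        s, o = forward(x, U, V, W)
        dU, dV, dW = np.zeros_like(U), np.zeros_like(V), np.zeros_like(W)
        delta_o = o.copy()
        delta_o[np.arange(T), y] -= 1.0  # gradient of softmax + cross-entropy
        for t in reversed(range(T)):
            dV += np.outer(delta_o[t], s[t])
            delta = (V.T @ delta_o[t]) * (1 - s[t] ** 2)  # tanh' = 1 - tanh^2
            # The error at step t flows back through every earlier step:
            # this is what makes the gradient depend on previous steps.
            for step in reversed(range(t + 1)):
                dW += np.outer(delta, s[step - 1])
                dU[:, x[step]] += delta
                delta = (W.T @ delta) * (1 - s[step - 1] ** 2)
        return dU, dV, dW

A plain SGD update then subtracts a learning rate times each of these gradients from U, V, and W.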

III. The characteristics of RNNs

From the above analysis we can see the following:

(1) Every step of an RNN's forward propagation performs exactly the same computation; only the input changes. This sharing greatly reduces the total number of parameters that have to be learned.

(2) When computing the output of the current step, an RNN also draws on the previous computations. This ability to capture features from the past is what lets an RNN learn the correlations between earlier and later words in a sequence.

(3) In practice an RNN can only relate the current step to a few previous steps; it cannot learn connections to words many steps back. As a result, a plain RNN cannot pull together enough overall context to produce particularly meaningful text, which limits its development.

(4) BPTT suffers from the vanishing gradient problem. Proper initialization of the W matrix can alleviate its effect, and a better method is to use ReLU instead of the tanh or sigmoid function.

Note: the vanishing gradient issue is described in detail in the 2013 paper "On the difficulty of training recurrent neural networks", which also discusses the exploding gradient problem in these algorithms; since the exploding gradient problem can be solved by clipping gradients at a predefined threshold, it is given less attention.
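
For completeness, a sketch of the predefined-threshold clipping mentioned in the note (the threshold 5.0 is an arbitrary illustrative value):

    import numpy as np

    def clip_gradient(grad, max_norm=5.0):
        # If the gradient norm exceeds the threshold, rescale the
        # gradient so that its norm equals the threshold.
        norm = np.linalg.norm(grad)
        if norm > max_norm:
            grad = grad * (max_norm / norm)
        return grad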

Among the improved RNN algorithms, LSTM and GRU can overcome the vanishing gradient problem and can effectively learn longer-range correlations, so these two algorithms receive the most attention in current development.

IV. Improved RNN algorithms

1. Bidirectional RNNs (BRNNs)

Article: "Bidirectional Recurrent Neural Networks"

The idea of BRNNs is that the output at time t depends not only on the previous elements of the sequence but also on the subsequent ones. For example, predicting a missing word in a sentence should take both the preceding and the following text into account. The structure of BRNNs is very simple: it is essentially two RNNs stacked on top of each other, and the output is determined by the hidden layers of both RNNs. This structure lets the network be trained in both time directions simultaneously, and the training time of a BRNN is comparable to that of other RNNs. BRNNs make no explicit assumptions about the data distribution, so they can effectively estimate the conditional posterior probability of a complete symbol sequence.
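
A minimal sketch of the idea, reusing the vanilla RNN state update from section I (the function names and weight-pair convention are this sketch's own):

    import numpy as np

    def rnn_states(inputs, U, W):
        # Hidden states of a vanilla RNN over a list of input vectors.
        s = np.zeros(W.shape[0])
        states = []
        for x in inputs:
            s = np.tanh(U @ x + W @ s)
            states.append(s)
        return states

    def birnn_states(inputs, U_f, W_f, U_b, W_b):
        # One RNN reads the sequence left-to-right, a second reads it
        # right-to-left; the two hidden states at each time step are
        # concatenated, and the output layer consumes the pair.
        fwd = rnn_states(inputs, U_f, W_f)
        bwd = rnn_states(inputs[::-1], U_b, W_b)[::-1]
        return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]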

2. Deep (Bidirectional) RNNs

Article: "Speech Recognition with Deep Recurrent Neural Networks"

Deep RNNs are similar in structure to BRNNs, but contain multiple layers at each time step. In practice this gives the network a higher learning capacity, but it also requires much more training data.
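
A minimal sketch of the stacking idea (the layer convention is this sketch's own):

    import numpy as np

    def rnn_layer(inputs, U, W):
        # Hidden-state sequence of one vanilla RNN layer.
        s = np.zeros(W.shape[0])
        out = []
        for x in inputs:
            s = np.tanh(U @ x + W @ s)
            out.append(s)
        return out

    def deep_rnn(inputs, layers):
        # The hidden-state sequence of each layer becomes the input
        # sequence of the next; `layers` is a list of (U, W) pairs.
        seq = inputs
        for U, W in layers:
            seq = rnn_layer(seq, U, W)
        return seq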

3. LSTM Networks (Long Short-Term Memory Networks)

Article: "Long Short-Term Memory"

LSTMs do not differ fundamentally from RNNs in structure, but they use a different function to compute the hidden state. They perform better than plain RNNs because they are able to learn "long-term dependencies".

To put it simply, the traditional RNN structure is this:

[Figure: the repeating cell of a standard RNN]

The structure of LSTMs is this:

[Figure: the repeating cell of an LSTM, showing its gates]
As can be seen, an LSTM only improves how the cell's computation is performed. At the application level, the whole computation can be treated as a black box; one only needs to care about its inputs, its outputs, and the parameters that must be set. To study the internal mechanism, refer to the paper cited above.

The calculation inside the LSTM black box, in the standard formulation (with sigmoid denoted σ, * denoting element-wise multiplication, and bias terms omitted), is:

    i = σ(U_i x_t + W_i s_{t-1})        (input gate)
    f = σ(U_f x_t + W_f s_{t-1})        (forget gate)
    o = σ(U_o x_t + W_o s_{t-1})        (output gate)
    g = tanh(U_g x_t + W_g s_{t-1})     (candidate state)
    c_t = c_{t-1} * f + g * i           (internal memory)
    s_t = tanh(c_t) * o                 (hidden state)

Gates in the LSTM: the input gate i determines how much of the newly computed state g is let through; the forget gate f determines how much of the previous memory c_{t-1} is kept; and the output gate o determines how much of the internal state is exposed as the hidden state s_t.
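
A minimal sketch of one LSTM step following the formulas above (the parameter dictionary and its key names are this sketch's own convention; biases omitted):

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def lstm_step(x_t, s_prev, c_prev, p):
        # p holds the weight matrices U_i, W_i, ..., keyed by name.
        i = sigmoid(p["Ui"] @ x_t + p["Wi"] @ s_prev)  # input gate
        f = sigmoid(p["Uf"] @ x_t + p["Wf"] @ s_prev)  # forget gate
        o = sigmoid(p["Uo"] @ x_t + p["Wo"] @ s_prev)  # output gate
        g = np.tanh(p["Ug"] @ x_t + p["Wg"] @ s_prev)  # candidate state
        c_t = c_prev * f + g * i   # new internal memory
        s_t = np.tanh(c_t) * o     # new hidden state
        return s_t, c_t
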
4. GRU (Gated Recurrent Unit) Networks

Article: "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation"

The design idea of the GRU is very similar to that of the LSTM, but the calculation is different (same notation as above):

    z = σ(U_z x_t + W_z s_{t-1})           (update gate)
    r = σ(U_r x_t + W_r s_{t-1})           (reset gate)
    h = tanh(U_h x_t + W_h (s_{t-1} * r))  (candidate state)
    s_t = (1 - z) * h + z * s_{t-1}
A GRU contains two gates: a reset gate r and an update gate z. r determines how the new input is combined with the previous state, and z defines how much of the previous state is carried over. If r is set to all ones and z to all zeros, the GRU reduces to the plain RNN described earlier.
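
A minimal sketch of one GRU step following the formulas above (the parameter naming is this sketch's convention; biases omitted):

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def gru_step(x_t, s_prev, p):
        z = sigmoid(p["Uz"] @ x_t + p["Wz"] @ s_prev)        # update gate
        r = sigmoid(p["Ur"] @ x_t + p["Wr"] @ s_prev)        # reset gate
        h = np.tanh(p["Uh"] @ x_t + p["Wh"] @ (s_prev * r))  # candidate state
        return (1 - z) * h + z * s_prev                      # new hidden state

Setting r to all ones and z to all zeros indeed collapses this to the plain RNN step s_t = tanh(U_h x_t + W_h s_{t-1}), matching the observation above.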
For the differences between LSTMs and GRUs, refer to the article: "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling"

V. Application fields of RNNs

RNNs are widely used in NLP tasks, with LSTMs being the most widely used variant. The applications of RNNs in NLP mainly include the following aspects:

(1) Language Modeling and Generating Text

Here are three related articles:

"Recurrent neuralnetwork based language model"

"Extensions ofrecurrent neural network based language model"

"Generating textwith recurrent neural Networks"

(2) Machine Translation

Machine translation translates a sentence in one language into a sentence in another language. Unlike language modeling, the output does not begin until the complete input sentence has been seen.

Here are the related articles:

"A recursiverecurrent neural Network for statistical machine translation"

"Sequence tosequence Learning with neural Networks"

"Joint Languageand translation Modeling with recurrent neural Networks"

(3) Speech Recognition

Speech recognition: given an input sequence of acoustic signals from a sound wave, predict the sequence of phonetic segments together with their corresponding probabilities.

Here is a related article:

"Towardsend-to-end Speech recognition with recurrent neural Networks"

(4) Generating Image Descriptions

RNNs have been used together with CNNs to generate descriptions for unlabeled images.




"References" [1] http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
[2] Relevant papers mentioned in the article
