Deep Learning, Part 3: RNN


An RNN (recurrent neural network) is a nonlinear dynamical system that maps sequences to sequences. Its main parameters are [W_hv, W_hh, W_oh, b_h, b_o, h_0]. A typical structure diagram looks like this:



Explanation:

    • Like an ordinary neural network, an RNN has an input layer, a hidden layer, and an output layer. The difference is that an RNN has a distinct state at each time step t: the output of the hidden layer at time t-1 feeds into the hidden layer at time t.
    • The meanings of [W_hv, W_hh, W_oh, b_h, b_o, h_0] are: W_hv is the input-to-hidden weight matrix, W_hh is the hidden-to-hidden weight matrix, W_oh is the hidden-to-output weight matrix, b_h is the bias of the hidden layer, b_o is the bias of the output layer, and h_0 is the hidden-layer output in the initial state, typically initialized to 0 (see the parameter sketch after this list).
    • The states at different time steps share the same weights W and biases b.
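A minimal NumPy sketch of how these parameters could be declared; the sizes n_in, n_hidden, n_out and the small-random initialization are assumptions for illustration, not something the text specifies:

    import numpy as np

    n_in, n_hidden, n_out = 8, 16, 4                          # assumed sizes for illustration
    rng = np.random.default_rng(0)

    W_hv = rng.normal(scale=0.1, size=(n_hidden, n_in))       # input  -> hidden weights
    W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))   # hidden -> hidden weights
    W_oh = rng.normal(scale=0.1, size=(n_out, n_hidden))      # hidden -> output weights
    b_h = np.zeros(n_hidden)                                   # hidden-layer bias
    b_o = np.zeros(n_out)                                      # output-layer bias
    h0 = np.zeros(n_hidden)                                    # initial hidden state, typically 0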

How an RNN is computed:
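In symbols, using the parameters above and assuming a tanh hidden nonlinearity (the output activation g is left unspecified here, since the text does not fix it), the forward computation at each time step t is:

    h_t = \tanh(W_{hv} v_t + W_{hh} h_{t-1} + b_h)
    o_t = g(W_{oh} h_t + b_o)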




Having first looked at ordinary neural networks and CNNs, and then at RNNs, the structure does not actually feel complex, and neither does the computation: an RNN's forward computation is not very different from an ordinary feedforward pass, except that the hidden-layer output of the previous time step is used when computing the current time step, i.e. the state is passed along continuously. This is why an RNN maps a sequence to a sequence: the output at one time step is related to the outputs of several previous states.
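A minimal NumPy sketch of this forward pass, continuing the assumptions above (tanh hidden units, a plain linear output, and the hypothetical parameter names from the earlier sketch):

    import numpy as np

    def rnn_forward(v_seq, W_hv, W_hh, W_oh, b_h, b_o, h0):
        """Run the RNN forward over a sequence of input vectors v_seq."""
        h = h0
        hs, os = [], []
        for v_t in v_seq:
            # the hidden state at time t uses both the current input and h_{t-1}
            h = np.tanh(W_hv @ v_t + W_hh @ h + b_h)
            # output at time t (linear here; a softmax could follow for classification)
            o_t = W_oh @ h + b_o
            hs.append(h)
            os.append(o_t)
        return hs, os

With the parameters from the earlier sketch, rnn_forward([rng.normal(size=n_in) for _ in range(5)], W_hv, W_hh, W_oh, b_h, b_o, h0) returns five hidden states and five outputs, each output depending on all the inputs seen so far.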

Given a loss function L:
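A common choice of sequence loss (an assumption here, since the concrete form is only shown in the figure) sums a per-step loss \ell between the output o_t and the target y_t over the T time steps:

    L = \sum_{t=1}^{T} \ell(o_t, y_t)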




Of course, different neural networks call for different training methods. Because an RNN works over a time sequence, its training process is not the same as for the previous networks: RNNs are trained with BPTT (backpropagation through time), a method developed by Werbos and others in 1990.

The specific training process is as follows:




The algorithm above is simply the process of computing the gradient with the classic BP algorithm; there is nothing new in it. It is worth noting, however, that the derivative with respect to h_{t-1} at time t-1 must also include the derivative that flows back into h_{t-1} from time t, so BPTT is still a chain-rule derivation.
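A compact NumPy sketch of this accumulation, matching the hypothetical forward pass above and assuming a squared-error loss 0.5*||o_t - y_t||^2 at each step (the actual loss and the details of the algorithm in the figure may differ):

    import numpy as np

    def rnn_backward(v_seq, hs, os, targets, W_hv, W_hh, W_oh, h0):
        """BPTT: walk backwards through time, passing dL/dh back one step per iteration."""
        dW_hv, dW_hh, dW_oh = np.zeros_like(W_hv), np.zeros_like(W_hh), np.zeros_like(W_oh)
        db_h, db_o = np.zeros(W_hh.shape[0]), np.zeros(W_oh.shape[0])
        dh_next = np.zeros(W_hh.shape[0])              # gradient flowing back from time t+1
        for t in reversed(range(len(v_seq))):
            do = os[t] - targets[t]                    # d(0.5*||o_t - y_t||^2) / d o_t
            dW_oh += np.outer(do, hs[t])
            db_o += do
            dh = W_oh.T @ do + dh_next                 # local term plus the term from time t+1
            dz = (1.0 - hs[t] ** 2) * dh               # back through tanh: tanh'(z) = 1 - h^2
            h_prev = hs[t - 1] if t > 0 else h0
            dW_hv += np.outer(dz, v_seq[t])
            dW_hh += np.outer(dz, h_prev)
            db_h += dz
            dh_next = W_hh.T @ dz                      # what time t contributes to dL/dh_{t-1}
        return dW_hv, dW_hh, dW_oh, db_h, db_o

The last line of the loop, dh_next = W_hh.T @ dz, is exactly the extra term mentioned above: it is the derivative that time t passes back to h_{t-1}.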

However, because of line 10 in the algorithm above, the same parameters are used at every earlier time step when training at time t, so the derivative with respect to a single parameter becomes a sum of derivatives over all the previous states. For example, taking the derivative with respect to W_hh at time t gives the following formula:
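Written out with the chain rule (a standard reconstruction using the notation above, since the exact formula appears only as a figure), the derivative is:

    \frac{\partial L_t}{\partial W_{hh}} = \sum_{k=1}^{t} \frac{\partial L_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial h_k}{\partial W_{hh}}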


  It is precisely because of these long dependencies that BPTT cannot solve the long-term dependency problem (that is, when the current output depends on part of the sequence more than about 10 steps back): BPTT suffers from the so-called vanishing/exploding gradient problem. This article explains well why gradients vanish or explode; the main issue is that in the BPTT algorithm the chain of derivatives with respect to W becomes very long, and with tanh as the activation function (whose derivative lies between 0 and 1) multiplying so many such factors drives the final derivative towards 0. That is the vanishing gradient problem: at time t the parameters learn nothing from time t-n. There are, of course, ways to address this; LSTMs were designed specifically for it, and other approaches include better parameter initialization and replacing the activation function (e.g., switching to ReLU).
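To see where this comes from, the factor \partial h_t / \partial h_k in the sum above expands into a product of per-step Jacobians; for the tanh recurrence sketched earlier,

    \frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} = \prod_{i=k+1}^{t} \operatorname{diag}(1 - h_i^2) \, W_{hh}

Each factor contains tanh' = 1 - h_i^2, which lies in (0, 1], so over a long chain the product tends to shrink towards 0 (vanishing gradients), while sufficiently large singular values of W_hh can instead make it grow without bound (exploding gradients).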

  The above is the classic RNN model and its derivation. In recent years many RNN variants and extensions have appeared; see: RNN extensions and improved models.

Reference documents:

"1" sutskever,training Recurrent neural Networks.phd thesis,univ.toronto (2012)

Introduction to "2" cyclic neural network (RNN, recurrent neural Networks)

 
