Deep Learning, Part 3: RNN


An RNN (recurrent neural network) is a nonlinear dynamical system that maps sequences to sequences. Its main parameters are [W_hv, W_hh, W_oh, b_h, b_o, h_0]. A typical structure diagram looks like this:



Explanation:

    • Like an ordinary neural network, an RNN has an input layer, a hidden layer, and an output layer. The difference is that an RNN has a distinct state at each time step t: the output of the hidden layer at time t-1 feeds into the hidden layer at time t.
    • The meanings of [W_hv, W_hh, W_oh, b_h, b_o, h_0] are: W_hv is the input-to-hidden weight matrix, W_hh is the hidden-to-hidden weight matrix, W_oh is the hidden-to-output weight matrix, b_h is the bias of the hidden layer, b_o is the bias of the output layer, and h_0 is the hidden-layer output in the initial state, typically initialized to 0 (see the parameter sketch after this list).
    • The states at different time steps share the same weights W and biases b.
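A minimal NumPy sketch of how these parameters could be declared; the sizes n_in, n_hidden, n_out and the small-random initialization are assumptions for illustration, not something the text specifies:

    import numpy as np

    n_in, n_hidden, n_out = 8, 16, 4                          # assumed sizes for illustration
    rng = np.random.default_rng(0)

    W_hv = rng.normal(scale=0.1, size=(n_hidden, n_in))       # input  -> hidden weights
    W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))   # hidden -> hidden weights
    W_oh = rng.normal(scale=0.1, size=(n_out, n_hidden))      # hidden -> output weights
    b_h = np.zeros(n_hidden)                                   # hidden-layer bias
    b_o = np.zeros(n_out)                                      # output-layer bias
    h0 = np.zeros(n_hidden)                                    # initial hidden state, typically 0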

How an RNN is computed:
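In symbols, using the parameters above and assuming a tanh hidden nonlinearity (the output activation g is left unspecified here, since the text does not fix it), the forward computation at each time step t is:

    h_t = \tanh(W_{hv} v_t + W_{hh} h_{t-1} + b_h)
    o_t = g(W_{oh} h_t + b_o)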




Having first looked at ordinary neural networks and CNNs, and then at RNNs, the structure does not actually feel complex, and neither does the computation: an RNN's forward computation is not very different from an ordinary feedforward pass, except that the hidden-layer output of the previous time step is used when computing the current time step, i.e. the state is passed along continuously. This is why an RNN maps a sequence to a sequence: the output at one time step is related to the outputs of several previous states.
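A minimal NumPy sketch of this forward pass, continuing the assumptions above (tanh hidden units, a plain linear output, and the hypothetical parameter names from the earlier sketch):

    import numpy as np

    def rnn_forward(v_seq, W_hv, W_hh, W_oh, b_h, b_o, h0):
        """Run the RNN forward over a sequence of input vectors v_seq."""
        h = h0
        hs, os = [], []
        for v_t in v_seq:
            # the hidden state at time t uses both the current input and h_{t-1}
            h = np.tanh(W_hv @ v_t + W_hh @ h + b_h)
            # output at time t (linear here; a softmax could follow for classification)
            o_t = W_oh @ h + b_o
            hs.append(h)
            os.append(o_t)
        return hs, os

With the parameters from the earlier sketch, rnn_forward([rng.normal(size=n_in) for _ in range(5)], W_hv, W_hh, W_oh, b_h, b_o, h0) returns five hidden states and five outputs, each output depending on all the inputs seen so far.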

Given a loss function L:
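A common choice of sequence loss (an assumption here, since the concrete form is only shown in the figure) sums a per-step loss \ell between the output o_t and the target y_t over the T time steps:

    L = \sum_{t=1}^{T} \ell(o_t, y_t)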




Of course, different neural networks call for different training methods. Because an RNN works over a time sequence, its training process is not the same as for the previous networks: RNNs are trained with BPTT (backpropagation through time), a method developed by Werbos and others in 1990.

The specific training process is as follows:




The algorithm above is simply the process of computing the gradient with the classic BP algorithm; there is nothing new in it. It is worth noting, however, that the derivative with respect to h_{t-1} at time t-1 must also include the derivative that flows back into h_{t-1} from time t, so BPTT is still a chain-rule derivation.
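A compact NumPy sketch of this accumulation, matching the hypothetical forward pass above and assuming a squared-error loss 0.5*||o_t - y_t||^2 at each step (the actual loss and the details of the algorithm in the figure may differ):

    import numpy as np

    def rnn_backward(v_seq, hs, os, targets, W_hv, W_hh, W_oh, h0):
        """BPTT: walk backwards through time, passing dL/dh back one step per iteration."""
        dW_hv, dW_hh, dW_oh = np.zeros_like(W_hv), np.zeros_like(W_hh), np.zeros_like(W_oh)
        db_h, db_o = np.zeros(W_hh.shape[0]), np.zeros(W_oh.shape[0])
        dh_next = np.zeros(W_hh.shape[0])              # gradient flowing back from time t+1
        for t in reversed(range(len(v_seq))):
            do = os[t] - targets[t]                    # d(0.5*||o_t - y_t||^2) / d o_t
            dW_oh += np.outer(do, hs[t])
            db_o += do
            dh = W_oh.T @ do + dh_next                 # local term plus the term from time t+1
            dz = (1.0 - hs[t] ** 2) * dh               # back through tanh: tanh'(z) = 1 - h^2
            h_prev = hs[t - 1] if t > 0 else h0
            dW_hv += np.outer(dz, v_seq[t])
            dW_hh += np.outer(dz, h_prev)
            db_h += dz
            dh_next = W_hh.T @ dz                      # what time t contributes to dL/dh_{t-1}
        return dW_hv, dW_hh, dW_oh, db_h, db_o

The last line of the loop, dh_next = W_hh.T @ dz, is exactly the extra term mentioned above: it is the derivative that time t passes back to h_{t-1}.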

However, because of line 10 in the algorithm above, the same parameters are used at every earlier time step when training at time t, so the derivative with respect to a single parameter becomes a sum of derivatives over all the previous states. For example, taking the derivative with respect to W_hh at time t gives the following formula:
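Written out with the chain rule (a standard reconstruction using the notation above, since the exact formula appears only as a figure), the derivative is:

    \frac{\partial L_t}{\partial W_{hh}} = \sum_{k=1}^{t} \frac{\partial L_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial h_k}{\partial W_{hh}}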


  It is precisely because of these long dependencies that BPTT cannot solve the long-term dependency problem (that is, when the current output depends on part of the sequence more than about 10 steps back): BPTT suffers from the so-called vanishing/exploding gradient problem. This article explains well why gradients vanish or explode; the main issue is that in the BPTT algorithm the chain of derivatives with respect to W becomes very long, and with tanh as the activation function (whose derivative lies between 0 and 1) multiplying so many such factors drives the final derivative towards 0. That is the vanishing gradient problem: at time t the parameters learn nothing from time t-n. There are, of course, ways to address this; LSTMs were designed specifically for it, and other approaches include better parameter initialization and replacing the activation function (e.g., switching to ReLU).
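To see where this comes from, the factor \partial h_t / \partial h_k in the sum above expands into a product of per-step Jacobians; for the tanh recurrence sketched earlier,

    \frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} = \prod_{i=k+1}^{t} \operatorname{diag}(1 - h_i^2) \, W_{hh}

Each factor contains tanh' = 1 - h_i^2, which lies in (0, 1], so over a long chain the product tends to shrink towards 0 (vanishing gradients), while sufficiently large singular values of W_hh can instead make it grow without bound (exploding gradients).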

  The above is the classic RNN model and its derivation. In recent years many RNN variants and extensions have appeared; see: RNN extensions and improved models.

Reference documents:

"1" sutskever,training Recurrent neural Networks.phd thesis,univ.toronto (2012)

Introduction to "2" cyclic neural network (RNN, recurrent neural Networks)

 
