Deep learning "5" Cyclic neural network (RNN) Reverse propagation algorithm (BPTT) Understanding _DL


http://blog.csdn.net/linmingan/article/details/50958304

The backpropagation algorithm for recurrent neural networks is only a simple variant of the standard BP (backpropagation) algorithm.

First, let us look at the forward propagation algorithm of a recurrent neural network:


It should be noted that there is only one weight matrix connecting the previous time step of the RNN to the current time step, and that this weight matrix is shared across time (it does not depend on t). The difference from forward propagation in a BP (feedforward) neural network is that the hidden layer additionally receives the hidden-layer information from the previous time step. The forward propagation written here may look somewhat different from the version you usually see, because each stage of the propagation is represented separately: two extra variables are introduced for the pre-activation values, one before the hidden-layer activation function e and one before the output-layer activation function g. Splitting the computation up like this makes it easier to implement the backpropagation algorithm, i.e., the chain rule of differentiation.
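As a concrete illustration, here is a minimal sketch of such a split-out forward pass in NumPy. The parameter names (W_xh, W_hh, W_hy) and the choice of tanh for the hidden activation and softmax for the output activation are assumptions made for this example, not the notation of the original post.

import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, W_hy, b_h, b_y):
    """Forward pass of a vanilla RNN, keeping the pre-activation values
    that BPTT will need later.  xs: list of input vectors, one per time step."""
    h_prev = h0
    cache = []
    for x in xs:
        # pre-activation of the hidden layer (kept for backpropagation)
        z_h = W_xh @ x + W_hh @ h_prev + b_h
        h = np.tanh(z_h)                 # hidden activation (tanh here)
        # pre-activation of the output layer
        z_y = W_hy @ h + b_y
        y = np.exp(z_y - z_y.max())
        y /= y.sum()                     # output activation (softmax here)
        cache.append((x, h_prev, z_h, h, z_y, y))
        h_prev = h
    return cache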

RNN forward propagation is carried out along the time axis, because the data is sequential; for example, an audio clip becomes a sequence of signal frames after framing. This makes RNN input data different from the input of other neural networks such as DNNs: the samples within a sequence cannot be shuffled and must be fed in time order, whereas the input samples of other networks can be shuffled freely. Apart from that, implementing the forward and backward propagation of an RNN is nothing special compared with other networks; there are just a few more variables. The T time steps of an RNN can therefore be viewed as T training samples (or T mini-batches), but their order must be preserved, otherwise the network may learn nothing useful.
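A small usage sketch of this ordering point, reusing the hypothetical rnn_forward above: independent sequences may be shuffled relative to one another, but the frames inside each sequence must keep their time order.

import numpy as np
import random

rng = np.random.default_rng(0)
n_in, n_h, n_out = 3, 4, 2

# Illustrative random parameters, only to make the example runnable.
W_xh = rng.normal(size=(n_h, n_in))
W_hh = rng.normal(size=(n_h, n_h))
W_hy = rng.normal(size=(n_out, n_h))
b_h, b_y = np.zeros(n_h), np.zeros(n_out)
h0 = np.zeros(n_h)

# Ten independent sequences of five frames each.
dataset = [[rng.normal(size=n_in) for _ in range(5)] for _ in range(10)]

random.shuffle(dataset)          # shuffling whole sequences is fine
for seq in dataset:              # but frames within a sequence stay ordered
    cache = rnn_forward(seq, h0, W_xh, W_hh, W_hy, b_h, b_y)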

Next, here is the BPTT algorithm:


From the BPTT algorithm it can be seen that, to compute the gradient with respect to a given quantity, the gradients of the quantities it directly determines must be evaluated first, since forward propagation tells us which value is directly determined by which. The same ordering applies to the other gradients in the listing (note: the error terms in the algorithm above, from line 6 onward, should be corrected). This is the chain rule used when differentiating a neural network.
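Written out for a vanilla RNN, the ordering imposed by the chain rule looks as follows. This is a sketch under assumed notation (hidden pre-activation z_t with activation f, output pre-activation o_t with activation g, per-step loss L_t), not the exact formulas of the original algorithm listing:

% Forward:  z_t = W_{xh} x_t + W_{hh} h_{t-1} + b_h,  h_t = f(z_t),
%           o_t = W_{hy} h_t + b_y,  \hat{y}_t = g(o_t),  L = \sum_t L_t
\begin{aligned}
\frac{\partial L}{\partial o_t} &= \frac{\partial L_t}{\partial o_t}
  && \text{(computed first, at the output)}\\
\frac{\partial L}{\partial h_t} &= W_{hy}^{\top}\frac{\partial L}{\partial o_t}
  + W_{hh}^{\top}\frac{\partial L}{\partial z_{t+1}}
  && \text{(needs the output gradient and the next step's gradient; } \tfrac{\partial L}{\partial z_{T+1}}=0\text{)}\\
\frac{\partial L}{\partial z_t} &= \frac{\partial L}{\partial h_t}\odot f'(z_t)\\
\frac{\partial L}{\partial W_{hh}} &= \sum_t \frac{\partial L}{\partial z_t}\, h_{t-1}^{\top},\qquad
\frac{\partial L}{\partial W_{xh}} = \sum_t \frac{\partial L}{\partial z_t}\, x_t^{\top},\qquad
\frac{\partial L}{\partial W_{hy}} = \sum_t \frac{\partial L}{\partial o_t}\, h_t^{\top}
\end{aligned}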

The BPTT algorithm in detail:


The 10th step updates the weights from the 5th step.
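For reference, here is a minimal NumPy sketch of the corresponding backward pass, matching the hypothetical rnn_forward above. It assumes a softmax output trained with cross-entropy (so the output-layer error is simply y - target) and tanh hidden units; it illustrates BPTT in general, not the exact listing from the original post.

def rnn_backward(cache, targets, W_xh, W_hh, W_hy):
    """Backpropagation through time for the vanilla RNN above.
    cache comes from rnn_forward; targets is one one-hot vector per step."""
    dW_xh, dW_hh, dW_hy = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(W_hy)
    db_h, db_y = np.zeros(W_hh.shape[0]), np.zeros(W_hy.shape[0])
    dh_next = np.zeros(W_hh.shape[0])           # gradient flowing back from step t+1

    # Walk backwards in time: later gradients are needed by earlier ones.
    for (x, h_prev, z_h, h, z_y, y), tgt in zip(reversed(cache), reversed(targets)):
        dz_y = y - tgt                           # softmax + cross-entropy error at the output
        dW_hy += np.outer(dz_y, h)
        db_y += dz_y
        dh = W_hy.T @ dz_y + dh_next             # chain rule: output path + next time step
        dz_h = dh * (1.0 - h ** 2)               # through tanh: f'(z) = 1 - tanh(z)^2
        dW_xh += np.outer(dz_h, x)
        dW_hh += np.outer(dz_h, h_prev)
        db_h += dz_h
        dh_next = W_hh.T @ dz_h                  # passed on to step t-1
    return dW_xh, dW_hh, dW_hy, db_h, db_y

The returned gradients would then be used in the weight update of the final step, e.g. W_hh -= learning_rate * dW_hh.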
