http://blog.csdn.net/linmingan/article/details/50958304
The backpropagation algorithm for recurrent neural networks is only a simple variant of the BP algorithm.
First, let us look at the forward propagation algorithm of a recurrent neural network:
It should be noted that there is only one weight matrix carrying the RNN's previous time step to the current one, and that this weight matrix is shared across time. The difference from forward propagation in an ordinary BP network is that the hidden layer also receives information from one time step earlier. The forward propagation algorithm here may look somewhat different from the one you usually see, because it splits each stage of the propagation into separate expressions: two extra variables hold the values before they enter the hidden-layer activation function and before they enter the output-layer activation function g. The purpose of this split is to make the backpropagation algorithm, that is, the chain rule of differentiation, easier to implement.
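The split described above can be sketched in code. This is a minimal illustration, assuming the common formulation s_t = f(U x_t + W s_{t-1}), o_t = g(V s_t) with f = tanh and g = softmax; the variable names (U, W, V, e_t, q_t, s_t) are illustrative, not taken from the post. The pre-activation values e_t and q_t are kept as separate variables, exactly so that the chain rule can later be applied stage by stage.

```python
import numpy as np

def rnn_forward(x_seq, U, W, V, s0):
    """Forward pass over a sequence, caching each intermediate stage.

    x_seq: list of input vectors, one per time step.
    U, W, V: input-to-hidden, hidden-to-hidden, hidden-to-output weights.
    s0: initial hidden state.
    """
    s_prev = s0
    pre_h, states, pre_o, outputs = [], [], [], []
    for x_t in x_seq:
        e_t = U @ x_t + W @ s_prev             # pre-activation of the hidden layer
        s_t = np.tanh(e_t)                     # hidden activation f
        q_t = V @ s_t                          # pre-activation of the output layer
        o_t = np.exp(q_t) / np.exp(q_t).sum()  # output activation g (softmax)
        pre_h.append(e_t); states.append(s_t)
        pre_o.append(q_t); outputs.append(o_t)
        s_prev = s_t                           # hidden state flows to the next step
    return pre_h, states, pre_o, outputs
```

Note that the same U, W, V are reused at every time step, which is the weight sharing the text describes.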
The RNN forward propagation algorithm proceeds along the time axis, because the data are sequential; audio, for instance, arrives as a series of signal frames. RNN input data differ from the input of other neural networks such as DNNs: RNN samples cannot be shuffled and must be fed in time order, whereas other networks may shuffle their inputs freely. Otherwise, implementing forward and backward propagation for an RNN is nothing special compared with other neural networks; there are just a few more variables. So the T time steps of an RNN can be viewed as T training samples (or T mini-batches), but their order must be preserved, or the model learned may be useless.
Next, the BPTT algorithm:
From the BPTT algorithm it can be seen that, to compute a weight's gradient, we must first compute the gradient with respect to the quantity that weight directly produces, since forward propagation tells us which quantities are directly determined by which weights. Similarly, the weight gradients of a layer require the error of that layer to be computed first (note: the errors in the algorithm above, starting from line 6, should be corrected). This is the chain rule as it appears in neural-network differentiation.
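The chain-rule ordering can be made concrete for the output layer. The sketch below assumes softmax output with cross-entropy loss, under which the output-layer error is simply o_t - y_t; the names delta_o, dV, s_t are illustrative. The gradient of the output weights V can only be formed after this error is available, which is exactly the ordering the text describes.

```python
import numpy as np

def output_grads(o_t, y_t, s_t):
    """Output-layer error and weight gradient for one time step.

    Assumes softmax output + cross-entropy loss, so dL/dq_t = o_t - y_t.
    """
    delta_o = o_t - y_t              # error at the output pre-activation
    dV = np.outer(delta_o, s_t)      # chain rule: dL/dV = delta_o * s_t^T
    return delta_o, dV
```

The hidden-layer error at each step is obtained the same way, by chaining delta_o back through V and through the hidden-to-hidden weights of the following time step.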
The BPTT algorithm in detail:
(The step-by-step derivation, a series of equations each introduced with "where", appeared as images in the original post.)
Step 10 updates the weights of step 5.
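Putting the pieces together, a full BPTT sweep can be sketched as follows, again under the assumed formulation s_t = tanh(U x_t + W s_{t-1}), o_t = softmax(V s_t) with cross-entropy loss; all names are illustrative. The backward loop runs in reverse time order, accumulating the gradients, and the weights are updated once at the end of the sweep, matching the post's final update step.

```python
import numpy as np

def bptt_step(x_seq, y_seq, U, W, V, s0, lr=0.1):
    """One forward + backward sweep over a sequence, then a weight update."""
    # Forward pass, caching hidden states (states[0] is the initial state).
    states, outputs, s_prev = [s0], [], s0
    for x_t in x_seq:
        s_t = np.tanh(U @ x_t + W @ s_prev)
        q_t = V @ s_t
        outputs.append(np.exp(q_t) / np.exp(q_t).sum())
        states.append(s_t)
        s_prev = s_t

    # Backward pass in reverse time order (the chain rule through time).
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    delta_next = np.zeros_like(s0)            # hidden error flowing back from step t+1
    for t in reversed(range(len(x_seq))):
        delta_o = outputs[t] - y_seq[t]       # softmax + cross-entropy output error
        dV += np.outer(delta_o, states[t + 1])
        # Hidden error: this step's output error plus the next step's hidden
        # error, both pushed through the tanh derivative (1 - s_t^2).
        delta_h = (V.T @ delta_o + W.T @ delta_next) * (1 - states[t + 1] ** 2)
        dU += np.outer(delta_h, x_seq[t])
        dW += np.outer(delta_h, states[t])
        delta_next = delta_h

    # Single gradient-descent update at the end of the sweep.
    return U - lr * dU, W - lr * dW, V - lr * dV
```

Because the same U, W, V act at every time step, their gradients are sums over all steps, which is what the accumulation in the backward loop implements.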