Preface
For a long time I've been looking for a good tutorial on implementing LSTM networks. They seemed complicated, and I'd never done anything with them before. Quick Googling didn't help, as all I found were some slides.
Fortunately, I took part in the Kaggle EEG competition and thought it might be fun to use LSTMs and finally learn how they work. I based my solution and this post's code on char-rnn by Andrej Karpathy, which I highly recommend you check out.
RNN Misconception
There is one important thing that, I feel, hasn't been emphasized strongly enough (and is the main reason why I couldn't get myself to do anything with RNNs). There isn't much difference between an RNN and a feedforward network. It's easiest to implement an RNN just as a feedforward network with some parts of the input feeding into the middle of the stack, and a bunch of outputs coming out from there as well. There is no magic internal state kept in the network. It's provided as a part of the input! The overall structure of RNNs is very similar to that of feedforward networks.
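To make this concrete, here is a minimal sketch (plain NumPy, with made-up parameter names) of a vanilla RNN step written as an ordinary feedforward pass: the previous hidden state is just another input, and the new state is just another output.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    # One time step of a vanilla RNN, written as a plain feedforward pass.
    # The "state" h_prev is just another input; the new state is just another output.
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # hidden layer fed by the input AND the previous state
    y_t = W_hy @ h_t + b_y                           # ordinary output layer
    return y_t, h_t

# Unrolling over a sequence is then a plain loop that feeds each returned
# state back in as an input for the next step:
# h = np.zeros(hidden_size)
# for x_t in sequence:
#     y_t, h = rnn_step(x_t, h, W_xh, W_hh, W_hy, b_h, b_y)
```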
LSTM Refresher
This section will cover only the formal definition of LSTMs. There are lots of other nice blog posts describing in detail how you can think about and make sense of these equations.
LSTMs have many variations, but we'll stick to a simple one. One cell consists of three gates (input, forget, output) and a cell unit. Gates use a sigmoid activation, while the input and cell state are usually transformed with tanh. An LSTM cell can be defined with the following set of equations:
Gates:
i_t = g(W_{xi} x_t + W_{hi} h_{t-1} + b_i)
f_t = g(W_{xf} x_t + W_{hf} h_{t-1} + b_f)
o_t = g(W_{xo} x_t + W_{ho} h_{t-1} + b_o)