Materials to understand LSTM
People never judge an academic paper by the user experience standards that they apply to software. If the purpose of a paper were really to promote understanding, then most of them suck. A while ago, I read this article about academic pretentiousness, and it spoke my mind. My feeling is that papers are not for better understanding but rather for self-promotion. They're a way for scholars to declare achievements and make others admire them. Therefore the gold standard for an academic paper has always been letting others acknowledge its greatness, but not understand enough to surpass it.
When it comes to LSTM, for example, there aren't many good materials. Most likely what they would show you is this bullshit image:
There are many things in this image that piss me off.
First of all, they use an "f" that looks like a footnote mark to denote non-linearity, and at the same time, they have f_t in their equations. And they are not the same thing!
Second, they have these dash lines and solid lines to represent time delay. So solid lines are c_t, and dash lines are c_t-1. There are 5 arrows coming out of the "Cell", but only 4 are labelled. One c_t is incorrectly drawn with a dash line. One solid line is supposed to be a dash line and c_t-1.
Third, what the hell are the black dots? You have to look at the equations to figure out that they are element-wise multiplications of two vectors.
And finally, c_t is supposed to be calculated as the sum of f_t * c_t-1 and i_t * tanh(W_xc * x_t + W_hc * h_t-1 + b_c) (the third equation). But the image completely misses the plus sign.
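To make that third equation concrete, here is a minimal sketch in plain Python. It assumes the gate values f_t and i_t have already been computed and are simply passed in, and the helper names `matvec` and `cell_update` are made up for illustration, not taken from any paper or library. The `*` between vectors is the element-wise multiplication (the black dots), and the `+` in the last line is the plus sign the image leaves out.

```python
import math

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def cell_update(x_t, h_prev, c_prev, f_t, i_t, W_xc, W_hc, b_c):
    # c_t = f_t * c_t-1 + i_t * tanh(W_xc x_t + W_hc h_t-1 + b_c)
    # All products between vectors are element-wise.
    from_x = matvec(W_xc, x_t)
    from_h = matvec(W_hc, h_prev)
    candidate = [math.tanh(p + q + r) for p, q, r in zip(from_x, from_h, b_c)]
    return [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, candidate)]

# Toy values (assumed, for illustration only): input size 3, cell size 2.
x_t = [1.0, 0.5, -1.0]
h_prev = [0.1, -0.2]
c_prev = [0.3, 0.7]
f_t = [1.0, 0.0]   # keep the first memory slot, forget the second
i_t = [0.0, 1.0]   # write only into the second slot
W_xc = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
W_hc = [[0.0, 0.0], [0.0, 0.0]]
b_c = [0.0, 0.0]

c_t = cell_update(x_t, h_prev, c_prev, f_t, i_t, W_xc, W_hc, b_c)
print(c_t)
```

With these toy gates, the first slot of c_t is just the old memory carried over, and the second slot is the new tanh candidate, which is the whole point of having the sum of the two terms.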
Compared to this shitty image, the following version is slightly better:
The 3 c_t-1 are correctly represented as dash lines. The plus sign is there too. Sigmoid functions are written with the proper sigma sign, not some freaking f.
Things don't have to be so difficult, and understanding shouldn't be a painful experience. But unfortunately, academic writing makes it so almost all the time.
Perhaps the best learning material is really this excellent blog post: Understanding LSTM Networks (colah.github.io), posted in August 2015.
And its diagram is simply beautiful:
I think the most evil thing about the first two versions is that they treat C (the cell, or the memory) not as an input to the LSTM block, but rather as a thing that is internal to the block. And they adopt this complex dash line, solid line thing to represent delay. This is really confusing.
But I did notice one difference between the first two diagrams and the last one. The first two diagrams sum up the inputs with the outputs from the previous time step to calculate the gates (for example, f_t and i_t). But in the third version, the author concatenates them. I don't know if it is because the third version is a variation, or if this doesn't matter in terms of correctness. (But I don't think it should be concatenation, because the vector size of h shouldn't be changed. If it is really concatenation, would the vector size of h change to the size of x+h?)
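One way to check this is a small sketch in plain Python. It assumes a gate with no peephole term and with separate weight matrices W_x and W_h (these names, and the two function names, are illustrative only). Under that assumption, summing W_x * x_t and W_h * h_t-1 gives the same numbers as multiplying the side-by-side matrix [W_x W_h] by the concatenated vector [x_t, h_t-1]. Only the input to the gate gets longer; the result still has the size of h.

```python
def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gate_input_sum(W_x, W_h, b, x, h):
    # "Sum" style: W_x x + W_h h + b
    return [p + q + r for p, q, r in zip(matvec(W_x, x), matvec(W_h, h), b)]

def gate_input_concat(W_x, W_h, b, x, h):
    # "Concatenate" style: [W_x W_h] [x, h] + b
    W = [row_x + row_h for row_x, row_h in zip(W_x, W_h)]  # H rows, X+H columns
    xh = x + h                                             # length X+H
    return [p + r for p, r in zip(matvec(W, xh), b)]

# Toy values (assumed): x has size 3, h has size 2.
x = [1.0, 2.0, 3.0]
h = [0.5, -1.0]
W_x = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
W_h = [[2.0, 1.0], [0.0, -1.0]]
b = [0.25, -0.25]

summed = gate_input_sum(W_x, W_h, b, x, h)
concatenated = gate_input_concat(W_x, W_h, b, x, h)
print(summed, concatenated)            # both are [-1.75, 3.75]
print(len(summed), len(concatenated))  # both have length 2, the size of h
```

So at least in this simplified setting, the two drawings describe the same computation, and the number of rows in the weight matrix, not the length of the concatenated input, decides the size of the output.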
I drew a new diagram, which I think is better:
from:https://medium.com/@shiyan/materials-to-understand-lstm-34387d6454c1#.encwjqmd6