LSTM Neural Networks, Explained in Plain Language
Published 2015-06-05 20:57 | 10,188 reads | Source: http://blog.terminal.com | 2 comments | Author: Zachary Chase Lipton | Tags: LSTM, recurrent neural network, RNN, long short-term memory
Summary: According to the leading figures of deep learning, the LSTM network has proven more effective than traditional RNNs. In this article, Zachary Chase Lipton, a doctoral student at UCSD studying machine learning theory and applications, explains the basics of recurrent networks in plain language and introduces the long short-term memory (LSTM) model.
"Editor 's note" uses feedforward convolution neural networks (convnets) to solve computer vision problems and is the most widely known achievement in depth learning, but a small number of public attention has been devoted to using recurrent neural networks to model time relationships. The LSTM network has proved to be more effective than traditional Rnns, based on the elaboration of the three-way study. This paper is written by PhD Zachary Chase, Ph. D. In the study of machine learning theory and applications at the University of California, San Diego (UCSD), which explains the basics of convolution networks in plain language and introduces the long Short-term memory (LSTM) model.
Because deep learning applies so broadly to real-world tasks, it has attracted the attention of many technical experts, investors, and non-specialists. Although the most notable achievements of deep learning use feedforward convolutional neural networks (convnets) to solve computer vision problems, comparatively little public attention has been devoted to using recurrent neural networks to model temporal relationships.
(Note: To help you start experimenting with LSTM recurrent networks, I have attached a simple micro instance preloaded with NumPy, Theano, and a git clone of Jonathan Raiman's LSTM example.)
In a recent post, "Learning to Read with Recurrent Neural Networks," I explained why, despite their incredible success, feedforward networks are constrained by their inability to explicitly model temporal relationships and by the assumption that all data points consist of fixed-length vectors. At the end of that post, I promised to write an article explaining the basics of recurrent networks and introducing the long short-term memory (LSTM) model.
First, some basics of neural networks. A neural network can be represented as a graph of artificial neurons (nodes) and directed edges, which model synapses. Each neuron is a processing unit that takes the outputs of the nodes connected to it as inputs. Before emitting its own output, each neuron applies a nonlinear activation function. It is this activation function that gives neural networks the ability to model nonlinear relationships.
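To make this concrete, here is a minimal NumPy sketch of a single feedforward layer; it is not from the original post, and names such as `sigmoid`, `W`, and `b` are my own illustrative choices. Each neuron forms a weighted sum of its inputs and then applies a nonlinear activation.

```python
import numpy as np

def sigmoid(z):
    # S-shaped (logistic) activation: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def feedforward_layer(x, W, b):
    # Each row of W holds one neuron's synapse weights; b holds the biases.
    # The nonlinearity is what lets the network model nonlinear relationships.
    return sigmoid(W @ x + b)

# Tiny example: 3 inputs feeding 2 neurons
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(2, 3))
b = np.zeros(2)
print(feedforward_layer(x, W, b))
```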
Now, consider the recent, widely noted paper "Playing Atari with Deep Reinforcement Learning," which combines convnets with reinforcement learning to train computers to play video games. The system achieves superhuman performance on some games, such as Breakout, where the appropriate strategy can be inferred by looking at the screen at any single point in time. However, the system falls far short of human performance when the optimal strategy must be planned over a long time span, as in Space Invaders.
This motivates recurrent neural networks (RNNs), which give neural networks the ability to model time explicitly by adding a self-connected hidden layer that spans time steps. In other words, the hidden layer feeds not only into the output, but also into the hidden layer at the next time step. In this article, I will use some sketches of recurrent networks, excerpted from my review-in-progress of the literature on the subject.
Now we can visualize these connections without cycles by unrolling the network across two time steps. Note that the weights (input-to-hidden and hidden-to-output) are identical at every time step. A recurrent network is sometimes described as a deep network whose depth lies not only between input and output but also across time steps, with each time step acting as a layer.
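As a rough illustration of the unrolled computation, here is a NumPy sketch of one recurrent hidden layer run over a sequence. The weight names (`W_xh`, `W_hh`, `W_hy`) and the `tanh` nonlinearity are my own choices, not taken from the article; the point is only that the same three weight matrices are reused at every time step.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, h0):
    """Unroll a simple recurrent layer over a sequence xs (list of input vectors)."""
    h = h0
    hs, ys = [], []
    for x in xs:
        # The hidden state feeds both the output and the next step's hidden layer.
        h = np.tanh(W_xh @ x + W_hh @ h)
        y = W_hy @ h
        hs.append(h)
        ys.append(y)
    return hs, ys

# Same weights at every time step -- the "depth" here is across time.
rng = np.random.default_rng(1)
n_in, n_hid, n_out, T = 4, 8, 3, 5
xs = [rng.normal(size=n_in) for _ in range(T)]
W_xh = rng.normal(size=(n_hid, n_in)) * 0.1
W_hh = rng.normal(size=(n_hid, n_hid)) * 0.1
W_hy = rng.normal(size=(n_out, n_hid)) * 0.1
hs, ys = rnn_forward(xs, W_xh, W_hh, W_hy, h0=np.zeros(n_hid))
print(len(hs), ys[-1].shape)
```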
Once unrolled, these networks can be trained end to end with backpropagation. This extension of backpropagation across time steps is called backpropagation through time (BPTT).
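Here is a minimal sketch of backpropagation through time for the toy recurrent layer above, assuming (my assumption, not the article's) a squared-error loss on the final output only. The gradient for the shared weight matrices is accumulated across all the unrolled time steps.

```python
import numpy as np

def bptt(xs, target, W_xh, W_hh, W_hy, h0):
    """Backpropagation through time for a tanh RNN with a squared-error
    loss on the final output only. Returns gradients for the shared weights."""
    # ---- forward pass, storing the hidden state of every time step ----
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1]))
    y = W_hy @ hs[-1]

    # ---- backward pass: the error signal flows back across time steps ----
    dW_xh = np.zeros_like(W_xh)
    dW_hh = np.zeros_like(W_hh)
    dW_hy = np.outer(y - target, hs[-1])
    dh = W_hy.T @ (y - target)             # gradient w.r.t. the last hidden state
    for t in range(len(xs) - 1, -1, -1):
        dz = dh * (1.0 - hs[t + 1] ** 2)   # back through the tanh nonlinearity
        dW_xh += np.outer(dz, xs[t])       # shared weights, so gradients accumulate
        dW_hh += np.outer(dz, hs[t])
        dh = W_hh.T @ dz                   # pass the error on to the previous step
    return dW_xh, dW_hh, dW_hy

# Example (reusing the toy setup from the previous sketch):
# grads = bptt(xs, np.zeros(n_out), W_xh, W_hh, W_hy, np.zeros(n_hid))
```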
However, there is a problem, noted in Yoshua Bengio's frequently cited paper "Learning Long-Term Dependencies with Gradient Descent Is Difficult": the vanishing gradient. In other words, the error signal from later time steps often cannot travel far enough back in time to influence the network at much earlier time steps. This makes it hard to learn long-range effects, such as a chess pawn whose impact is only felt twelve moves later.
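The vanishing gradient can be seen directly in the backward loop of the BPTT sketch: at each step the error vector is multiplied by the recurrent weights and by the tanh derivative, whose entries are at most 1. The following toy demonstration is mine, not the article's, and its weight scale is arbitrary; for simplicity it also reuses one hidden state at every step.

```python
import numpy as np

rng = np.random.default_rng(2)
n_hid = 32
W_hh = rng.normal(size=(n_hid, n_hid)) * 0.3 / np.sqrt(n_hid)  # small recurrent weights
h = np.tanh(rng.normal(size=n_hid))     # a fixed hidden state, reused for simplicity
dh = rng.normal(size=n_hid)             # error signal arriving at the last time step

for step in range(1, 21):
    dh = W_hh.T @ (dh * (1.0 - h ** 2))  # one step of the backward recursion
    if step % 5 == 0:
        print(f"after {step:2d} steps back, gradient norm = {np.linalg.norm(dh):.2e}")

# The norm shrinks roughly geometrically, so an error from a dozen or more
# steps in the past barely influences the weights -- the vanishing gradient.
```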
The remedy for this problem is the long short-term memory (LSTM) model, first proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997. In this model, the conventional neuron, a unit that applies an S-shaped (sigmoid) activation to a linear combination of its inputs, is replaced by a memory cell. Each memory cell is associated with an input gate, an output gate, and an internal state that feeds into itself across time steps without interference.
In this model, each memory cell trains three sets of weights over its input, which includes the entire hidden state of the previous time step. One set feeds the input node, at the bottom of the diagram above. Another feeds the input gate, shown at the bottom of the rightmost side of the cell. A third feeds the output gate, shown at the top of the rightmost side. Each blue node is associated with an activation function, typically the sigmoid, and the Π nodes represent multiplication. The node at the center of the cell is called the internal state, and it feeds back to itself across time steps with a weight of 1. The self-connected edge of the internal state is called the constant error carousel, or CEC.
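Below is a minimal NumPy sketch of one forward step of such a memory cell, following the original 1997-style formulation described here (input node, input gate, output gate, and a self-connected internal state with weight 1; no forget gate). The weight names and the tanh on the output side are my own illustrative choices, and biases are omitted for brevity; exact wiring varies across presentations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, s_prev, W_g, W_i, W_o):
    """One time step of an original-style LSTM memory cell (no forget gate).

    x: current input, h_prev: previous cell output, s_prev: internal state.
    Each weight matrix sees the concatenation of x and h_prev."""
    z = np.concatenate([x, h_prev])
    g = np.tanh(W_g @ z)        # input node: candidate value to write
    i = sigmoid(W_i @ z)        # input gate: when to let activation in
    o = sigmoid(W_o @ z)        # output gate: when to let activation out
    s = s_prev + i * g          # internal state: self-connection with weight 1 (the CEC)
    h = o * np.tanh(s)          # gated output of the cell
    return h, s

# Toy usage: run one cell over a short sequence
rng = np.random.default_rng(3)
n_in, n_cell, T = 4, 6, 5
W_g, W_i, W_o = (rng.normal(size=(n_cell, n_in + n_cell)) * 0.1 for _ in range(3))
h, s = np.zeros(n_cell), np.zeros(n_cell)
for x in rng.normal(size=(T, n_in)):
    h, s = lstm_step(x, h, s, W_g, W_i, W_o)
print(h)
```

Because the internal state carries itself forward with a constant weight of 1, the error signal can flow back through `s` across many time steps without being repeatedly squashed, which is exactly the property the vanishing-gradient demonstration above lacks.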
On the forward pass, the input gate learns when to let activation into the memory cell, and the output gate learns when to let activation out of it. Correspondingly, on the backward pass, the output gate learns when to let error flow into the memory cell, and the input gate learns when to let it flow out of the cell and into the rest of the network. These models have proven remarkably successful on a wide variety of tasks, from handwriting recognition to image captioning. Perhaps with a little more love, they could win at Space Invaders, too.
About the author: Zachary Chase Lipton is a PhD student in the Computer Science and Engineering department at UCSD. He researches machine learning theory and applications, and is a contributing editor at KDnuggets.
Original link: Demystifying LSTM Neural Networks (Translated by Wang Wei; editor: Zhou Jianding)