Time-Recurrent Neural Network: LSTM (Long Short-Term Memory)

LSTM (Long Short-Term Memory) is a time-recurrent neural network first published in 1997. Thanks to its unique design, LSTM is well suited to processing and predicting important events separated by very long intervals and delays in a time series.

"Editor's note" The use of feedforward convolutional neural networks (convnets) to solve computer vision problems is the most widely known achievement of deep learning, but far less public attention has been devoted to using recurrent neural networks to model relationships over time. As the leading figures of deep learning have noted, LSTM networks have proven more effective than traditional RNNs. This article was written by Zachary Chase Lipton, a doctoral student in machine learning theory and applications at the University of California, San Diego (UCSD); it explains the fundamentals of recurrent networks in plain language and introduces the long short-term memory (LSTM) model.

Given the wide applicability of deep learning to practical tasks, it has attracted the attention of many technical experts, investors, and non-professionals. Although the most notable result of deep learning is the use of feedforward convolutional neural networks (convnets) to solve computer vision problems, far less public attention has been devoted to using recurrent neural networks to model relationships over time.

(Note: To help you get started with LSTM recurrent networks, I have attached a simple micro-instance, preloaded with NumPy, Theano, and a git clone of Jonathan Raiman's LSTM example.)

In a recent article, "Learning to read recurrent neural networks," I explained why, despite the remarkable success of feedforward networks, they are constrained by their inability to explicitly model relationships over time and by their assumption that all data points consist of fixed-length vectors. In the conclusion of that article, I promised to write a piece explaining the basics of recurrent networks and introducing the long short-term memory (LSTM) model.


First, some neural network basics. A neural network can be represented as a graph of artificial neurons, or nodes, connected by directed edges, which model synapses. Each neuron is a processing unit that takes the outputs of the nodes connected to it as its input. Before emitting its output, each neuron applies a nonlinear activation function. It is this activation function that gives neural networks the ability to model nonlinear relationships.
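To make this concrete, here is a minimal sketch of a single artificial neuron in NumPy; the sigmoid activation and the particular weights are illustrative assumptions, not taken from the article.

import numpy as np

def neuron(inputs, weights, bias):
    # linear combination of the incoming signals
    pre_activation = np.dot(weights, inputs) + bias
    # nonlinear activation (here a sigmoid, one common choice)
    return 1.0 / (1.0 + np.exp(-pre_activation))

# Example: a neuron with three incoming connections
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(neuron(x, w, bias=0.05))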

Now, consider the recent well-known paper Playing Atari with Deep Reinforcement Learning, which combines convnets with reinforcement learning to train a computer to play video games. The system exceeds human performance on some games, such as Breakout!, where the appropriate strategy at any moment can be inferred from the current screen. However, on games where the optimal strategy requires planning over a long span of time, such as Space Invaders, the system falls far short of human performance.

Thus, we introduce the recurrent neural network (RNN), which gives neural networks the ability to explicitly model time by adding a self-connected hidden layer that spans time steps. In other words, the hidden layer's output feeds not only into the output layer, but also into the hidden layer at the next time step. In this article, I'll use several schematic diagrams of recurrent networks, excerpted from the literature on the topic I am reviewing.
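As a sketch of that feedback, the following assumed NumPy code performs one step of a simple recurrent layer: the new hidden state is computed from the current input and the previous hidden state, and it feeds both the output and the next time step. The weight names (Wxh, Whh, Why) are illustrative, not from the original article.

import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, Why, bh, by):
    # self-connected hidden layer: depends on input and on its own previous value
    h = np.tanh(Wxh @ x + Whh @ h_prev + bh)
    # output at this time step
    y = Why @ h + by
    return h, y

rng = np.random.default_rng(0)
x, h_prev = rng.standard_normal(3), np.zeros(5)
Wxh, Whh, Why = rng.standard_normal((5, 3)), rng.standard_normal((5, 5)), rng.standard_normal((2, 5))
h, y = rnn_step(x, h_prev, Wxh, Whh, Why, np.zeros(5), np.zeros(2))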


Now, we can unfold the network over two time steps to visualize the connections in a loop-free form. Note that the weights (from input to hidden and from hidden to output) are identical at every time step. A recurrent network is sometimes described as a deep network whose depth lies not only between input and output, but also across time steps: each time step can be regarded as a layer.
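The unrolled view can be written as a plain loop. The sketch below, again with assumed weight names, reuses one set of parameters at every time step, which is exactly the weight sharing described above.

import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, Why, bh, by):
    h, hs, ys = h0, [], []
    for x in xs:                               # one "layer" per time step
        h = np.tanh(Wxh @ x + Whh @ h + bh)    # the same weights at every step
        hs.append(h)
        ys.append(Why @ h + by)
    return hs, ys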


Once unfolded, these networks can be trained end to end with backpropagation. This extension of backpropagation across time steps is known as backpropagation through time (BPTT).
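For illustration, here is a minimal backpropagation-through-time sketch for the simple recurrent layer above, with the loss applied only at the final step for brevity; the function and weight names are assumptions, not the article's code.

import numpy as np

def bptt(xs, h0, Wxh, Whh, bh, target):
    # forward pass, storing the hidden state at every time step
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(Wxh @ x + Whh @ hs[-1] + bh))
    # squared-error loss on the final hidden state only (for brevity)
    loss = 0.5 * np.sum((hs[-1] - target) ** 2)
    # backward pass: the same weights accumulate gradient from every step
    dWxh, dWhh, dbh = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(bh)
    dh = hs[-1] - target                      # dL/dh_T
    for t in reversed(range(len(xs))):
        dz = dh * (1.0 - hs[t + 1] ** 2)      # back through the tanh
        dWxh += np.outer(dz, xs[t])
        dWhh += np.outer(dz, hs[t])
        dbh += dz
        dh = Whh.T @ dz                       # error flowing to the previous time step
    return loss, dWxh, dWhh, dbh

rng = np.random.default_rng(0)
xs = [rng.standard_normal(3) for _ in range(5)]
loss, dWxh, dWhh, dbh = bptt(xs, np.zeros(4),
                             rng.standard_normal((4, 3)) * 0.1,
                             rng.standard_normal((4, 4)) * 0.1,
                             np.zeros(4), target=np.ones(4))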

However, there is a problem, noted in an often-cited paper by Yoshua Bengio (Learning Long-Term Dependencies with Gradient Descent is Difficult): the vanishing gradient. In other words, the error signal from later time steps often cannot travel far enough back to influence the network at earlier time steps. This makes it hard to learn long-range effects, such as the pawn you gave up coming back to haunt you twelve moves later.
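The effect can be seen numerically. The toy experiment below (with assumed weights and sizes) propagates an error signal backward through a tanh recurrent layer and prints its norm; the signal typically collapses toward zero after a few dozen steps, so very early time steps receive almost no learning signal.

import numpy as np

rng = np.random.default_rng(0)
n, T = 50, 30
Whh = rng.standard_normal((n, n)) * 0.1       # modest recurrent weights
hs, h = [], np.zeros(n)
for _ in range(T):                            # forward pass with random inputs
    h = np.tanh(Whh @ h + rng.standard_normal(n))
    hs.append(h)

delta = np.ones(n)                            # pretend the error dL/dh_T at the last step is all ones
for t in reversed(range(T)):                  # push the error back through time
    delta = Whh.T @ (delta * (1.0 - hs[t] ** 2))
    if (T - t) % 10 == 0:
        print(f"{T - t} steps back: ||error|| = {np.linalg.norm(delta):.2e}")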

The remedy for this problem is the long short-term memory (LSTM) model, first proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997. In this model, the ordinary neuron, a unit that applies a sigmoid activation to a linear combination of its inputs, is replaced by a memory cell. Each memory cell is associated with an input gate, an output gate, and an internal state that feeds back into itself across time steps without interference.


In this model, each memory cell trains three sets of weights on its input, which includes the complete hidden state of the previous time step. One set feeds the input node, at the bottom of the figure above. One set feeds the input gate, shown at the bottom right of the cell. Another feeds the output gate, shown at the top right. Each blue node is associated with an activation function, typically a sigmoid, and each Pi node represents multiplication. The node at the very center of the cell is the internal state, which feeds back into itself across time steps with a weight of 1. This self-connected edge of the internal state is known as the constant error carousel, or CEC.
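Putting the pieces together, here is a sketch of one forward step of an original-style LSTM memory cell, with an input node, input gate, output gate, and an internal state whose self-connection has weight 1 (the CEC). There is no forget gate, matching the 1997 formulation described here; the weight names and the tanh applied to the state before the output gate are illustrative choices, not the article's code.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, s_prev,
                   W_g, U_g, b_g,      # input node weights
                   W_i, U_i, b_i,      # input gate weights
                   W_o, U_o, b_o):     # output gate weights
    g = np.tanh(W_g @ x + U_g @ h_prev + b_g)     # candidate written to the cell
    i = sigmoid(W_i @ x + U_i @ h_prev + b_i)     # input gate
    o = sigmoid(W_o @ x + U_o @ h_prev + b_o)     # output gate
    s = s_prev + i * g                            # internal state: self-loop of weight 1 (the CEC)
    h = o * np.tanh(s)                            # gated output of the memory cell
    return h, s

rng = np.random.default_rng(0)
d, n = 3, 4
params = [rng.standard_normal(shape) * 0.1
          for shape in [(n, d), (n, n), (n,)] * 3]
h, s = lstm_cell_step(rng.standard_normal(d), np.zeros(n), np.zeros(n), *params)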

In the forward pass, the input gate learns to decide when to let activation into the memory cell, while the output gate learns when to let activation out of it. Correspondingly, in the backward pass, the output gate learns when to let error flow into the memory cell, and the input gate learns when to let it flow out of the cell and into the rest of the network. These models have proven very successful on tasks as varied as handwriting recognition and image captioning. Perhaps with a little more attention, they could even win at Space Invaders.

About the author: Zachary Chase Lipton is a PhD student in the Computer Science department at UCSD. He researches machine learning theory and applications, and is a contributing editor at KDnuggets.

Original link: Demystifying LSTM Neural Networks (translated by Wang Wei; edited by Zhou Jianding)
