Recurrent Neural Networks, LSTM, GRU


Refer to:

The Unreasonable Effectiveness of Recurrent Neural Networks

Recurrent Neural Networks

Sequences. Depending on your background you might be wondering: what makes recurrent networks so special? A glaring limitation of vanilla neural networks (and also convolutional networks) is that their API is too constrained: they accept a fixed-sized vector as input (e.g. an image) and produce a fixed-sized vector as output (e.g. probabilities of different classes). Not only that: these models perform this mapping using a fixed amount of computational steps (e.g. the number of layers in the model). The core reason recurrent nets are more exciting is that they allow us to operate over sequences of vectors: sequences in the input, the output, or in the most general case both. A few examples may make this more concrete:

Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue, and green vectors hold the RNN's state (more on this soon). From left to right: (1) Vanilla mode of processing without an RNN, from fixed-sized input to fixed-sized output (e.g. image classification). (2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words). (3) Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment). (4) Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French). (5) Synced sequence input and output (e.g. video classification where we wish to label each frame of the video). Notice that in every case there are no pre-specified constraints on the lengths of the sequences, because the recurrent transformation (green) is fixed and can be applied as many times as we like.

As you might expect, the sequence regime of operation is much more powerful compared to fixed networks that are doomed from the get-go by a fixed number of computational steps, and hence also much more appealing for those of us who aspire to build more intelligent systems. Moreover, as we'll see in a bit, RNNs combine the input vector with their state vector using a fixed (but learned) function to produce a new state vector. This can, in programming terms, be interpreted as running a fixed program with certain inputs and some internal variables. Viewed this way, RNNs essentially describe programs. In fact, it is known that RNNs are Turing-complete in the sense that they can simulate arbitrary programs (with proper weights). But similar to universal approximation theorems for neural nets, you shouldn't read too much into this. In fact, forget I said anything.
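In symbols, one common way to write this fixed (but learned) update is the standard vanilla RNN recurrence, using the same weight names as the code snippet later in this post:

$$ h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t), \qquad y_t = W_{hy}\, h_t $$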

If training vanilla neural nets is optimization over functions, training recurrent nets is optimization over programs.

Sequential processing in absence of sequences. You might be thinking that having sequences as inputs or outputs could be relatively rare, but an important point to realize is that even if your inputs/outputs are fixed vectors, it is still possible to use this powerful formalism to process them in a sequential manner. For instance, the figure below shows results from two very nice papers from DeepMind. On the left, an algorithm learns a recurrent network policy that steers its attention around an image; in particular, it learns to read out house numbers from left to right (Ba et al.). On the right, a recurrent network generates images of digits by learning to sequentially add color to a canvas (Gregor et al.):

Left: RNN learns to read house numbers. Right: RNN learns to paint house numbers.

The takeaway is that even if your data is not in the form of sequences, you can still formulate and train powerful models that learn to process it sequentially. You're learning stateful programs that process your fixed-sized data.

Vanilla RNNs only have hidden states, and those hidden states serve as the memory of the RNN.
import numpy as np

class RNN:
    # ...
    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y
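As a minimal usage sketch (the weight shapes, random initialization, and zero initial hidden state below are illustrative assumptions, not part of the original snippet), the same step function is simply applied once per element of a sequence, with the hidden state carrying information forward between calls:

# Illustrative setup (assumed): 3 hidden units, 4-dimensional inputs and outputs.
rnn = RNN()
hidden_size, input_size, output_size = 3, 4, 4
rnn.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
rnn.W_xh = np.random.randn(hidden_size, input_size) * 0.01
rnn.W_hy = np.random.randn(output_size, hidden_size) * 0.01
rnn.h = np.zeros(hidden_size)

# Apply the same fixed step to every vector in the sequence;
# rnn.h is updated in place and remembers everything seen so far.
sequence = [np.random.randn(input_size) for _ in range(5)]
for x in sequence:
    y = rnn.step(x)  # y is the output vector after seeing x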

For example: an RNN with 4-dimensional input and output layers, and a hidden layer of 3 units (neurons). This diagram shows the activations in the forward pass when the RNN is fed the characters "hell" as input. The output layer contains the confidences the RNN assigns to the next character (the vocabulary is "h,e,l,o"); we want the green numbers to be high and the red numbers to be low.
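To make the character-level example concrete, here is a small sketch (the one-hot helper is an illustration of mine, not from the original post, and it reuses the illustrative rnn object from the sketch above with 4-dimensional inputs and outputs) of feeding "hell" into the step function over the vocabulary "h,e,l,o":

vocab = ['h', 'e', 'l', 'o']

def one_hot(ch):
    # encode a character as a 4-dimensional one-hot vector over the vocabulary
    v = np.zeros(len(vocab))
    v[vocab.index(ch)] = 1.0
    return v

# Feed the characters of "hell" one at a time; at each step the output vector
# holds the RNN's (unnormalized) confidences for the next character.
for ch in "hell":
    y = rnn.step(one_hot(ch))
    predicted = vocab[int(np.argmax(y))]
    # e.g. after seeing 'h' we would like 'e' to get the highest score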

Refer to:

Difference between feedback RNN and LSTM/GRU

LSTMs are often referred to as fancy RNNs. Vanilla RNNs do not have a cell state. They only have hidden states, and those hidden states serve as the memory for RNNs.

Meanwhile, an LSTM has both a cell state and a hidden state. The cell state has the ability to remove or add information to the cell, regulated by structures called "gates". And because of this cell, in theory, an LSTM should be able to handle long-term dependencies (in practice, it is still difficult to do so).
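As a rough sketch of that idea (the weight layout and names here are assumptions for illustration, not the formulation of any particular library), one LSTM step updates both the cell state and the hidden state, with sigmoid gates deciding what to forget, what to write, and what to expose:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    # W has shape (4H, H + D) and b has shape (4H,), where H = len(h), D = len(x);
    # a single matrix multiply produces all four gate pre-activations.
    z = np.dot(W, np.concatenate([h, x])) + b
    H = h.shape[0]
    f = sigmoid(z[0:H])       # forget gate: what to erase from the cell state
    i = sigmoid(z[H:2*H])     # input gate: how much new information to write
    o = sigmoid(z[2*H:3*H])   # output gate: what part of the cell to expose
    g = np.tanh(z[3*H:4*H])   # candidate values to add to the cell state
    c = f * c + i * g         # cell state: edited mostly additively by the gates
    h = o * np.tanh(c)        # hidden state: what the rest of the network sees
    return h, c

The key difference from the vanilla step above is that the cell state c is carried from step to step through this mostly additive update, which is what is supposed to make long-term dependencies easier to preserve.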

