The Unreasonable Effectiveness of Recurrent Neural Networks

There's something magical about Recurrent Neural Networks (RNNs). I still remember when I trained my first recurrent network for image captioning. Within a few dozen minutes of training, my first baby model (with rather arbitrarily-chosen hyperparameters) started to generate very nice-looking descriptions of images that were on the edge of making sense. Sometimes the ratio of how simple your model is to the quality of the results you get out of it blows past your expectations, and this was one of those times. What made this result so shocking at the time was the common wisdom that RNNs were supposed to be difficult to train (with more experience I've in fact reached the opposite conclusion). Fast forward about a year: I'm training RNNs all the time and I've witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me. This post is about sharing some of that magic with you.

We'll train RNNs to generate text character by character and ponder the question "how is that even possible?"

By the way, together with this post I am also releasing code on Github that allows you to train character-level language models based on multi-layer LSTMs. You give it a large chunk of text and it will learn to generate text like it, one character at a time. You can also use it to reproduce my experiments below. But we're getting ahead of ourselves; what are RNNs anyway?

Recurrent Neural Networks

Sequences. Depending on your background you might be wondering: what makes recurrent networks so special? A glaring limitation of vanilla neural networks (and also convolutional networks) is that their API is too constrained: they accept a fixed-sized vector as input (e.g. an image) and produce a fixed-sized vector as output (e.g. probabilities of different classes). Not only that: these models perform this mapping using a fixed amount of computational steps (e.g. the number of layers in the model). The core reason that recurrent nets are more exciting is that they allow us to operate over sequences of vectors: sequences in the input, the output, or in the most general case both. A few examples may make this more concrete: each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue and green vectors hold the RNN's state (more on this soon). From left to right: (1) Vanilla mode of processing without an RNN, from fixed-sized input to fixed-sized output (e.g. image classification). (2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words). (3) Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment). (4) Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French). (5) Synced sequence input and output (e.g. video classification where we wish to label each frame of the video). Notice that in every case there are no pre-specified constraints on the lengths of the sequences, because the recurrent transformation (green) is fixed and can be applied as many times as we like.
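
To make that last point concrete, here is a rough sketch (not taken from the released code; the sizes, names, and random weights are illustrative assumptions) of one fixed recurrent transformation being applied to input sequences of two different lengths. It happens to use the same tanh update that is written out later in this post; the only thing that changes between the two runs is how many times the loop executes.

import numpy as np

hidden_size, input_size = 4, 3                            # toy sizes, chosen arbitrarily
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1    # fixed (in practice, learned) weights
W_xh = np.random.randn(hidden_size, input_size) * 0.1

def recurrence(h, x):
  # one application of the fixed transformation (the green box)
  return np.tanh(np.dot(W_hh, h) + np.dot(W_xh, x))

for length in (5, 12):                                    # same weights, different sequence lengths
  xs = [np.random.randn(input_size) for _ in range(length)]
  h = np.zeros(hidden_size)
  for x in xs:                                            # apply the transformation once per input
    h = recurrence(h, x)
  print(length, h.shape)                                  # the state has the same shape either way

Nothing about the weights needs to know the sequence length in advance; the loop simply runs once per input vector.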

As you might expect, the sequence regime of operation is much more powerful compared to fixed networks that are doomed from the get-go by a fixed number of computational steps, and hence also much more appealing for those of us who aspire to build more intelligent systems. Moreover, as we'll see in a bit, RNNs combine the input vector with their state vector using a fixed (but learned) function to produce a new state vector. This can in programming terms be interpreted as running a fixed program with certain inputs and some internal variables. Viewed this way, RNNs essentially describe programs. In fact, it is known that RNNs are Turing-complete in the sense that they can simulate arbitrary programs (with proper weights). But similar to universal approximation theorems for neural nets, you shouldn't read too much into this. In fact, forget I said anything.

If training vanilla neural nets is optimization over functions, training recurrent nets is optimization over programs.

Sequential processing in absence of sequences. You might be thinking that having sequences as inputs or outputs could be relatively rare, but an important point to realize is that even if your inputs/outputs are fixed vectors, it is still possible to use this powerful formalism to process them in a sequential manner. For instance, the figure below shows results from two very nice papers from DeepMind. On the left, an algorithm learns a recurrent network policy that steers its attention around an image; in particular, it learns to read out house numbers from left to right (Ba et al.). On the right, a recurrent network generates images of digits by learning to sequentially add color to a canvas (Gregor et al.). Left: RNN learns to read house numbers. Right: RNN learns to paint house numbers.

The takeaway is that even if your data is not in the form of sequences, you can still formulate and train powerful models that learn to process it sequentially. You're learning stateful programs that process your fixed-sized data.

RNN computation. So how do these things work? At the core, RNNs have a deceptively simple API: they accept an input vector x and give you an output vector y. However, crucially this output vector's contents are influenced not only by the input you just fed in, but also by the entire history of inputs you've fed in in the past. Written as a class, the RNN's API consists of a single step function:

rnn = RNN()
y = rnn.step(x) # x is an input vector, y is the RNN's output vector

The RNN class has some internal state that it gets to update every time step is called. In the simplest case this state consists of a single hidden vector h. Here is an implementation of the step function in a vanilla RNN:

import numpy as np

class RNN:
  # ...
  def step(self, x):
    # update the hidden state
    self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
    # compute the output vector
    y = np.dot(self.W_hy, self.h)
    return y

The above specifies the forward pass of a vanilla RNN. This RNN's parameters are the three matrices W_hh, W_xh, W_hy. The hidden state self.h is initialized with the zero vector. The np.tanh function implements a non-linearity that squashes the activations to the range [-1, 1]. Notice briefly how this works: there are two terms inside of the tanh: one is based on the previous hidden state and one is based on the current input. In numpy, np.dot is matrix multiplication. The two intermediates interact with addition, and then get squashed by the tanh into the new state vector. If you are more comfortable with math notation, we can also write the hidden state update as $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$, where the tanh is applied elementwise.
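
As a minimal usage sketch (not part of the released Github code), here is one way you might hook the class above up to concrete weight matrices and run it over a short sequence. The dimensions, the random initialization scale, and the toy inputs are all assumptions made purely for illustration; in practice the three matrices are learned with backpropagation rather than left random.

import numpy as np

hidden_size, input_size, output_size = 100, 50, 50      # hypothetical dimensions

rnn = RNN()                                             # the class defined above
rnn.h = np.zeros(hidden_size)                           # hidden state starts at zero
rnn.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01   # small random weights,
rnn.W_xh = np.random.randn(hidden_size, input_size) * 0.01    # standing in for learned
rnn.W_hy = np.random.randn(output_size, hidden_size) * 0.01   # parameters

xs = [np.random.randn(input_size) for _ in range(10)]   # a toy sequence of 10 input vectors
for x in xs:
  y = rnn.step(x)   # each call updates rnn.h, so y depends on the entire history of inputs

Feeding the same x twice would generally produce two different y's, precisely because the internal state self.h changes between the calls.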
