These notes were taken while following the July Algorithm May deep learning course. They mainly record the concepts that were fuzzy to me while learning machine learning. For the course itself, see the July Education site:
http://www.julyedu.com/
RNN: using neural networks to model the state of sequence problems
Previously, the data our models dealt with was assumed to be IID: the network does a forward pass with sample A, whether for classification or regression, then a second forward pass with sample B; A and B have nothing to do with each other.
Such a network learns a function: input x, get y.
IID: independent and identically distributed. Each sample is independent of the others.
Much data does not satisfy the IID assumption, for example sequence data: speech, video, images, text, and so on.
Sequence data comes in two kinds: temporal sequences (e.g., speech) and spatial sequences (e.g., images).
Sequence generation, such as language translation and automatic text generation
Content extraction, such as image description
Sequence Samples
Sequence problems can be roughly divided into five types:
The first: one to one (a plain function problem, not a sequence)
The second: one to many
The third: many to one
The fourth: many to many, with an interval (the output sequence begins after the input sequence)
The fifth: many to many (input and output aligned step by step)
The RNN can take not only a sequence as input but also produce a sequence as output; "sequence" here means a sequence of vectors.
What an RNN learns is a program (a state machine), not a function. Typical applications:
https://github.com/karpathy/neuraltalk2
One to many: the input is an image and the output is a sequence of text (at least one of the input and the output is a sequence).
http://vlg.cs.dartmouth.edu/c3d/
Many to one: input a piece of text and classify it (the text can be long);
event detection in video frames (finding a particular shot within a set of video frames).
Many to many, with an interval: language translation.
http://research.microsoft.com/apps/pubs/default.aspx?id=264836
Many to many: describing video frames, automatically generating commentary text for each frame.
Sequence Prediction
The input is a sequence and the output is also a sequence: the same sequence shifted one step ahead. Used to build generative models (e.g., a music generator).
The true mapping F, which needs the whole input history, is often difficult to model. To approximate F, the model g is made to depend on the current input and the previous state (introducing a state variable).
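Written out (this notation is mine, not the course's):

    y_t = F(x_1, x_2, \dots, x_t)                 % needs the whole history, hard to model
    h_t = g(h_{t-1}, x_t), \quad y_t = f(h_t)     % the state h_t summarizes the history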
This is how the problem is explained.
Sequence Prediction Model
The RNN does not build a structured description of the sample; it fits the problem with the number of neurons and the connection weights. Benefit: the problem is handled end to end.
Left figure: the forward pass: input x, apply an operation (sigmoid) to get h(t), then another operation to get the output; the previous state h(t-1) also participates in the operation.
The final output applies a full connection to the newly combined state to get y.
It can also be represented as in the figure on the right:
h0 and x0 can be defined arbitrarily, and the predicted value can be fed back as the next input.
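A minimal NumPy sketch of this forward pass, assuming tanh for the state operation and a plain full connection for the output (the names rnn_forward, W_xh, W_hh, W_hy are mine, not the course's):

    import numpy as np

    def rnn_forward(xs, h0, W_xh, W_hh, W_hy, b_h, b_y):
        """Vanilla RNN: each step combines the current input with the previous state."""
        h, ys = h0, []
        for x in xs:
            h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # h(t-1) participates in the operation
            ys.append(W_hy @ h + b_y)               # full connection on the new state -> y
        return ys, h

    # usage: 4 time steps, 3-dim inputs, 5-dim state, 2-dim outputs
    rng = np.random.default_rng(0)
    xs = [rng.normal(size=3) for _ in range(4)]
    ys, h = rnn_forward(xs, np.zeros(5),
                        rng.normal(size=(5, 3)), rng.normal(size=(5, 5)),
                        rng.normal(size=(2, 5)), np.zeros(5), np.zeros(2))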
RNN Training
Sum the loss values defined at each step (unweighted).
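In symbols (my notation), the total loss is the plain sum of the per-step losses:

    L = \sum_{t=1}^{T} \ell(\hat{y}_t, y_t)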
As the derivation of the formula in the red box of the figure below shows, the chain rule expands the gradient into a long product, and this product can cause problems: if W is less than 1, the product approaches 0; if W is very large, the product grows larger and larger.
This is the vanishing and exploding gradient problem.
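A toy illustration of the repeated multiplication (plain arithmetic, not the actual RNN Jacobian):

    # 100 time steps of repeated multiplication by the same factor
    print(0.9 ** 100)  # ~2.7e-05 -> the gradient vanishes
    print(1.1 ** 100)  # ~1.4e+04 -> the gradient explodes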
The cause of vanishing and exploding gradients here is different from before: here they arise from unrolling along the sequence rather than from the product across the network's depth. If the network itself consists of multiple hidden layers, the problem is aggravated.
BPTT algorithm: solutions
These are rarely used in practice, however, because the chained multiplication by the connection weight W is unavoidable.
Commonly used improved forms of the RNN:
Some gates (thresholds) are set from the input:
LSTM: the most widely used and most successful RNN
A block has two outputs: a new variable, the cell state C, is added. Among the cell-state values, some neurons hold long-term information and some short-term. h is used immediately for the output, while C is passed on step by step; a value in C may be from two steps back or from 100 steps back. That is, some dimensions of C hold a state from long ago and others a very recent state. Each block is called a layer.
LSTM: forget/input unit
h(t-1) and x(t) are the two inputs; the operation on them is multiply-and-add (a weighted sum). f controls how much is forgotten.
LSTM: update cell
f_t: controls the proportion of the previous state that is retained, i.e., whether long-term or short-term memory is used.
i_t: controls by how much c_t is updated, so that the cell state can be adjusted.
LSTM: output
h_t is updated based on o_t.
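Putting the three parts together, the standard textbook LSTM equations (sigma is the sigmoid, * is element-wise multiplication, [h_{t-1}, x_t] is concatenation):

    f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)           % forget gate
    i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)           % input gate
    \tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)    % candidate cell values
    c_t = f_t * c_{t-1} + i_t * \tilde{c}_t          % cell-state update
    o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)           % output gate
    h_t = o_t * \tanh(c_t)                           % h is used immediately for the output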
Overall process
The added summation node interrupts the path of differentiation, so some derivative paths are cut off (the gradient along the cell state escapes the long chain of W multiplications).
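A one-step NumPy sketch of the whole block, following the equations above (the stacked-weight layout and names are my own choice):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One LSTM step; W maps [h_prev, x] to the four stacked gate pre-activations."""
        z = W @ np.concatenate([h_prev, x]) + b
        n = h_prev.size
        f = sigmoid(z[0*n:1*n])        # forget gate
        i = sigmoid(z[1*n:2*n])        # input gate
        c_tilde = np.tanh(z[2*n:3*n])  # candidate cell values
        o = sigmoid(z[3*n:4*n])        # output gate
        c = f * c_prev + i * c_tilde   # the additive path carrying long-term state
        h = o * np.tanh(c)             # used immediately for the output
        return h, c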
LSTM: other variants
Using LSTM
High complexity, difficult to train
Resources:
July algorithm: http://www.julyedu.com/
Images are from the course PPT.