July Algorithm Deep Learning Note 7 -- RNN


This set of notes was taken while following the July Algorithm May deep learning course. It mainly records concepts that were fuzzy to me when I first studied machine learning. For the full course, see the July EDU site:
http://www.julyedu.com/

RNN: using neural networks to model state and handle sequence problems

Previously, the models we worked with assumed IID data: the network does a forward pass with sample A (whether for classification or regression), then a forward pass with sample B, and A and B have nothing to do with each other.
Such a network learns a function: input x, get y.
IID: independent and identically distributed. Each sample is independent of every other sample.

Much data does not satisfy the IID assumption,
for example sequence data: speech, video, images, text, etc.
Sequence data comes in two kinds: time series (e.g., speech) and spatial sequences (e.g., images).
Tasks include sequence generation, such as language translation and automatic text generation,
and content extraction, such as image captioning.
Sequence Samples

Sequence problems can be roughly divided into five types:

Type 1: one to one (an ordinary function problem, not a sequence)
Type 2: one to many
Type 3: many to one
Type 4: many to many, with an interval between input and output
Type 5: many to many, synchronized
An RNN handles not only sequence inputs but also sequence outputs; "sequence" here means a sequence of vectors.
What an RNN learns is a program (a state machine), not merely a function.

Typical applications:


https://github.com/karpathy/neuraltalk2
One to many: the input is an image and the output is a text sequence (at least one of the input and the output must be a sequence).


http://vlg.cs.dartmouth.edu/c3d/
Many to one: input a text and classify it (the text can be fairly long);
event detection in video frames (finding a particular shot within a set of video frames).


Many to many (with an interval): language translation


http://research.microsoft.com/apps/pubs/default.aspx?id=264836
Many to many (synchronized): describing video frames, i.e., automatically generating commentary text.

Sequence Prediction

The input is a sequence and the output is also a sequence: the output is the input sequence shifted one step forward (the next elements). This is used to build generative models (e.g., a music generator).
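For instance, here is a minimal sketch of preparing training pairs for next-step prediction (the data and the helper name are hypothetical, not from the course):

    # Hypothetical helper: for next-step prediction, the target sequence is
    # simply the input sequence shifted forward by one step.
    def make_next_step_pairs(sequence):
        """Given [s0, s1, ..., sn], return inputs [s0..s(n-1)] and targets [s1..sn]."""
        return sequence[:-1], sequence[1:]

    notes = [60, 62, 64, 65, 67, 69, 71, 72]   # hypothetical MIDI pitches
    x, y = make_next_step_pairs(notes)
    print(x)   # [60, 62, 64, 65, 67, 69, 71]
    print(y)   # [62, 64, 65, 67, 69, 71, 72]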

The true mapping f, which takes the entire input history to the current output (y_t = f(x_1, ..., x_t)), is often difficult to model directly. To simulate f, we use a model g that depends on the current input and the previous state (introducing a state variable): h_t = g(x_t, h_(t-1)). This is how the problem is modeled.

Sequence Prediction Model

An RNN does not rely on hand-designed structure to describe the samples; it fits them through the number of neurons and the connection weights. Benefit: it solves the problem end to end.

Left figure, the forward pass: x is the input; an operation (a sigmoid, say) produces the state h(t), with the previous state h(t-1) also taking part in the computation.
The final output is obtained from the new state through a fully connected layer, giving y.
The same network can also be represented unrolled over time, as in the right figure:
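As a concrete illustration of this forward pass, here is a minimal vanilla-RNN step in NumPy (a sketch; the weight names W_xh, W_hh, W_hy and the tanh nonlinearity are the standard textbook formulation, not taken verbatim from the course slides):

    import numpy as np

    rng = np.random.default_rng(0)
    input_dim, hidden_dim, output_dim = 4, 8, 3

    # Standard vanilla-RNN parameters (illustrative initialization).
    W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
    W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))

    def rnn_step(x_t, h_prev):
        """One forward step: the previous state h_prev takes part in the update."""
        h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)   # new state
        y_t = W_hy @ h_t                            # fully connected readout
        return h_t, y_t

    h = np.zeros(hidden_dim)                        # h0 can be defined arbitrarily
    for x_t in rng.normal(size=(5, input_dim)):     # a length-5 input sequence
        h, y = rnn_step(x_t, h)

The same weights are reused at every step; only the state h changes from step to step.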

h0 and x0 can be defined arbitrarily, and the predicted value can be fed back in as the next input.

RNN Training


The total loss is the sum of the losses defined at each step (unweighted).
From this, the formula in the red box in the figure below can be derived.

Expanding the gradient with the chain rule produces a long product of terms, and this product can be problematic: if W is less than 1 in magnitude, the product approaches 0; if W is very large, the product grows larger and larger.
This is the vanishing and exploding gradient problem.
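To see this numerically, here is a small assumed illustration (not from the slides): repeatedly multiplying by the same matrix, as the BPTT chain rule does, drives the result toward 0 when the matrix's singular values are below 1 and blows it up when they are above 1:

    import numpy as np

    def product_norm(scale, steps=50, dim=8, seed=0):
        """Norm after `steps` multiplications by the same matrix W, mimicking
        the repeated W factors that the chain rule produces in BPTT."""
        rng = np.random.default_rng(seed)
        Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
        W = scale * Q                                     # all singular values = scale
        v = np.ones(dim)
        for _ in range(steps):
            v = W @ v
        return np.linalg.norm(v)

    print(product_norm(0.9))   # = 0.9**50 * sqrt(8): the gradient vanishes
    print(product_norm(1.1))   # = 1.1**50 * sqrt(8): the gradient explodes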

The cause of vanishing and exploding gradients here is different from before: they are caused by unrolling over the sequence (time), not by depth in network space. If the network also consists of multiple hidden layers, the problem is aggravated.

BPTT algorithm: Solutions


In practice, however, these fixes are rarely used, because the repeated multiplication by the shared W is unavoidable.
A commonly used, improved form of RNN
sets gates controlled by the input:

The most widely used and most successful RNN: the LSTM

A block has two outputs: a new variable, the cell state C, is added alongside h. The cell state holds values with both long-term and short-term lifetimes: h is used immediately for the output at the current step, while C is passed along from step to step. A dimension of C may carry a value from two steps ago or from 100 steps ago; that is, C mixes states from long ago with very recent ones. Each block is called a layer.

LSTM: forget/input gates

h(t-1) and x_t are the two inputs; "x" denotes element-wise multiplication and "+" denotes addition. The forget gate f_t = sigmoid(W_f * [h(t-1), x_t] + b_f) controls how much of the old state to forget, and the input gate i_t = sigmoid(W_i * [h(t-1), x_t] + b_i) controls how much new information enters.

LSTM: updating the cell state

f_t controls what fraction of the previous state is retained, i.e., whether long-term or short-term memory dominates.
i_t controls by how much the candidate values C~_t adjust the cell state. Together: C_t = f_t * C_(t-1) + i_t * C~_t.

LSTM: output

h_t is updated from o_t and the new cell state: h_t = o_t * tanh(C_t).

Overall process
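The overall process can be written compactly in code. Below is a minimal NumPy sketch of one LSTM step following the standard equations above; the stacked weight matrix W, the shapes, and the initialization are illustrative assumptions, not taken from the course:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        """One LSTM step. W maps the concatenated [h_prev, x_t] to the four
        gate pre-activations (forget, input, candidate, output)."""
        z = W @ np.concatenate([h_prev, x_t]) + b
        f, i, g, o = np.split(z, 4)
        f = sigmoid(f)                 # forget gate: how much of C to keep
        i = sigmoid(i)                 # input gate: how much candidate to add
        g = np.tanh(g)                 # candidate cell values C~_t
        o = sigmoid(o)                 # output gate
        c_t = f * c_prev + i * g       # additive cell update (the "long" path)
        h_t = o * np.tanh(c_t)         # output used immediately at this step
        return h_t, c_t

    rng = np.random.default_rng(0)
    input_dim, hidden_dim = 4, 8
    W = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim + input_dim))
    b = np.zeros(4 * hidden_dim)

    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x_t in rng.normal(size=(5, input_dim)):
        h, c = lstm_step(x_t, h, c, W, b)

Note that the cell update c_t = f * c_prev + i * g is additive, which is the gradient path discussed next.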

The added gating nodes break the differentiation path, so some derivative chains are cut off; the additive cell-state path lets gradients flow without repeated multiplication by W.

LSTM: other variants

Using LSTM

High complexity, difficult to train

Resources:
July Algorithm: http://www.julyedu.com/
Images are from the course PPT.
