CS224D Lecture 13 Notes

Source: Internet
Author: User

Reprints are welcome; please credit the source:

http://blog.csdn.net/neighborhoodguo/article/details/47387229


Before I knew it, we've reached the third part, and the whole course is almost over. Although this isn't the first public course I've followed to the end, I'm still a little excited!

Without further ado, let's start the summary!

This lecture introduces CNNs (convolutional neural networks), a model that is hugely popular in computer vision, but here presented through its applications in NLP. Evidently the same model can serve quite different applications; maybe our brains work that way too. But I digress.

Introducing CNN from RNN

A recursive NN first builds a complete parse tree and then climbs it step by step (like a snail?). A CNN needs no parse tree; in the words of the paper, it induces a Feature Graph, generating its own topological structure, which looks quite nice. In a recursive NN, a vector used once at a given layer will not be used a second time at that layer; in a CNN, a vector already used at a layer can be used again at the same layer without any problem.

A CNN works just like convolution: pick a window, slide it from left to right, and build each layer on top of the previous one.
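A minimal sketch of that sliding-window step (the shapes, names, and the tanh nonlinearity are illustrative assumptions, not taken from the lecture): each window of word vectors is concatenated and passed through the same affine map.

```python
import numpy as np

def conv_window(word_vectors, W, b, width=2):
    """Slide a window of `width` word vectors from left to right,
    applying the same affine map + tanh at every position.
    word_vectors: (n, d) matrix, one row per word."""
    n, d = word_vectors.shape
    outputs = []
    for i in range(n - width + 1):
        window = word_vectors[i:i + width].reshape(-1)  # concat -> (width*d,)
        outputs.append(np.tanh(W @ window + b))
    return np.stack(outputs)  # (n - width + 1, d_out)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))    # 5 words, 4-dim embeddings
W = rng.standard_normal((3, 8))    # maps a concatenated window to 3 dims
b = np.zeros(3)
print(conv_window(X, W, b).shape)  # (4, 3): one output per window position
```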

The first part introduces the simplest possible CNN, which has a pyramid structure:

It then uses the same computation as a recursive NN, recursing step by step.

This model is simple, but not very effective.

CNN and pooling

So here is a better idea. The first step is the same as the pyramid structure above: generate the bottom two layers of the pyramid:

This produces a vector c, which then goes through pooling: c_hat = max{c}.

But that extracts only a single feature. The fix is to build several pyramids, or rather several pyramid bases; in more professional terms, use windows of different widths to extract feature maps and then pool each of them, so that more features are extracted.
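A sketch of max-over-time pooling with several filter widths (the filter shapes and widths are assumptions for illustration): each filter produces a feature map c over all window positions, and only c_hat = max{c} is kept.

```python
import numpy as np

def feature_map(X, w_filter, b):
    """One filter spanning `width` words: c_i = tanh(w . x_{i:i+width} + b)."""
    d = X.shape[1]
    width = w_filter.size // d
    return np.array([np.tanh(w_filter @ X[i:i + width].reshape(-1) + b)
                     for i in range(X.shape[0] - width + 1)])

rng = np.random.default_rng(1)
X = rng.standard_normal((7, 4))               # sentence of 7 words, d = 4
filters = [(rng.standard_normal(w * 4), 0.0)  # one filter each of width 2, 3, 4
           for w in (2, 3, 4)]

# max-over-time pooling: keep c_hat = max{c} from each feature map
features = np.array([feature_map(X, w, b).max() for w, b in filters])
print(features.shape)  # (3,): one pooled feature per filter width
```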

Tricks

The first trick is said to improve accuracy.

In the final layer, during training, z is first multiplied element-wise by a vector r, where r consists of Bernoulli random variables.

This prevents the features from co-adapting (and thus overfitting), pushing the network to train as many independent features as possible.

At test time, of course, we no longer multiply by r, but then the activations would be too large, so instead the weights are scaled down by the Bernoulli probability before computing the final result.
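The trick above is dropout; here is a sketch under assumed shapes (p = 0.5 is an assumed keep probability):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.5  # Bernoulli probability of keeping a unit (assumed value)

def forward_train(z, W):
    """Training: multiply the penultimate activations z element-wise
    by a Bernoulli mask r before the final layer."""
    r = rng.binomial(1, p, size=z.shape)
    return W @ (z * r)

def forward_test(z, W):
    """Test time: no mask; scale the weights by p instead, so the
    expected activation matches training."""
    return (p * W) @ z

z = np.ones(4)
W = np.ones((2, 4))
print(forward_test(z, W))  # [2. 2.]
```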

There is also a trick to avoid overfitting: because our model is very complex, the dataset is very large, and training takes very long, the weights at the end of training may not be the best ones seen. So at each step, record the weights and the corresponding accuracy, and once training completes, use the weights with the highest accuracy.
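That bookkeeping can be sketched like this (the accuracy values are made up for illustration):

```python
import numpy as np

class BestCheckpoint:
    """Track the weights with the highest validation accuracy seen so far."""
    def __init__(self):
        self.best_acc = float("-inf")
        self.best_weights = None

    def update(self, weights, acc):
        if acc > self.best_acc:
            self.best_acc = acc
            self.best_weights = weights.copy()

ckpt = BestCheckpoint()
for epoch, acc in enumerate([0.62, 0.71, 0.69, 0.74, 0.73]):
    weights = np.full(3, float(epoch))  # stand-in for the model's weights
    ckpt.update(weights, acc)

print(ckpt.best_acc)  # 0.74 -- epoch 3's weights are the ones kept
```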

Complex Pooling schemes

At the end of the lecture comes the most complicated CNN, the one with the most involved paper behind it; after reading it through, I finally figured it out.

First layer

The first layer uses wide convolution.

On the left is narrow convolution; on the right is wide convolution.
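The difference is how far the filter may hang off the ends of the sentence: narrow convolution keeps the filter fully inside (output length n − m + 1), while wide convolution zero-pads so the filter touches every word (output length n + m − 1). In NumPy terms:

```python
import numpy as np

s = np.arange(1.0, 6.0)        # one dimension of a length-5 "sentence"
m = np.array([1.0, 2.0, 3.0])  # a filter of width 3

narrow = np.convolve(s, m, mode="valid")  # filter stays inside: length 3
wide   = np.convolve(s, m, mode="full")   # zero-padded ends:    length 7
print(len(narrow), len(wide))  # 3 7
```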

How exactly is the convolution computed? First choose a weight matrix m.

Then generate the matrix M from m.

Finally, compute the first layer.

Second layer

The k-max pooling used in this model is different from before: it takes the top k values, whereas the earlier model extracts only the single largest one.

First, the way k is computed: k_l = max(k_top, ceil((L - l)/L * s)), where L is the total number of layers, l is the current layer, and s is the number of words in the sentence. k_top is tuned on its own.

The pooling computation itself is similar to before; the difference is that it takes out the top k maxima, keeping their original order.
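A sketch of k-max pooling together with the dynamic-k formula above (the shapes are illustrative; the order-preserving selection follows the description in the notes):

```python
import numpy as np

def kmax_pooling(c, k):
    """Keep the k largest values in each row, preserving their
    original left-to-right order (unlike plain max pooling)."""
    idx = np.sort(np.argsort(c, axis=-1)[..., -k:], axis=-1)
    return np.take_along_axis(c, idx, axis=-1)

def dynamic_k(l, L, s, k_top):
    """k for layer l out of L, sentence length s: the formula above."""
    return max(k_top, int(np.ceil((L - l) / L * s)))

c = np.array([[3.0, 1.0, 5.0, 2.0, 4.0]])
print(kmax_pooling(c, 3))        # [[3. 5. 4.]] -- order preserved
print(dynamic_k(1, 3, 18, 3))    # 12
```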

Folding

The simplest version adds every two adjacent rows together: a feature map with d rows becomes one with d/2 rows.
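Folding can be sketched in one line (assuming the feature map is stored with d rows, one per embedding dimension, and d is even):

```python
import numpy as np

def fold(F):
    """Sum every pair of adjacent rows: a (d, n) feature map
    becomes (d/2, n). Assumes d is even."""
    return F[0::2] + F[1::2]

F = np.arange(12.0).reshape(4, 3)  # d = 4 rows, 3 columns
print(fold(F).shape)  # (2, 3)
```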

Last Layer

Once the desired features are obtained, a fully connected layer finally makes the predictions!

CNN Application: Machine Translation

Roughly speaking, a CNN is used to generate the topmost feature (a sentence representation), and then an RNN generates the corresponding target-language sentence.

The pros and cons of various models

Bag of vectors: good for simple classification.

Window model: good for classifying single words, but not for classifying longer segments.

CNNs: hyped to the skies.

Recursive NN: semantically plausible, but it needs a parse tree (which, I feel, is not that reliable).

Recurrent NN: plausible in cognitive science, but its performance is currently not the best.

Copyright notice: this is the blogger's original article; do not reproduce without the blogger's permission.
