CS224D Lecture 9 Notes


Reprinting is welcome; please credit the source:

http://blog.csdn.net/neighborhoodguo/article/details/47193885


The content of the last few lectures has not been very difficult, and my comprehension has improved (if I may flatter myself), so I have been getting through them quickly; before I knew it, Lecture 9 was done as well. This lecture covers the other RNN, where the R stands for recursive rather than the earlier recurrent. The instructor uses recursive NNs for both NLP and CV tasks. Personally I think it works well for CV, while for NLP it feels a little shaky. In any case, the model solves many practical problems and its performance is good, so let me write it down.

Let me first go through the outline of this lecture: how to map a sentence to a vector; then how to do parsing; then how to build the objective function with the max-margin framework and BPTS (backpropagation through structure); and finally a few improved versions of the recursive NN, plus how this model can also do computer vision work.

1. Semantic Vector Space for Sentences

Similar to the word vector space of the previous lectures, this time we project an entire sentence into a semantic vector space. Our model is based on two assumptions: the meaning of a sentence comes from 1. the meanings of the words it contains; 2. the way the sentence is constructed. The second assumption is still under debate. The model discussed here can accomplish two tasks at once: first, it learns the tree structure of a sentence; second, it learns the sentence's representation in the semantic vector space.
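For reference, the basic composition step that both tasks rely on (my own transcription of the standard recursive NN equations, not copied from the slide): a parent vector is built from its two children, and a scalar score rates how plausible the merge is.

    p = \tanh\left( W \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + b \right),
    \qquad W \in \mathbb{R}^{d \times 2d},\; c_1, c_2, p \in \mathbb{R}^{d},
    \qquad s(p) = v^{\top} p

Here W, b and v are shared parameters (in the plain recursive NN), and p is the vector of the new parent node.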

What is a parsing tree?

The figure above shows the parsing tree described in this lecture; the recurrent neural network from before actually corresponds to a parsing tree like the one below, so it can be seen as a special case of the parsing tree above.

Which of these two representations is correct is not settled yet (it is still debated cognitively).

How do you learn this parsing tree? Clever people invented a method related to beam search, a bottom-up procedure: starting from the lowest level, compute which two adjacent nodes produce the highest score when combined, take the pair with the largest score and merge it (greedy indeed), and repeat until everything has been combined into a single parsing tree at the top.
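Here is a minimal sketch of that greedy bottom-up procedure in NumPy. The names (compose, greedy_parse, W, b, v) and the simplifications (only adjacent pairs are merged, no beam) are mine, not the exact Pset2 or paper implementation.

    import numpy as np

    def compose(W, b, c1, c2):
        """Parent vector from two children: p = tanh(W [c1; c2] + b)."""
        return np.tanh(W @ np.concatenate([c1, c2]) + b)

    def greedy_parse(word_vecs, W, b, v):
        """Greedily merge the highest-scoring adjacent pair until one root remains."""
        nodes = [np.asarray(w) for w in word_vecs]        # current frontier of vectors
        trees = [("leaf", i) for i in range(len(nodes))]  # parallel list of tree structures
        total_score = 0.0
        while len(nodes) > 1:
            parents = [compose(W, b, nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
            scores = [float(v @ p) for p in parents]
            i = int(np.argmax(scores))                    # pick the best adjacent pair
            total_score += scores[i]
            nodes[i:i + 2] = [parents[i]]                 # replace the two children by their parent
            trees[i:i + 2] = [("node", trees[i], trees[i + 1])]
        return nodes[0], trees[0], total_score            # sentence vector, tree, tree score

With d-dimensional word vectors, W has shape (d, 2d) and b, v have shape (d,); the vector returned at the root is the sentence's representation in the semantic vector space.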

2. Objective Function: the Max-Margin Framework

About the objective function in the slides: I later looked at the objective function in the recommended reading and found that the sign is flipped. I guess the instructor got it backwards when writing the slide.

The paper gives the objective function, where Δ(y_i, ŷ) is the number of incorrectly labeled nodes multiplied by a constant κ:
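Reconstructing the penalty from that description (κ is my symbol for the constant; the paper may write it differently):

    \Delta(y_i, \hat{y}) = \kappa \sum_{d \in N(\hat{y})} \mathbf{1}\{\, d \notin N(y_i) \,\}

where N(y) denotes the set of labeled nodes of tree y, so the penalty is κ times the number of nodes of the proposed tree that do not appear in the correct tree.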

The score consists of two parts:

The first part, involving v, is learned by our model; the second part is the log probability from the PCFG, i.e. the probability of the production taken into log space.
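Written out (again my reconstruction, following the CVG-style scoring the lecture describes; the exact indexing of v may differ from the slide), the score of a tree sums the learned term and the PCFG log-probability over its nodes:

    s(x, y) = \sum_{d \in N(y)} \left( v^{\top} p_d + \log P_{\text{PCFG}}(\text{rule applied at } d) \right)

where p_d is the composed vector at node d.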

The lecture does not cover max-margin in much detail; the second paper in the recommended reading explains it very well, so here is an excerpt:

Finally we obtain the max-margin formula. Our aim is to make C(W) as small as possible.
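The formula the excerpt refers to should be the structured-margin risk; I write it down here (notation mine, and the paper may add a regularizer and a scaling constant) so the argument below is easier to follow:

    r_i(W) = \max_{y \in Y(x_i)} \big[\, s(x_i, y) + \Delta(y_i, y) \,\big] - s(x_i, y_i),
    \qquad C(W) = \sum_i r_i(W)

Minimizing C(W) forces the correct tree y_i to out-score every other tree by at least the margin Δ.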

Why does this give the optimum? I thought about it for a while; here is the plain-language version for the record. If W is not the best W, then the tree selected by the max() on the left will not be y_i; with the loss Δ_i added on top, r_i must end up large, so the objective is not minimal. If W is optimal, then max() is sure to select y_i, Δ is zero, and the total must be minimal. Such a W must make score(y_i) larger than every other score(y) by a margin of at least Δ_i(y).

3. BPTS (Backpropagation Through Structure)

The papers say relatively little about BPTS, but the slides are quite detailed, and part of the Pset2 code handles it nicely. There are three differences between BPTS and the traditional BP we saw before:

First, the gradient of W is the sum of the contributions from all of the nodes. Second, I believe, gradients are also used to update the word vectors in the semantic vector space. Third, each node carries an extra error term: total error messages = error messages from the parent + error message from the node's own score. (A sketch follows below.)
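A minimal sketch of BPTS over one tree, using a made-up node representation of my own (not the Pset2 code): W's gradient is accumulated across all internal nodes, word-vector gradients fall out at the leaves, and each node's incoming delta is the sum of the error from its parent and the error from its own score.

    import numpy as np

    def backprop_tree(node, delta_parent, W, v, grads):
        """Backprop through structure over one binary tree.

        node: ("leaf", word_index, vector) or ("node", vector, left_child, right_child)
        delta_parent: error message arriving from the parent, shape (d,)
        """
        if node[0] == "leaf":
            _, idx, _vec = node
            grads["words"][idx] += delta_parent      # word vectors in the semantic space get updated too
            return
        _, p, left, right = node
        # total error = message from parent + message from this node's own score s = v . p
        delta = (delta_parent + v) * (1.0 - p ** 2)  # back through the tanh nonlinearity
        c1 = left[1] if left[0] == "node" else left[2]
        c2 = right[1] if right[0] == "node" else right[2]
        grads["W"] += np.outer(delta, np.concatenate([c1, c2]))  # W's gradient sums over all nodes
        grads["b"] += delta
        grads["v"] += p                              # gradient of this node's score term w.r.t. v
        down = W.T @ delta                           # split the message between the two children
        d = p.shape[0]
        backprop_tree(left, down[:d], W, v, grads)
        backprop_tree(right, down[d:], W, v, grads)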

To improve the parameter updates in BPTS, one can adjust the learning rate or use the subgradient method (the paper uses the subgradient method; CS229 also covers the comparable SMO method).
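Just to spell out the update (notation mine): because of the max inside C(W), the objective is not differentiable everywhere, so a subgradient step uses any member of the subdifferential in place of the gradient,

    W_{t+1} = W_t - \alpha_t \, g_t, \qquad g_t \in \partial C(W_t)

which in practice means: find the currently highest-scoring tree, hold it fixed, and take an ordinary gradient step.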

4. Improved Versions of the Recursive NN

The first half of this section is the simplest, plain recursive NN. At the end a modified version is introduced, the SU-RNN (syntactically untied RNN).

That is, different weight matrices are chosen based on the syntactic categories of the children.
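A tiny sketch of what "syntactically untied" means in code (the function name and category labels are illustrative): the composition matrix used for a merge is looked up by the syntactic categories of the two children rather than being shared globally.

    import numpy as np

    def su_compose(weights, biases, cat_left, c_left, cat_right, c_right):
        """SU-RNN style merge: the composition matrix depends on the children's categories."""
        W = weights[(cat_left, cat_right)]   # e.g. weights[("DT", "NP")], shape (d, 2*d)
        b = biases[(cat_left, cat_right)]
        return np.tanh(W @ np.concatenate([c_left, c_right]) + b)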

Finally there is a CV demonstration, the point being that running the RNN on NLP and on CV follows almost the same step-by-step decomposition.

Website:

nlp.stanford.edu

http://repository.cmu.edu/robotics

www.socher.org

Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.
