Reprints are welcome; when reprinting, please credit the source:
http://blog.csdn.net/neighborhoodguo/article/details/47193885
The content of the recent lectures has not been particularly difficult, and my comprehension has improved (if I may flatter myself), so these lectures have gone by quickly. Before I knew it, Lecture 9 was finished as well. This lecture covers the other kind of RNN, where the R stands for recursive rather than the earlier recurrent. The instructor uses recursive NNs for both NLP and CV tasks; personally I think they are a good fit for CV, while for NLP they feel a bit less convincing. In any case, this model solves many practical problems and performs well, so let me write it up here.
Let me first go over the outline of this lecture: first, how to turn a sentence into a vector; then how to do parsing; then how to build the objective function (the max-margin framework) and BPTS (Backpropagation Through Structure); and finally a few improved versions of the recursive NN, plus how this model can also be used for computer vision.
1. Semantic Vector Space for Sentences
Similar to the word vector space from earlier lectures, this time we project an entire sentence into a semantic vector space. The model is based on two assumptions about where the meaning of a sentence comes from: (1) the meanings of the words it contains; (2) the way the sentence is constructed. The second assumption is still under debate. The model discussed here can accomplish two tasks at once: it learns the tree structure of a sentence, and it learns the sentence's representation in the semantic vector space.
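To make the composition concrete, here is the standard simple recursive NN composition step as I understand it (a sketch; the tanh nonlinearity and bias term are the usual choices, not spelled out here):

p = \tanh\left( W [c_1; c_2] + b \right), \qquad c_1, c_2, p \in \mathbb{R}^{d}, \; W \in \mathbb{R}^{d \times 2d}

Two child vectors c_1 and c_2 (words or already-built phrases) are stacked and mapped back into the same d-dimensional space, so the parent vector p can itself be composed further up the tree.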
What is a parsing tree?
The figure above shows the parse tree described in this lecture. The recurrent neural network from the previous lectures actually corresponds to something like the parse tree below, which can be seen as a special case of the parse tree above.
Which of these two representations is correct has not been settled (it is still debated from a cognitive standpoint).
How do we learn this parse tree? Clever people invented a method along the lines of beam search. It works bottom-up: starting from the lowest level, compute which two adjacent nodes give the best score when combined, take the pair with the largest score and merge them into one node (quite greedy), and keep going until everything has been merged into a single parse tree at the top.
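Here is a minimal Python sketch of that greedy bottom-up merging loop (the compose and score_merge functions are hypothetical stand-ins for the model's composition and scoring; a real beam search would keep several candidate merges instead of only the single best one):

import numpy as np

def greedy_parse(leaf_vectors, compose, score_merge):
    """Greedily build a parse tree over a sequence of leaf vectors.

    compose(c1, c2)     -> parent vector for two adjacent nodes
    score_merge(c1, c2) -> scalar score for merging the two nodes
    """
    # each entry is (vector, tree); leaves start out as their own trees
    nodes = [(vec, i) for i, vec in enumerate(leaf_vectors)]
    while len(nodes) > 1:
        # score every pair of adjacent nodes
        scores = [score_merge(nodes[i][0], nodes[i + 1][0])
                  for i in range(len(nodes) - 1)]
        best = int(np.argmax(scores))                       # highest-scoring adjacent pair
        parent_vec = compose(nodes[best][0], nodes[best + 1][0])
        parent_tree = (nodes[best][1], nodes[best + 1][1])
        nodes[best:best + 2] = [(parent_vec, parent_tree)]  # replace children by parent
    return nodes[0]                                         # (root vector, nested-tuple tree)

# toy usage with random parameters
d = 4
W = 0.1 * np.random.randn(d, 2 * d)
v = 0.1 * np.random.randn(d)
compose = lambda a, b: np.tanh(W @ np.concatenate([a, b]))
score = lambda a, b: float(v @ compose(a, b))
root_vec, tree = greedy_parse([np.random.randn(d) for _ in range(5)], compose, score)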
2. Objective Function: the Max-Margin Framework
About the objective function on the slides: when I later looked at the objective function in the recommended reading, I found that the sign is reversed. My guess is that the instructor simply wrote it with the sign flipped.
The paper gives the objective function, in which Δ(y_i, ŷ) is κ multiplied by the number of incorrectly labeled nodes:
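In symbols (my reconstruction from that description; N(ŷ) is the set of nodes of the proposed tree ŷ):

\Delta(y_i, \hat{y}) = \kappa \sum_{d \in N(\hat{y})} \mathbf{1}\{\, d \text{ is not a correct node of } y_i \,\}

so every wrongly built or wrongly labeled node adds a fixed penalty κ to the margin that the correct tree has to beat.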
The score consists of two parts: the first part involves v, which is learned by our model; the second part is the log probability from the PCFG, i.e. the probability of the production rule moved into log space.
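Written out, the score of a candidate tree would look roughly like this (my reconstruction from the description above; p_n is the parent vector computed at node n by the composition shown earlier):

s(x, y) = \sum_{n \in \text{nodes}(y)} \left( v^{\top} p_n + \log P_{\mathrm{PCFG}}(\text{rule at } n) \right)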
The lecture does not cover the max-margin framework in much detail; the second recommended paper explains it very well. Here is an excerpt:
Finally we arrive at the max-margin formula. Our aim is to minimize C(W).
So why is this optimal? I thought about it for a while; here is a plain-language way to put it: if W is not the optimal W, then the max(...) on the left will pick some tree other than y_i, and with l_i added, r_i ends up large, so the objective is not minimized. If W is optimal, then the max(...) is sure to pick y_i, Δ is zero, and the total is minimal. Such a W must make score(y_i) larger than every other score(y) by a margin of at least l_i(y).
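Putting the pieces together, the objective described here can be written as (my reconstruction; l_i(y) is the structured loss Δ(y_i, y) from above and Y(x_i) is the set of candidate trees for sentence x_i):

r_i(W) = \max_{y \in Y(x_i)} \big[\, s(x_i, y) + l_i(y) \,\big] - s(x_i, y_i), \qquad C(W) = \sum_i r_i(W)

Minimizing C(W) forces s(x_i, y_i) to exceed s(x_i, y) by at least the margin l_i(y) for every competing tree y.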
3. BPTS
BPTS gets relatively little space in the paper; the slides are quite detailed, and part of the Pset2 code walks through it as well. There are three differences between BPTS and traditional BP:
First, the gradient of W is the sum of the contributions from all nodes. Second, as I understand it, the word vectors in the semantic vector space are themselves updated. Third, an extra error message is added: total error message = error message from the parent + error message from the node's own score.
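Below is a minimal numpy sketch of BPTS on a binary tree, just to illustrate the three points above. It backpropagates the total tree score \sum_n v^\top p_n (in the max-margin objective this would be run on the predicted tree and the correct tree with opposite signs); the dimensionality, the tanh composition, and the tree encoding are my own assumptions, not the Pset2 code:

import numpy as np

d = 4                                    # toy dimensionality (assumption)
W = 0.1 * np.random.randn(d, 2 * d)      # shared composition matrix
b = np.zeros(d)
v = 0.1 * np.random.randn(d)             # per-node score is v @ p

def forward(node, L):
    """node is a word id (leaf) or a pair (left, right); returns (vector, cached tree)."""
    if isinstance(node, int):
        return L[node], ('leaf', node)
    c1, t1 = forward(node[0], L)
    c2, t2 = forward(node[1], L)
    p = np.tanh(W @ np.concatenate([c1, c2]) + b)
    return p, ('internal', p, c1, c2, t1, t2)

def backward(tree, delta_parent, grads):
    """BPTS: accumulate gradients for W, b, v and the word vectors L."""
    if tree[0] == 'leaf':
        grads['dL'][tree[1]] += delta_parent          # (2) word vectors get gradients too
        return
    _, p, c1, c2, t1, t2 = tree
    delta = delta_parent + v                          # (3) parent error + own score error
    delta_z = delta * (1.0 - p ** 2)                  # back through tanh
    grads['dW'] += np.outer(delta_z, np.concatenate([c1, c2]))  # (1) summed over all nodes
    grads['db'] += delta_z
    grads['dv'] += p                                  # d(v @ p)/dv, also summed over nodes
    down = W.T @ delta_z                              # split the error for the two children
    backward(t1, down[:d], grads)
    backward(t2, down[d:], grads)

# toy usage: three word vectors parsed as ((0, 1), 2)
L = {i: 0.1 * np.random.randn(d) for i in range(3)}
grads = {'dW': np.zeros_like(W), 'db': np.zeros_like(b),
         'dv': np.zeros_like(v), 'dL': {i: np.zeros(d) for i in L}}
root_vec, cached = forward(((0, 1), 2), L)
backward(cached, np.zeros(d), grads)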
As improvements to the BPTS parameter update, one can adjust the learning rate or use a subgradient method (the paper uses a subgradient method; CS229 also covers SMO, which can be compared against it).
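Since the max inside r_i makes the objective non-differentiable at some points, the subgradient update looks just like gradient descent, only with any subgradient plugged in at the kinks (my paraphrase):

W \leftarrow W - \alpha_t \, g_t, \qquad g_t \in \partial C(W_t)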
4. Improved Versions of the Recursive NN
The first half of this section covers the simplest, plain recursive NN. At the end, an improved version is introduced: the SU-RNN (syntactically untied RNN),
that is, different weight matrices are chosen depending on the syntactic categories of the children.
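A tiny sketch of the untying idea (the category names and the per-pair indexing scheme are my assumptions for illustration; the actual model's parameterization may differ):

import numpy as np

d = 4
CATEGORIES = ['NP', 'VP', 'PP']                       # toy syntactic categories
# one composition matrix per (left category, right category) pair
W_syn = {(a, c): 0.1 * np.random.randn(d, 2 * d)
         for a in CATEGORIES for c in CATEGORIES}
b = np.zeros(d)

def compose_su(c1, cat1, c2, cat2):
    """SU-RNN composition: the weight matrix is untied by the children's categories."""
    W = W_syn[(cat1, cat2)]                           # pick W based on the two children
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

# e.g. an NP child and a VP child use W_syn[('NP', 'VP')]
p = compose_su(np.random.randn(d), 'NP', np.random.randn(d), 'VP')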
Finally there is a computer-vision demo, showing that running the recursive NN on NLP and on CV follows almost the same step-by-step decomposition.
Website:
nlp.stanford.edu
http://repository.cmu.edu/robotics
www.socher.org
Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.