CS224D Lecture 16 Notes
Reprinting is welcome; please credit the source:
http://www.cnblogs.com/NeighborhoodGuo/p/4728185.html
The last lecture is finally done, and the Stanford NLP course is nearly over. I'm really happy; this course has taught me a great deal.
This lecture covers the applications of DL to NLP. Most of the content was already mentioned in earlier lectures and in the recommended readings, so this lecture is essentially a review.
As usual, an overview first: 1. Model overview 2. Character RNNs on text and code 3. Morphology 4. Logic 5. Q&A 6. Image-sentence mapping
Model Overview
The instructor strongly recommended GloVe in class.
The dimensionality of the word vectors largely determines the number of parameters in a model.
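As a rough illustration of that point (a minimal sketch with made-up sizes, not figures from the lecture), the embedding matrix alone contributes |V| * d parameters, which usually dwarfs the composition weights:

    V = 100000   # hypothetical vocabulary size
    d = 100      # hypothetical word vector dimensionality
    embedding_params = V * d              # one d-dimensional vector per word
    composition_params = d * (2 * d) + d  # e.g. a d x 2d matrix plus a bias
    print(embedding_params, composition_params)  # 10000000 20100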
Phrase vectors are composed mainly by averaging, recursive neural networks, convolutional neural networks, or recurrent neural networks.
Many of the recursive composition functions are variants of the MV-RNN.
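For concreteness, here is a minimal sketch of one standard recursive composition step (the plain recursive form from earlier lectures; the full MV-RNN additionally gives every word its own matrix):

    import numpy as np

    d = 4                                      # toy dimensionality
    rng = np.random.default_rng(0)
    W = rng.standard_normal((d, 2 * d)) * 0.1  # shared composition matrix
    b = np.zeros(d)

    def compose(a, c):
        # One step: p = tanh(W [a; c] + b); the parent has the children's size.
        return np.tanh(W @ np.concatenate([a, c]) + b)

    left, right = rng.standard_normal(d), rng.standard_normal(d)
    parent = compose(left, right)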
Parse trees come in three main types: first, the constituency tree, which has an advantage in capturing syntactic structure; second, the dependency tree, which has an advantage in capturing semantic structure; and third, the balanced tree, which behaves much like a CNN.
There are three main types of objective function: the first is max-margin, the second is cross-entropy, and the third is the autoencoder objective (its use in NLP is unclear, so it was not covered in class).
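Hedged sketches of the first two objectives on toy scores (the exact formulations used in the lecture may differ):

    import numpy as np

    def max_margin(score_good, score_bad, margin=1.0):
        # Penalize the bad candidate unless it trails the good one by the margin.
        return max(0.0, margin - score_good + score_bad)

    def cross_entropy(logits, target):
        # Softmax over the logits, then negative log-likelihood of the target.
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return -np.log(p[target])

    print(max_margin(2.0, 1.5))                    # 0.5
    print(cross_entropy(np.array([2.0, 0.5]), 0))  # ~0.20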
Optimization falls into two main categories: the first is optimization algorithms (SGD, SGD with momentum, L-BFGS, AdaGrad, AdaDelta); the second is optimization tricks (regularization, dropout).
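Minimal update-rule sketches for two of the listed algorithms (the hyperparameter values here are illustrative only):

    import numpy as np

    def sgd_momentum(theta, grad, v, lr=0.01, mu=0.9):
        # Accumulate a velocity, then step along it.
        v = mu * v - lr * grad
        return theta + v, v

    def adagrad(theta, grad, cache, lr=0.01, eps=1e-8):
        # Per-parameter rates: frequently updated parameters slow down.
        cache = cache + grad ** 2
        return theta - lr * grad / (np.sqrt(cache) + eps), cache

    theta, v = np.zeros(3), np.zeros(3)
    theta, v = sgd_momentum(theta, np.array([0.5, -0.2, 0.1]), v)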
Morphology
Some English words are built on a common stem, from which many derived words are formed. Often the stem occurs very frequently while the derived words are comparatively rare.
As a result, the model learns an accurate representation for the stem but only a blurry one for the derived words.
Based on this problem, there is an improvement to the model:
a derived word is parsed into a tree rooted at its stem, and its vector is then composed from the stem and its prefixes and suffixes, which ties the derived word back to the stem.
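A minimal sketch of that idea (in the spirit of the morphological recursive network from the course readings; the morpheme vectors and sizes below are assumptions): the vector of a rare derived word is composed from its stem vector and affix vector with the same kind of recursive step.

    import numpy as np

    d = 4
    rng = np.random.default_rng(1)
    W_m = rng.standard_normal((d, 2 * d)) * 0.1  # morpheme composition matrix
    b_m = np.zeros(d)

    # Hypothetical morpheme vectors; in the real model these are learned.
    stem = rng.standard_normal(d)    # e.g. "fortunate"
    suffix = rng.standard_normal(d)  # e.g. "-ly"

    # A rare derived word inherits information from its frequent stem.
    derived = np.tanh(W_m @ np.concatenate([stem, suffix]) + b_m)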
Logic
The main goal is to recognize logical relations between pairs of sentences, such as entailment (the original post included a figure here).
The model used is still a recursive neural network, quite similar to the earlier ones.
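A hedged sketch of that setup (the layer sizes and the relation set are assumptions): encode each sentence into a vector, compare the pair with a small network, and classify the relation.

    import numpy as np

    d, n_rel = 4, 3  # toy sizes; e.g. entail / contradict / neutral
    rng = np.random.default_rng(2)
    W_cmp = rng.standard_normal((d, 2 * d)) * 0.1  # comparison layer
    W_out = rng.standard_normal((n_rel, d)) * 0.1  # relation classifier

    def relation_probs(s1, s2):
        # Compare the two sentence vectors, then softmax over relations.
        h = np.tanh(W_cmp @ np.concatenate([s1, s2]))
        logits = W_out @ h
        p = np.exp(logits - logits.max())
        return p / p.sum()

    probs = relation_probs(rng.standard_normal(d), rng.standard_normal(d))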
Q&A
This part of the lecture comes from one of the earlier recommended papers. The idea is that computers will eventually be able to converse with people, like a more complete version of Siri on Apple's phones.
Computers could even take part in quiz shows like Lucky 52 or Happy Dictionary and do better than the human contestants. That is going to be awesome.
Image-sentence Mapping
The latest research results here are from Professor Fei-Fei Li's group: given an image, the computer can describe its contents.
The simple method projects images and sentences into the same vector space; when an image comes in, Euclidean distance is used to find the closest few sentences as its description. Conversely, the same setup can be used for image retrieval.
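A minimal sketch of that retrieval step, assuming the image and the candidate sentences have already been embedded into the shared space:

    import numpy as np

    rng = np.random.default_rng(3)
    image_vec = rng.standard_normal(8)           # embedded query image
    sentence_vecs = rng.standard_normal((5, 8))  # embedded candidate captions

    # Rank candidate sentences by Euclidean distance to the image.
    dists = np.linalg.norm(sentence_vecs - image_vec, axis=1)
    best_first = np.argsort(dists)               # closest caption first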
However, the returned sentences are limited to a fixed pool; the computer cannot really "describe" anything new, so there is an improved version.
First a CNN projects the image into a vector, and then an LSTM generates the sentence. This is a bit like machine translation, with the source language replaced by an image.
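A hedged sketch of that encoder-decoder shape (toy sizes, random parameters, and a made-up end-of-sentence convention; the real models are trained jointly and are far larger): the image feature vector plays the role of the first input to an LSTM, which then emits one word id at a time.

    import numpy as np

    rng = np.random.default_rng(4)
    d, vocab = 8, 6                                # toy sizes; id 0 acts as <eos>
    W = rng.standard_normal((4 * d, 2 * d)) * 0.1  # gate weights (i, f, o, g)
    E = rng.standard_normal((vocab, d)) * 0.1      # word embeddings
    W_out = rng.standard_normal((vocab, d)) * 0.1  # hidden state -> word logits

    def lstm_step(x, h, c):
        z = W @ np.concatenate([x, h])
        i, f, o, g = np.split(z, 4)
        sig = lambda a: 1.0 / (1.0 + np.exp(-a))
        c = sig(f) * c + sig(i) * np.tanh(g)
        return sig(o) * np.tanh(c), c

    def caption(image_vec, max_len=10):
        # The image feature stands in for the MT "source sentence".
        h, c = lstm_step(image_vec, np.zeros(d), np.zeros(d))
        words, w = [], int(np.argmax(W_out @ h))
        while w != 0 and len(words) < max_len:     # stop at <eos>
            words.append(w)
            h, c = lstm_step(E[w], h, c)
            w = int(np.argmax(W_out @ h))
        return words

    print(caption(rng.standard_normal(d)))         # toy word ids, not real words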
Finally, the evaluation method for this model is called the mean rank.
For a given image, many candidate sentences may be scored, some of them incorrect.
Sort all the candidate sentences by relevance from highest to lowest, record the rank of the correct sentence, and average those ranks to get the mean rank.
Of course, the smaller the better!
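A minimal sketch of the metric with made-up ranks:

    import numpy as np

    # Hypothetical ranks of the correct caption for four test images,
    # after sorting candidates from most to least relevant (rank 1 = best).
    ranks = np.array([1, 3, 2, 10])
    print(ranks.mean())  # 4.0 -- lower mean rank is better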