CS224D Lecture 10 Notes


Reprinting is welcome; please credit the source:

Http://www.cnblogs.com/NeighborhoodGuo/p/4702932.html

Go Go Go

Lecture 10 has wrapped up successfully. True to its "advanced recursive NN" billing, the content is indeed somewhat advanced, but if you pay attention in class and then read the papers carefully afterwards, I believe it can still be fully understood.

Start summarizing ...

First of all, the instructor opened with a review of RNNs and of the three key ingredients when using recursive NNs for NLP:

1. Training objective: mainly two choices, cross-entropy or max-margin.
2. Composition function: the main topic of this lecture.
3. Tree structure: covered in detail in the previous lecture; either a chain structure or a balanced tree.
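As a quick refresher on the two objectives (my own notation, not from the slides): for a predicted distribution y and a one-hot target t, cross-entropy is

    J = -\sum_{k} t_k \log y_k

while a max-margin objective asks that the score of the correct structure beat every incorrect one by a margin \Delta:

    J = \sum_{i} \max\left(0,\; \Delta + \max_{\hat{y} \neq y_i} s(x_i, \hat{y}) - s(x_i, y_i)\right)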

This lecture covers four models: Standard RNNs, Matrix-Vector RNNs, RNTNs, and Tree LSTMs.

The application of the second model, Matrix-Vector RNNs, to relation classification appears only in the lecture, not in the papers I saw. So this summary focuses mainly on each model's application to paraphrase detection and sentiment analysis.

All right, let's go through them from the top.

1. Standard RNNs

Using RNNs for paraphrase detection involves two main components: the first is the recursive autoencoder, and the second is a neural network for variable-sized input.

This model is described in great detail in the second paper.

1. Recursive Autoencoder

First we need a parse tree; a reliable parse tree is important for paraphrase detection.

There are two variants of the recursive autoencoder.

The first is the one on the left side of the figure: the decoder reconstructs only one layer at a time, and the loss function is the sum of the reconstruction errors over all non-terminal nodes. The error at a non-terminal node is computed by first concatenating its two children vectors and then taking the Euclidean distance to their reconstruction.

In the two equations (reconstructed below; they appeared as images in the original post), the set T in the second one is the set of all non-terminal nodes, and c1 and c2 in the first one are the two children of p.

Since the reconstruction error could be driven down trivially by shrinking the norms of the hidden layers, the non-terminal node vector p must be normalized.
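A hedged reconstruction of those equations, following Socher et al.'s recursive autoencoder (the weight names W_e, b_e, W_d, b_d are my own labels):

    p = \tanh\left(W_e [c_1; c_2] + b_e\right), \qquad p \leftarrow p / \lVert p \rVert

    [c_1'; c_2'] = W_d\, p + b_d

    E_{rec}(p) = \left\lVert [c_1; c_2] - [c_1'; c_2'] \right\rVert^2, \qquad J = \sum_{p \in T} E_{rec}(p)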

The second is the one on the right side of the figure, the unfolding recursive autoencoder: each node reconstructs the entire subtree spanned underneath it.

That is, it decodes all the way down to the leaf nodes, concatenates all the reconstructed leaves, and takes the Euclidean distance to the original leaves as the loss function.
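A minimal numpy sketch of the unfolding reconstruction, assuming a binary tree of dicts and hypothetical learned weights We, be (encoder) and Wd, bd (decoder):

    import numpy as np

    d = 50  # hypothetical embedding size
    rng = np.random.default_rng(0)
    We, be = rng.normal(0, 0.01, (d, 2 * d)), np.zeros(d)      # encoder weights
    Wd, bd = rng.normal(0, 0.01, (2 * d, d)), np.zeros(2 * d)  # decoder weights

    def encode(node):
        # Bottom-up: a leaf is its word vector; an inner node composes its children.
        if node["is_leaf"]:
            return node["vec"]
        c = np.concatenate([encode(node["left"]), encode(node["right"])])
        p = np.tanh(We @ c + be)
        return p / np.linalg.norm(p)  # normalize so the error cannot shrink trivially

    def unfold(p, node):
        # Top-down: keep decoding until the whole subtree under `node` is unfolded.
        if node["is_leaf"]:
            return [p]
        c = np.tanh(Wd @ p + bd)
        return unfold(c[:d], node["left"]) + unfold(c[d:], node["right"])

    def leaves(node):
        if node["is_leaf"]:
            return [node["vec"]]
        return leaves(node["left"]) + leaves(node["right"])

    def unfolding_error(node):
        # Euclidean distance between the original and reconstructed leaf sequences.
        recon = np.concatenate(unfold(encode(node), node))
        return np.linalg.norm(np.concatenate(leaves(node)) - recon) ** 2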

2. Neural Network for Variable-Sized Input

Once the model above has been trained well, we move on to the next stage.

The trees used below are the ones produced by that trained model.

First, build a similarity matrix, where each entry is the Euclidean distance between a node vector from one sentence and one from the other. The rows and columns are sorted with the words in left-to-right order, followed by the upper hidden nodes from bottom to top, right to left.

The second step is dynamic pooling: the output of the pooling layer is a square matrix. First fix a size n_p for the number of rows and columns. The paper uses non-overlapping pooling, meaning the pooling windows do not overlap across rows or columns.

If #col > n_p and #row > n_p, every #col/n_p columns and every #row/n_p rows form one pool, which can leave some pools with fewer rows or columns than the others.

If #col < n_p or #row < n_p, first duplicate entries along the short side until the number of entries on that side reaches n_p.

Take the minimum value within each pool (a small Euclidean distance means a good match), then normalize the pooled entries to mean 0 and variance 1.
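A rough numpy sketch of this whole step, from distance matrix to pooled, normalized output; the helper names are mine, and np.array_split is my approximation of the paper's non-overlapping windows:

    import numpy as np

    def similarity_matrix(nodes1, nodes2):
        # Pairwise Euclidean distances between all node vectors of the two trees
        # (words first, then hidden nodes, in the ordering described above).
        return np.array([[np.linalg.norm(u - v) for v in nodes2] for u in nodes1])

    def dynamic_min_pool(S, n_p=15):
        # Pool an arbitrary #row x #col matrix S down to a fixed n_p x n_p matrix.
        rows, cols = S.shape
        if rows < n_p:  # too few rows: duplicate rows cyclically up to n_p
            S = S[np.resize(np.arange(rows), n_p), :]
        if cols < n_p:  # too few columns: same trick along the other axis
            S = S[:, np.resize(np.arange(cols), n_p)]
        pooled = np.empty((n_p, n_p))
        for i, band in enumerate(np.array_split(S, n_p, axis=0)):
            for j, pool in enumerate(np.array_split(band, n_p, axis=1)):
                pooled[i, j] = pool.min()  # min distance = best match in the pool
        return (pooled - pooled.mean()) / pooled.std()  # mean 0, variance 1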

The paper also mentions an extra feature set for numbers: first, a feature that is 1 if the two sentences contain exactly the same numbers (or no numbers at all), and 0 otherwise; second, a feature that is 1 if the two sentences share a number; third, a feature that is 1 if the numbers in one sentence are a strict subset of the numbers in the other.
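A sketch of those three binary features, with each sentence's numbers collected into a Python set (my reading of the second feature is "share at least one number"):

    def number_features(nums1: set, nums2: set) -> list:
        # 1 if both sentences contain exactly the same numbers, or none at all
        same_or_none = float(nums1 == nums2)
        # 1 if the two sentences have at least one number in common
        share_a_number = float(bool(nums1 & nums2))
        # 1 if one sentence's numbers are a strict subset of the other's
        strict_subset = float(nums1 < nums2 or nums2 < nums1)
        return [same_or_none, share_a_number, strict_subset]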

This approach has two drawbacks: first, it only compares the similarity of words or phrases, so the grammatical structure is lost; second, computing and pooling the similarity matrix discards some information.

Finally, the pooled similarity matrix is fed into a neural network or softmax classifier, a loss function is set up, and the whole thing is optimized.

2. Matrix-Vector RNNs

The Matrix-Vector RNN model is relatively simple: a word is represented not just by a vector, but by a matrix-vector pair.

The above is the matrix-vector composition; the difference from the standard RNN is fairly small (see the equations below).
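The composition appeared as an image in the original post; in the MV-RNN paper's notation, where the children carry vectors a, b and matrices A, B, it should be:

    p = g\left(W \begin{bmatrix} B a \\ A b \end{bmatrix}\right), \qquad P = W_M \begin{bmatrix} A \\ B \end{bmatrix}

Each word's matrix modifies the meaning of its neighbor's vector before the usual composition, and the parent gets both a vector p and a matrix P, so the process can recurse up the tree.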

In class it was said that this model works well for relation classification.

Relation classification is, simply put, a bit like the exercises we did in high school: extracting from a sentence the key words and the relation between them.

3. RNTN

Bag-of-words approaches to sentiment detection are not very reliable, because bag-of-words cannot capture a sentence's parse structure or linguistic features.

Using a good corpus also improves accuracy. Very tempting!

In fact, the overall change to the model is not that big, and it is easy to understand, yet it lets the model capture the sentiment of a sentence very well.
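The RNTN composition (a figure in the original post) is, as far as I can reconstruct it from the sentiment treebank paper: stacking the children vectors as x = [a; b],

    p = \tanh\left( x^{\top} V^{[1:d]} x + W x \right)

where V^{[1:d]} is a third-order tensor and each slice V^{[k]} is a matrix producing the k-th coordinate of the quadratic term x^{\top} V^{[k]} x. The tensor term is what lets the two children interact multiplicatively instead of only additively.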

The optimization of this model is slightly different from the previous ones, since the tensor V now gets its own gradient.

This model was said in class to be, at the time, the only model able to capture negation and its scope.

4. Tree LSTMs

The difference between Tree LSTMs and ordinary LSTMs is that a Tree LSTM runs the LSTM over the tree structure.

An ordinary LSTM can also be seen as a special case of a Tree LSTM.

The hidden-state computation at a leaf node of a Tree LSTM is the same as the ordinary LSTM computation; only the computation at parent nodes differs slightly. See the concrete formulas below.

The parent sums the hidden states of its children; each forget gate is computed from one specific child, and the final cell state is obtained by multiplying every forget gate with its corresponding child cell and summing, plus the gated input. Everything else is computed exactly as in an ordinary LSTM.
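Those formulas (an image in the original post) should be the Child-Sum Tree-LSTM of Tai, Socher, and Manning (2015): for a node j with children C(j),

    \tilde{h}_j = \sum_{k \in C(j)} h_k
    i_j = \sigma(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)})
    f_{jk} = \sigma(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}) \quad \text{for each child } k
    o_j = \sigma(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)})
    u_j = \tanh(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)})
    c_j = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k
    h_j = o_j \odot \tanh(c_j)

Note that the input, output, and update gates see only the summed children state \tilde{h}_j, while each forget gate f_{jk} sees its own child's h_k, which matches the description above.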

At present this model is considered the best fit for semantic similarity tasks.
