"Attention are All" Need (Introduction + code)


In 2017, two papers appeared that I greatly appreciated: Facebook's "Convolutional Sequence to Sequence Learning" and Google's "Attention Is All You Need". Both are innovations on seq2seq, and essentially both abandon the RNN structure for seq2seq tasks.

In this blog post, I will give a simple analysis of "Attention Is All You Need". Of course, both papers are quite popular, so there are already many interpretations of them online (though many of those interpretations are direct translations of the paper, with little of the writers' own understanding). So here I try to use my own words as much as possible, and avoid repeating what others have already said.

I. Sequence Encoding

Deep learning for NLP basically starts by segmenting the sentence into words and then mapping each word to its word vector, so that each sentence corresponds to a matrix $X = (x_1, x_2, \dots, x_n)$, where $x_i$ is the word vector (a row vector) of the $i$-th word, with dimension $d$, so $X \in \mathbb{R}^{n \times d}$. The problem then becomes one of encoding these sequences.
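As a tiny illustration of this setup (the vocabulary and embedding table below are made-up placeholders, not from the paper), a segmented sentence becomes a matrix with one $d$-dimensional row per word:

```python
import numpy as np

d = 4  # word-vector dimension (toy value)
vocab = {"attention": 0, "is": 1, "all": 2, "you": 3, "need": 4}
# random embedding table standing in for trained word vectors
embedding = np.random.default_rng(0).standard_normal((len(vocab), d))

sentence = ["attention", "is", "all", "you", "need"]
X = np.stack([embedding[vocab[w]] for w in sentence])  # X in R^{n x d}, n = 5
print(X.shape)  # (5, 4)
```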

The first basic idea is the RNN layer. The RNN scheme is very natural, a sequential recursion:

$$y_t = f(y_{t-1}, x_t)$$
LSTM, GRU, and the more recent SRU, however widely used, have not departed from this recursive framework. The RNN structure itself is relatively simple and well suited to sequence modeling, but one obvious drawback of RNN is that it cannot be parallelized and is therefore slow, which is a natural defect of recursion. In addition, I personally feel that RNN cannot easily learn global structural information, because it is essentially a Markov decision process.
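A minimal sketch of the recursion $y_t = f(y_{t-1}, x_t)$ makes the parallelism problem concrete; the tanh cell and weight names (`W_x`, `W_y`) here are illustrative, not the paper's:

```python
import numpy as np

def rnn_encode(X, d_out, seed=0):
    """X: (n, d) matrix of word vectors; returns (n, d_out) outputs."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W_x = rng.standard_normal((d, d_out)) * 0.1
    W_y = rng.standard_normal((d_out, d_out)) * 0.1
    y = np.zeros(d_out)
    outputs = []
    for t in range(n):  # strictly sequential: step t depends on step t-1,
        y = np.tanh(X[t] @ W_x + y @ W_y)  # so the loop cannot be parallelized
        outputs.append(y)
    return np.stack(outputs)
```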

The second idea is the CNN layer. In fact, CNN's scheme is also very natural: a window-style traversal. For example, a convolution of size 3 is

$$y_t = f(x_{t-1}, x_t, x_{t+1})$$
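A corresponding sketch of the size-3 convolution, again with illustrative weights and zero padding at the boundaries; unlike the RNN loop above, every output position can be computed independently:

```python
import numpy as np

def conv_encode(X, d_out, seed=0):
    """X: (n, d) matrix of word vectors; returns (n, d_out) outputs."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((3, d, d_out)) * 0.1  # one weight slice per window position
    Xp = np.concatenate([np.zeros((1, d)), X, np.zeros((1, d))])  # zero-pad both ends
    # y_t = f(x_{t-1}, x_t, x_{t+1}); each t is independent, hence parallelizable
    return np.tanh(np.stack([
        Xp[t] @ W[0] + Xp[t + 1] @ W[1] + Xp[t + 2] @ W[2]
        for t in range(n)
    ]))
```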
