How to use vectors to represent a document or sentence

Source: Internet
Author: User

1. Introduction to "sentence vectors"
Word2vec provides high-quality word vectors and performs well in some tasks.
For details on how word2vec works, see the following papers:

https://arxiv.org/pdf/1310.4546.pdf
https://arxiv.org/pdf/1301.3781.pdf

For how to train word2vec with the third-party library gensim, see this blog post:

http://blog.csdn.net/john_xyz/article/details/54706807

Although word2vec provides high-quality word vectors, there is still no effective way to combine them into a high-quality document vector. How can a sentence, paragraph, or document be projected into a vector space in a way that preserves its semantics? The following methods have commonly been used:

Bag of words
LDA
Average word vectors
TF-IDF weighted word vectors
Bag of words has two disadvantages: (1) word order is not taken into account; (2) the semantic information of words is ignored. As a result, it works poorly on short texts and only moderately well on long texts, and it is usually used as a baseline in research.
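
To make the word-order limitation concrete, here is a minimal sketch in plain Python. The two sentences and the helper `bag_of_words` are made up for illustration and are not from the original post; the point is only that both sentences map to the same count vector.

```python
from collections import Counter

def bag_of_words(sentence, vocabulary):
    """Return a count vector for `sentence` over a fixed, ordered vocabulary."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]

# Toy sentences (hypothetical): same words, different order, different meaning.
sentence_a = "the dog bit the man"
sentence_b = "the man bit the dog"

# Shared vocabulary, sorted so that vector positions are stable.
vocabulary = sorted(set(sentence_a.split()) | set(sentence_b.split()))

vec_a = bag_of_words(sentence_a, vocabulary)
vec_b = bag_of_words(sentence_b, vocabulary)

print(vocabulary)
print(vec_a)
print(vec_b)
print(vec_a == vec_b)  # the two sentences are indistinguishable
```

Because the vectors are identical, any model built on top of them cannot tell these two sentences apart.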

Average word vectors simply averages all the word vectors in a sentence. It is a simple and effective method, but its disadvantage is that word order is not taken into account.
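
As a rough illustration of averaging, the sketch below uses a tiny hand-written table of 2-dimensional vectors (`toy_vectors`, hypothetical values) in place of a trained word2vec model. The helper name `average_word_vectors` is likewise an assumption, not part of any library.

```python
def average_word_vectors(tokens, word_vectors):
    """Average the vectors of all tokens that have an entry in `word_vectors`.

    Returns None if no token is known.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Hypothetical 2-dimensional word vectors; a real model would supply these.
toy_vectors = {
    "dog": [1.0, 0.0],
    "bit": [0.0, 1.0],
    "man": [1.0, 1.0],
}

vec_1 = average_word_vectors("dog bit man".split(), toy_vectors)
vec_2 = average_word_vectors("man bit dog".split(), toy_vectors)

print(vec_1)
print(vec_1 == vec_2)  # reordering the words does not change the average
```

Words missing from the table are skipped here; that is one possible design choice, and others (such as a shared "unknown" vector) are equally plausible.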

TF-IDF weighted word vectors is a common method for computing sentence embeddings: each word vector in the sentence is weighted by the word's TF-IDF score, and the weighted vectors are summed. Compared with simply taking the mean of all word vectors, TF-IDF weighting gives the more important words in a sentence a larger share. Its disadvantage, again, is that word order is not taken into account.
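
The weighting idea can be sketched as follows. This is a minimal example under stated assumptions: the corpus and vectors are toy data, IDF is taken as log(N / df) without smoothing, TF is the relative frequency of the word in the sentence, and the weighted sum is divided by the total weight so that the result is a weighted average. Real implementations often differ in these details.

```python
import math
from collections import Counter

def idf_weights(corpus):
    """Inverse document frequency per word, computed as log(N / df)."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

def tfidf_weighted_vector(tokens, word_vectors, idf):
    """Weighted average of word vectors, with tf * idf as each word's weight."""
    tf = Counter(t for t in tokens if t in word_vectors and t in idf)
    weights = {t: (count / len(tokens)) * idf[t] for t, count in tf.items()}
    total = sum(weights.values())
    if total == 0:
        return None  # no known word carries any weight
    dim = len(next(iter(word_vectors.values())))
    return [
        sum(w * word_vectors[t][i] for t, w in weights.items()) / total
        for i in range(dim)
    ]

# Toy corpus (hypothetical): "the" occurs in every document, so its IDF is 0.
corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
]
# Hypothetical 2-dimensional word vectors.
toy_vectors = {
    "the": [1.0, 1.0],
    "dog": [1.0, 0.0],
    "barks": [0.0, 1.0],
}

idf = idf_weights(corpus)
sentence_vec = tfidf_weighted_vector(["the", "dog", "barks"], toy_vectors, idf)
print(idf["the"], idf["dog"], idf["barks"])
print(sentence_vec)
```

In this toy setup the rarest word, "barks", dominates the sentence vector, while "the" contributes nothing. A plain average would have treated all three words equally.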

The LDA model computes the topic distribution of a document or sentence and is often used for text classification tasks. I will write a later article on the essential differences between the LDA model and doc2vec.
---------------------
Johnson0722
Source: csdn
Original: 79208564
Copyright Disclaimer: This article is an original article by the blogger. For more information, see the blog post link!
