A brief analysis of Word2vec

Source: Internet
Author: User

Word2vec is an open-source toolkit released by Google in 2013 that represents words as vectors. Its principles are explained in the following article:

A detailed explanation of the mathematical principles in Word2vec (1): contents and preface


In simple terms:

There are several ways to perform sentiment analysis on an article or passage:

1. Simply classify words as positive or negative, for example scoring "good" as +1 and "bad" as -1

2. Use a bag-of-words model, which treats each word as independent; its disadvantage is that it ignores the links between a word and its context

3. Use Word2vec, which takes context into account
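The limitation of the first two approaches can be illustrated with a minimal sketch. The tiny lexicon and the two example sentences are made up for illustration and do not come from the source; the point is only that two sentences with opposite meanings can get the same score and the same bag of words.

```python
from collections import Counter

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"good": 1, "bad": -1}


def lexicon_score(text):
    """Approach 1: sum the polarity score of every word in the text."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())


def bag_of_words(text):
    """Approach 2: count each word independently, discarding word order."""
    return Counter(text.lower().split())


a = "the food was good not bad"
b = "the food was bad not good"

# Both approaches see the two sentences as identical, although a reader
# would not: neither one uses the context around "good" and "bad".
print(lexicon_score(a), lexicon_score(b))
print(bag_of_words(a) == bag_of_words(b))
```

This is the gap that a context-aware representation such as Word2vec is meant to address.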

This approach compresses the data into a compact representation while capturing contextual information. Word2vec actually comprises two different methods: Continuous Bag of Words (CBOW) and Skip-gram. CBOW predicts the probability of the current word from its context. Skip-gram does the opposite: it predicts the probability of the context words from the current word. Both methods use an artificial neural network as the classification algorithm. Initially, each word is assigned a random N-dimensional vector; after training with CBOW or Skip-gram, the algorithm arrives at an optimized vector for each word.
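The difference between the two methods is easiest to see in the training examples each one builds from the same sentence. The sketch below only generates those examples; it does not train a network. The function names and the `window` parameter are illustrative choices, not part of any Word2vec API.

```python
def context_window(tokens, i, window):
    """Return the words within `window` positions of tokens[i], excluding it."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens, window=2):
    """CBOW: the input is the context, the target is the current word."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens, window=2):
    """Skip-gram: the input is the current word, the target is one context word."""
    return [(tokens[i], ctx)
            for i in range(len(tokens))
            for ctx in context_window(tokens, i, window)]


tokens = "the cat sat on the mat".split()

# CBOW example for "cat" with a window of 1: (['the', 'sat'], 'cat')
print(cbow_examples(tokens, window=1)[1])

# Skip-gram examples for "cat" with a window of 1: ('cat', 'the'), ('cat', 'sat')
print(skipgram_examples(tokens, window=1)[1:3])
```

CBOW produces one example per word, while Skip-gram produces one example per (word, context word) pair, so Skip-gram sees more training examples from the same text.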

Reference

Source document: <http://www.open-open.com/lib/view/open1444351655682.html>

The source document includes a sentiment analysis of tweets containing emoji: 40,000 tweets are divided into two classes, positive and negative, converted into 300-dimensional vectors with Word2vec, and used to train a logistic regression model with an 80/20 split between training and test data.
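A scaled-down sketch of that workflow is shown below, using only the standard library. Several things here are assumptions rather than details from the source: each tweet is represented by the average of its word vectors, the vectors are hand-written 2-dimensional toys instead of trained 300-dimensional ones, and the ten "tweets" are invented.

```python
import math
import random

# Hypothetical 2-D word vectors standing in for trained 300-D Word2vec vectors.
WORD_VECTORS = {
    "happy": [1.0, 0.2], "great": [0.9, 0.1],
    "sad": [-1.0, 0.1], "awful": [-0.8, 0.3],
}
DIM = 2


def average_vector(tokens, word_vectors, dim):
    """Represent a text as the mean of the vectors of its known words."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def split_80_20(rows, seed=0):
    """Shuffle and split the rows into 80% training and 20% test data."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.8)
    return rows[:cut], rows[cut:]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, dim, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 (positive) or 0 (negative)."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Invented tweets: label 1 is positive, label 0 is negative.
tweets = [
    ("happy day", 1), ("great happy", 1), ("great game", 1),
    ("so happy", 1), ("great great", 1),
    ("sad day", 0), ("awful sad", 0), ("awful game", 0),
    ("so sad", 0), ("awful awful", 0),
]
rows = [(average_vector(text.split(), WORD_VECTORS, DIM), label)
        for text, label in tweets]

train, test = split_80_20(rows)
w, b = train_logistic(train, DIM)
correct = sum(predict(w, b, x) == y for x, y in test)
print(f"test accuracy: {correct}/{len(test)}")
```

In practice the word vectors would come from a trained Word2vec model and the classifier from an established library; the hand-rolled versions here only keep the sketch self-contained.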

The general procedure for using Word2vec is as follows. The first step is to collect a large amount of text, such as Baidu Encyclopedia, Wikipedia, or news articles, and assemble it into TXT documents;

The second step is to segment the text into words with a word segmentation tool;

The third step is to train Word2vec on the segmented text; this is unsupervised training of the word vectors.

Therefore, the larger and more authoritative the text corpus, the more reasonable and interpretable the resulting word vectors will be.
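The first two steps can be sketched as follows, again with only the standard library. The `segment` function is a stand-in that splits on non-word characters, which is adequate for English only; Chinese text needs a real segmentation tool such as ANSJ. The output layout, one sentence per line with tokens separated by spaces, is an assumption about what the chosen Word2vec implementation expects, although it is a commonly accepted one.

```python
import re
import tempfile
from pathlib import Path


def segment(sentence):
    """Stand-in segmenter: lowercase, then split on non-word characters."""
    return [tok for tok in re.split(r"\W+", sentence.lower()) if tok]


def prepare_corpus(raw_text, out_path):
    """Split raw text into sentences, segment each, and write one per line."""
    sentences = re.split(r"[.!?。！？]\s*", raw_text)
    lines = [" ".join(segment(s)) for s in sentences]
    lines = [line for line in lines if line]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def read_corpus(path):
    """Load the prepared file as a list of token lists, ready for training."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.split() for line in text.splitlines() if line]


raw = "Word2vec learns word vectors. Larger corpora help!"
with tempfile.TemporaryDirectory() as tmp:
    corpus_path = Path(tmp) / "corpus.txt"
    prepare_corpus(raw, corpus_path)
    sentences = read_corpus(corpus_path)

print(sentences)
```

The resulting list of token lists (or the file itself) is what the third step would hand to a Word2vec trainer.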

Example:

1. Training on news data with the word segmentation tool ANSJ and Word2vec

http://www.ppvke.com/Blog/archives/44422

As a shortcut, use the Chinese Wikipedia text:

A good set of trained Chinese word vectors: http://www.cnblogs.com/Darwin2000/p/5786984.html


Another:

http://download.csdn.net/download/eastmount/9434889

