Mathematical Principles in word2vec

Word2vec is an open-source toolkit for obtaining word vectors, released by Google in 2013. It is simple and efficient, and it has attracted a lot of attention. Tomas Mikolov, the author of word2vec, gave few algorithmic details in the two related papers [3, 4], which has added a certain air of mystery to the toolkit. Some people could not resist and chose to dissect the source code to see how it actually works.

My first contact with word2vec was in February. At that time I was studying a paper by Zheng Xiaoqing of Fudan University [7], whose main contribution was to carry the SENNA algorithm [8] over to Chinese. I found it quite interesting, so I implemented it myself (see [20]). However, training the word vectors took too long, so I used word2vec to provide them instead. Unexpectedly, the resulting Chinese word segmentation was quite good. I immediately saw word2vec in a new light, and my curiosity grew.

Later, I came across some concrete applications of word2vec, and Tomas Mikolov's team also extended the approach to sentences and documents [6]. It therefore seemed necessary to understand the algorithmic principles behind word2vec in order to follow their subsequent research. So I read the code carefully and came to understand the basic approach. My first impression was: "This is a very simple, shallow structure. Why do so many people call it deep learning?"

While dissecting the word2vec source code, I gained insight not only into the algorithms but also into a good number of programming techniques. Since reading the code took a lot of time, I decided to write down what I understood, as a reference for anyone who needs it.

While writing this article, I had many useful discussions with @Northbrook prodigal son, a member of the Deep Learning group, and I would like to thank them. I have also drawn on documents written by others, which are listed in the references; I am grateful for their work.

Peghoty

Source: http://blog.csdn.net/itplus/article/details/37969519

You are welcome to repost or share this article, but please cite the source.
