Word2vec is a toolkit for obtaining word vectors that Google open-sourced in 2013. It is simple and efficient, so it has attracted a great deal of attention. Tomas Mikolov, the author of word2vec, did not go into many algorithmic details in the two related papers [3, 4], which to some extent added to the toolkit's air of mystery. Some people, unable to contain their curiosity, chose to dissect the source code to find out how it works.
I first came across word2vec in February. At that time, I was studying a paper published by Zheng Xiaoqing of Fudan University [7], whose main work was to port the SENNA algorithm [8] to Chinese scenarios. I found it quite interesting, so I wrote an implementation (see [20]). However, because training the word vectors took too long, I chose to use word2vec to provide them instead. I did not expect the Chinese word segmentation results to be as good as they were. That immediately made me take word2vec seriously, and my curiosity grew.
Later, I saw some concrete applications of word2vec, and Tomas Mikolov's team also extended the approach to sentences and documents [6]. To keep up with their follow-up research, it is therefore necessary to understand the algorithmic principles behind word2vec. So I read the code carefully and came to understand, for the most part, how it works. My first impression was: "This is a very simple, shallow structure. Why do so many people call it deep learning?"
While dissecting the word2vec source code, I gained a lot not only at the algorithm level but also in programming technique. Since I spent a lot of time reading the code, I decided to write down what I understood as a reference for anyone who needs it.
While preparing this article, I had many useful discussions with @Northbrook prodigal son ([]), a friend from the deep learning group, for which I would like to express my gratitude. I have also drawn on documents written by others, which are listed in the references, and I thank the authors for their work.
Peghoty
Source: http://blog.csdn.net/itplus/article/details/37969519
You are welcome to repost or share this article, but please be sure to credit the source.