I recently tried Word2vec and GloVe, along with their Python counterparts, Gensim Word2vec and python-glove, and wanted to test them on a larger corpus, so the Wikipedia corpus was a natural choice. Wikimedia officially provides a very good data source at https://dumps.wikimedia.org, where Wikipedia data can be downloaded in many languages and formats. I had previously used Gensim with the English Wikipedia corpus to train LSI and LDA models for computing the similarity of two documents, so I wanted to see whether Gensim also offered an easy way to process Wikipedia data, train Word2vec models, and compute the semantic similarity between words. Thanks to Google, I found a very long discussion thread in the Gensim Google Group: "training Word2vec on the full Wikipedia". The thread essentially explains how to train a Word2vec model on the Wikipedia corpus with Gensim. Gensim's author, Dr. Radim Řehůřek, even took part in the discussion and added some corrections for the newer Gensim version, so my job was mainly to verify it. Although there is a Wiki2vec project on GitHub, I prefer to solve the problem with Python and Gensim.
As for Word2vec itself, there is plenty of reference material in both Chinese and English. In English, you can read the officially recommended papers, as well as some articles written by Gensim's author, Dr. Radim Řehůřek. In Chinese, I recommend @licstar's "Deep learning in NLP (a): word vectors and language models", the Youdao technology salon's "Deep learning in practice: Word2vec", @ Fei Lin sha's "Word2vec learning ideas", falao_ Beiliu's "Deep learning Word2vec notes: the foundations" and "Deep learning Word2vec notes: the algorithms", and so on.
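The workflow described above (convert a Wikipedia dump to plain text, train Word2vec on it, then query word similarity) can be sketched roughly as follows. This is a minimal sketch, not the exact script from the discussion thread: it assumes Gensim 4.x (where the dimensionality parameter is `vector_size`; older releases call it `size`), and the helper names `wiki_to_text`, `LineSentences`, and `train`, as well as the file names in the usage comment, are hypothetical.

```python
def wiki_to_text(dump_path, out_path):
    """Write one article per line (space-separated tokens) from a Wikipedia XML dump."""
    # Imported lazily so the rest of this sketch can be used without gensim installed.
    from gensim.corpora import WikiCorpus

    # Passing dictionary={} skips building a vocabulary, which is not needed here.
    wiki = WikiCorpus(dump_path, dictionary={})
    with open(out_path, "w", encoding="utf-8") as out:
        for tokens in wiki.get_texts():
            out.write(" ".join(tokens) + "\n")


class LineSentences:
    """Re-iterable stream of token lists, one per non-empty line of a text file.

    Word2vec makes several passes over the corpus, so a plain generator
    would be exhausted after the first pass; this class reopens the file
    on every iteration instead.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                tokens = line.split()
                if tokens:
                    yield tokens


def train(text_path, model_path, dim=100):
    """Train a Word2vec model on the one-article-per-line text file and save it."""
    from gensim.models import Word2Vec  # assumes Gensim 4.x

    model = Word2Vec(
        sentences=LineSentences(text_path),
        vector_size=dim,  # named `size` in Gensim versions before 4.0
        window=5,
        min_count=5,
        workers=4,
    )
    model.save(model_path)
    return model


# Hypothetical usage (file names are placeholders):
#   wiki_to_text("enwiki-latest-pages-articles.xml.bz2", "wiki.en.text")
#   model = train("wiki.en.text", "wiki.en.word2vec.model")
#   print(model.wv.similarity("king", "queen"))
#   print(model.wv.most_similar("queen"))
```

Gensim also ships its own `gensim.models.word2vec.LineSentence` class for the same one-sentence-per-line format; the hand-written `LineSentences` above only makes the streaming idea explicit.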
Published in natural language processing, semantic similarity, language models | Tagged Gensim, gensim word2vec, glove, Mecab, Python gensim, Python glove, Python word2vec, Word2vec, Word2vec experiment, Word2vec application, Word2vec model, Word2vec similarity, Word2vec similarity calculation, Word2vec word similarity, Chinese word segmentation, Chinese Simplified conversion, Wikipedia corpus, Chinese code conversion, document similarity, deep learning, similarity, wikipedia, English Wikipedia corpus, similarity of words, semantic similarity, language model

HMM related articles index
Posted on March 7, 2015 by 52nlp
The HMM series is among the more frequently visited groups of articles on 52NLP, so here is an index for easy reference.
HMM learning
- Best example of HMM learning one: introduction
- Best example of HMM learning two: generating patterns
- Best example of HMM learning three: hidden patterns
- Best example of HMM learning four: hidden Markov models
- Best example of HMM learning five: forward algorithm
- Best example of HMM learning five: forward algorithm 1
- Best example of HMM learning five: forward algorithm 2
- Best example of HMM learning five: forward algorithm 3
- Best example of HMM learning five: forward algorithm 4
- Best example of HMM learning five: forward algorithm 5
- Best example of HMM learning six: Viterbi algorithm
- Best example of HMM learning six: Viterbi algorithm 1
- Best example of HMM learning six: Viterbi algorithm 2
- Best example of HMM learning six: Viterbi algorithm 3
- Best example of HMM learning six: Viterbi algorithm 4
- Best example of HMM learning six: Viterbi algorithm 5
- Best example of HMM learning seven: forward-backward algorithm
- Best example of HMM learning seven: forward-backward algorithm 1
- Best example of HMM learning seven: forward-backward algorithm 2
- Best example of HMM learning seven: forward-backward algorithm 3
- Best example of HMM learning seven: forward-backward algorithm 4
- Best example of HMM learning seven: forward-backward algorithm 5
- Best example of HMM learning eight: summary
- Best example of HMM learning: full-text PDF document (Baidu network disk, password F7az)
HMM related
- A fairly good HMM example on a wiki
- Several HMM implementations in different programming languages
HMM applications
- HMM POS tagging
- Application of HMM in natural language processing one: POS tagging 1
- Application of HMM in natural language processing one: POS tagging 2
- Application of HMM in natural language processing one: POS tagging 3
- Application of HMM in natural language processing one: POS tagging 4
- Application of HMM in natural language processing one: POS tagging 5
- Application of HMM in natural language processing one: POS tagging 6
- HMM Chinese word segmentation
- ITENYH version, using HMM for Chinese word segmentation one: preface
- ITENYH version, using HMM for Chinese word segmentation two: model preparation
- ITENYH version, using HMM for Chinese word segmentation three: the overhead of the forward algorithm and the Viterbi algorithm
- ITENYH version, using HMM for Chinese word segmentation four: a pure-HMM word segmenter
- ITENYH version, using HMM for Chinese word segmentation five: a hybrid word segmenter
I Love Natural Language Processing [repost]