Background
How do we represent the meaning a word carries?
"Apple" can mean the fruit or the company behind the iPhone.
"Apple" and "pear" are two semantically related words.
Two ways to represent words
Symbolic representation: bag-of-words; high-dimensional, very sparse, carries no semantics, but the model is simple
Distributed representation: word embeddings; low-dimensional, dense, captures semantics, but training is more complex (contrast sketched below)
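To make the contrast concrete, here is a minimal sketch with made-up toy vectors (none of these numbers come from a trained model):

import numpy as np

# Symbolic representation: one-hot / bag-of-words vectors,
# one dimension per vocabulary word; with a real vocabulary
# (10^5+ words) these are huge and almost entirely zeros.
one_hot_apple = np.array([1, 0, 0])
one_hot_pear = np.array([0, 1, 0])

# Distributed representation: dense low-dimensional vectors
# learned from data (values here are invented for illustration).
emb_apple = np.array([0.62, -0.13, 0.58])
emb_pear = np.array([0.59, -0.08, 0.60])

# One-hot vectors of distinct words are always orthogonal,
# so they carry no notion of relatedness.
print(one_hot_apple @ one_hot_pear)  # 0

# Dense vectors of related words can be close in cosine similarity.
cos = emb_apple @ emb_pear / (np.linalg.norm(emb_apple) * np.linalg.norm(emb_pear))
print(round(float(cos), 2))  # near 1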
Word Embedding
Core idea: semantically related words appear in similar contexts, e.g., apple and pear (sketched below)
Goal: learn a vector representation for each word
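A minimal sketch of this idea, using toy English sentences of my own (not from the original), counting which words surround "apple" and "pear":

from collections import Counter

# Toy corpus: 'apple' and 'pear' occur in near-identical contexts.
corpus = [
    'i ate an apple today',
    'i ate a pear today',
    'she bought a fresh apple',
    'she bought a fresh pear',
]

def context_counts(word, window=2):
    # Count the words appearing within `window` positions of `word`.
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for t in tokens[lo:hi] if t != word)
    return counts

# The two context distributions largely overlap; this overlap is the
# signal word2vec uses to place 'apple' and 'pear' near each other.
print(context_counts('apple'))
print(context_counts('pear'))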
Practice
Based on Gensim package and Chinese wiki corpus
Gensim: http://radimrehurek.com/gensim/models/word2vec.html
Chinese Wikipedia corpus: https://pan.baidu.com/s/1qXKIPp6 (password: Kade)
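LineSentence (used below) expects one sentence per line with tokens separated by whitespace, so the raw Chinese wiki text must be word-segmented first. The original does not show this step; here is a sketch assuming the jieba segmenter and a hypothetical raw-text file wiki.zh.text:

import codecs
import jieba

# Hypothetical input file: plain Chinese wiki text, one sentence per line.
# Output file: the whitespace-separated token format LineSentence expects.
with codecs.open('wiki.zh.text', 'r', encoding='utf-8') as fin, \
     codecs.open('wiki.zh.word.text', 'w', encoding='utf-8') as fout:
    for line in fin:
        tokens = jieba.cut(line.strip())  # segment the line into words
        fout.write(' '.join(tokens) + '\n')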
# Load packages
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Train the model: 128-dimensional vectors, context window of 5,
# ignore words seen fewer than 5 times, 4 worker threads
sentences = LineSentence('wiki.zh.word.text')
model = Word2Vec(sentences, size=128, window=5, min_count=5, workers=4)

# Save the model
model.save('word_embedding_128')

# Load the model
model = Word2Vec.load('word_embedding_128')

# Use the model (query words must be tokens that occur in the corpus)
items = model.most_similar(u'China')
model.similarity(u'man', u'woman')
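For reference, most_similar returns a list of (word, cosine similarity) pairs sorted by score, and the trained vectors themselves can be read by indexing the model (pre-4.0 Gensim API, matching the calls above):

# Each item is a (word, cosine similarity) pair, best match first.
for word, score in items:
    print(word, score)

# similarity() returns a single cosine similarity in [-1, 1].
print(model.similarity(u'man', u'woman'))

# The raw 128-dimensional vector of a word is also accessible.
vec = model[u'China']  # numpy array of shape (128,)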