The code for this simple demo is at tensorflow\tensorflow\g3doc\tutorials\word2vec\word2vec_basic.py
The model follows the skip-gram approach; see
http://tensorflow.org/tutorials/word2vec/index.md
You can also refer to the CS224d course slides.
The window is set to 1 word on each side of the center word.
The skip-gram model uses one word to predict its surrounding words (the CBOW model does the reverse: it takes a series of context words as input and predicts the center word).
For example, in "the quick brown", the input quick yields the pairs quick -> the and quick -> brown.
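The pairing rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the demo: the helper name skipgram_pairs is hypothetical, and it enumerates every neighbor in the window, whereas the demo's generate_batch samples num_skips neighbors per center word.

```python
def skipgram_pairs(words, skip_window=1):
    """Return every (center, context) pair within skip_window words of the center."""
    pairs = []
    for i, center in enumerate(words):
        lo = max(0, i - skip_window)               # clip the window at the start of the text
        hi = min(len(words), i + skip_window + 1)  # clip the window at the end of the text
        for j in range(lo, hi):
            if j != i:                             # the center word is not its own context
                pairs.append((center, words[j]))
    return pairs


print(skipgram_pairs(["the", "quick", "brown"]))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')]
```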
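The skip-gram training objective is to minimize the negative log-likelihood of the context words given each center word:
$$J = -\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t)$$
The corresponding conditional probability is a softmax over the whole vocabulary of size V:
$$p(w_O \mid w_I) = \frac{\exp(u_{w_O}^\top v_{w_I})}{\sum_{w=1}^{V} \exp(u_w^\top v_{w_I})}$$
where T is the number of words in the corpus, c is the window size, v is the input (center-word) vector and u is the output (context-word) vector.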
Computing this cost directly is too time-consuming: its normalization sums over every word in the vocabulary, so each training step costs O(vocabulary_size).
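To make the O(vocabulary_size) cost concrete, here is a minimal NumPy sketch of the full softmax probability for one (center, context) pair. The function name softmax_prob, the random vectors, and the sizes (V = 50000, d = 128) are illustrative assumptions; the point is that every one of the V output vectors takes part in each evaluation.

```python
import numpy as np


def softmax_prob(v_center, U, context_id):
    """p(context | center) under the full softmax.

    v_center: input vector of the center word, shape (d,)
    U:        output embedding matrix, shape (V, d); all V rows are used.
    """
    scores = U @ v_center            # V dot products: this is the O(V) part
    scores = scores - scores.max()   # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores[context_id] / exp_scores.sum()


rng = np.random.default_rng(0)
V, d = 50000, 128
U = rng.normal(scale=0.1, size=(V, d))
v = rng.normal(scale=0.1, size=d)
p = softmax_prob(v, U, context_id=12)
loss = -np.log(p)                    # per-pair skip-gram loss
```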
So the demo uses NCE (noise-contrastive estimation), a negative-sampling approach: words drawn at random from a noise distribution serve as negative samples. For example, for the input quick, the pair quick -> sheep uses sheep as a negative sample. Suppose we draw just one negative sample in this example.
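The idea can be illustrated with the negative-sampling loss for a single positive pair plus k noise words. This is a simplified sketch, not what tf.nn.nce_loss computes internally (NCE additionally corrects for the sampling probabilities of the noise words); the function and variable names are hypothetical. Only 1 + k output vectors are touched instead of all V.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def neg_sampling_loss(v_center, u_pos, U_neg):
    """Negative-sampling loss for one (center, context) pair.

    v_center: input vector of the center word, shape (d,)
    u_pos:    output vector of the true context word, shape (d,)
    U_neg:    output vectors of k sampled noise words, shape (k, d)
    """
    pos_term = np.log(sigmoid(u_pos @ v_center))           # push the true pair toward label 1
    neg_term = np.log(sigmoid(-(U_neg @ v_center))).sum()  # push noise pairs toward label 0
    return -(pos_term + neg_term)


rng = np.random.default_rng(0)
d, k = 128, 1                              # k = 1 negative sample, as in the quick -> sheep example
v_quick = rng.normal(scale=0.1, size=d)    # hypothetical input vector for "quick"
u_brown = rng.normal(scale=0.1, size=d)    # hypothetical output vector for a true context word
U_sheep = rng.normal(scale=0.1, size=(k, d))
loss = neg_sampling_loss(v_quick, u_brown, U_sheep)
```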
- The input data here is text that has already been split into words.
- Read the words into a list.
- Count word frequencies and keep only the most frequent words up to the dictionary size (for example 50000); every other word maps to UNK, which takes position 0.
- Build a bidirectional index: key->id and id->key.
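A minimal sketch of these steps, assuming the input is already a list of words. The function name build_dataset and its exact return values are assumptions for illustration, not a copy of the demo's code.

```python
import collections


def build_dataset(words, vocabulary_size):
    """Map words to integer ids, keeping the vocabulary_size - 1 most frequent words.

    Id 0 is reserved for 'UNK'; every word outside the kept set maps to it.
    """
    count = [["UNK", 0]]
    count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
    dictionary = {word: i for i, (word, _) in enumerate(count)}      # key -> id
    data = [dictionary.get(word, 0) for word in words]                # unknown words -> 0
    count[0][1] = data.count(0)                                       # number of UNK tokens
    reverse_dictionary = {i: word for word, i in dictionary.items()}  # id -> key
    return data, count, dictionary, reverse_dictionary


words = "the quick brown fox jumps over the lazy dog the end".split()
data, count, dictionary, reverse_dictionary = build_dataset(words, vocabulary_size=4)
print(data)  # [1, 2, 3, 0, 0, 0, 1, 0, 0, 1, 0]
```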
4. Generate a training batch.
batch_size = 128
embedding_size = 128  # Dimension of the embedding vector.
skip_window = 1       # How many words to consider left and right.
num_skips = 2         # How many times to reuse an input to generate a label.
batch_size is the number of examples processed in each SGD step, embedding_size is the dimension of the word vectors, and skip_window is the number of words considered on each side of the center word.
num_skips = 2 means each input word is reused to generate 2 (input, label) pairs; it must not exceed 2 * skip_window.
The demo default is 2; for comparison, it can be set to 1.
With the default value of 2:
batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=1)
for i in range(8):
    print(batch[i], '->', labels[i, 0])
    print(reverse_dictionary[batch[i]], '->', reverse_dictionary[labels[i, 0]])
Sample data [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156]
3084 -> 5239
originated -> anarchism
3084 -> 12
originated -> as
12 -> 6
as -> a
12 -> 3084
as -> originated
6 -> 195
a -> term
6 -> 12
a -> as
195 -> 2
term -> of
195 -> 6
term -> a
3084 appears twice as input, once for each neighbor within the window of 1 word on each side.
With num_skips set to 1:
batch, labels = generate_batch(batch_size=8, num_skips=1, skip_window=1)
for i in range(8):
    print(batch[i], '->', labels[i, 0])
    print(reverse_dictionary[batch[i]], '->', reverse_dictionary[labels[i, 0]])
Sample data [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156]
3084 -> 12
originated -> as
12 -> 3084
as -> originated
6 -> 12
a -> as
195 -> 2
term -> of
2 -> 3137
of -> abuse
3137 -> 46
abuse -> first
46 -> 59
first -> used
59 -> 156
used -> against
Now 3084 appears only once as input.
import collections
import random

import numpy as np

# Step 4: Function to generate a training batch for the skip-gram model.
# Relies on the globals `data` (list of word ids) and `data_index` (read position).
def generate_batch(batch_size, num_skips, skip_window):
    global data_index
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    batch = np.ndarray(shape=(batch_size), dtype=np.int32)
    labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    span = 2 * skip_window + 1  # [ skip_window target skip_window ]
    buffer = collections.deque(maxlen=span)
    # Fill the sliding window with the first `span` word ids.
    for _ in range(span):
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    for i in range(batch_size // num_skips):
        target = skip_window  # target label at the center of the buffer
        targets_to_avoid = [skip_window]
        for j in range(num_skips):
            # Pick a random window position not used yet for this center word.
            while target in targets_to_avoid:
                target = random.randint(0, span - 1)
            targets_to_avoid.append(target)
            batch[i * num_skips + j] = buffer[skip_window]
            labels[i * num_skips + j, 0] = buffer[target]
        # Slide the window one word to the right.
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    return batch, labels
For each center word, the function randomly selects num_skips distinct words within the window and produces a series of
(input_id, output_id) pairs used as (batch_instance, label).
All of these are positive samples.
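Training preparation: the model uses two matrices.
Input embedding W: one row per word; this is the embeddings variable in the code below.
Output embedding W^: the output vectors used by the NCE loss; this is nce_weights (with nce_biases) in the code below.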
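The roles of the two matrices can be sketched in NumPy. This is a stand-in under stated assumptions, not TensorFlow code: the tiny sizes are arbitrary, and the names W_out and b_out are hypothetical placeholders for nce_weights and nce_biases. It shows that an embedding lookup is plain row selection (equivalent to multiplying one-hot rows by W) and how a pair's score is formed from the input and output vectors.

```python
import numpy as np

vocabulary_size, embedding_size = 10, 4
rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(vocabulary_size, embedding_size))  # input embedding
W_out = rng.normal(size=(vocabulary_size, embedding_size))          # output embedding (W^)
b_out = np.zeros(vocabulary_size)                                   # output bias

inputs = np.array([3, 3, 7])  # center-word ids from a batch
labels = np.array([2, 4, 1])  # context-word ids (positive samples)

# Embedding lookup is row selection: the same as multiplying one-hot rows by W.
embed = W[inputs]                                   # shape (batch, embedding_size)
one_hot = np.eye(vocabulary_size)[inputs]
assert np.allclose(embed, one_hot @ W)

# Score of each (input, label) pair: dot product with the label's output vector plus bias.
scores = np.sum(embed * W_out[labels], axis=1) + b_out[labels]
```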
The following code is fairly easy to follow: TensorFlow provides tf.nn.nce_loss, which automatically draws a fresh set of random negative samples every time the loss is evaluated.
import math

import tensorflow as tf  # TensorFlow 1.x graph-mode API, as in the original demo.

num_sampled = 64  # Number of negative examples to sample.

graph = tf.Graph()

with graph.as_default():

    # Input data.
    train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

    # Construct the variables.
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    nce_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

    # Look up embeddings for inputs.
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)

    # Compute the average NCE loss for the batch.
    # tf.nn.nce_loss automatically draws a new sample of the negative labels each
    # time we evaluate the loss.
    # Keyword arguments avoid the argument-order change (inputs/labels were swapped in TF 1.0).
    loss = tf.reduce_mean(
        tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                       labels=train_labels, inputs=embed,
                       num_sampled=num_sampled, num_classes=vocabulary_size))

    # Construct the SGD optimizer using a learning rate of 1.0.
    optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
During training, the demo normalizes the embedding matrix and multiplies it by the embeddings of several high-frequency validation words. This gives the cosine similarity (not the Euclidean distance) between each validation word and every word in the vocabulary, and the demo displays the nearest words for each validation word.
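The nearest-word lookup can be sketched in NumPy, assuming the trained embeddings are available as an array. The function name nearest_words and the toy 2-D vectors are hypothetical; the approach (normalize rows, take dot products, sort, skip the word itself) mirrors the description above.

```python
import numpy as np


def nearest_words(embeddings, valid_ids, top_k=3):
    """Return the top_k nearest word ids for each validation id, by cosine similarity."""
    norm = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norm                     # unit-length rows
    similarity = normalized[valid_ids] @ normalized.T  # shape (len(valid_ids), V)
    # Sort by descending similarity and skip position 0, the word itself.
    order = np.argsort(-similarity, axis=1)
    return order[:, 1:top_k + 1]


emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(nearest_words(emb, [0], top_k=2))  # [[1 2]]
```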
Finally, the demo calls scikit-learn's TSNE to reduce the embeddings to 2 dimensions and plots them.
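A minimal sketch of the dimensionality-reduction step with scikit-learn's TSNE. The random matrix is a stand-in for the trained embeddings, and the parameter values (perplexity=30, init="pca", 50 points) are illustrative assumptions; perplexity must stay below the number of points.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
final_embeddings = rng.normal(size=(100, 128))  # stand-in for trained, normalized embeddings

tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
low_dim = tsne.fit_transform(final_embeddings[:50])  # keep only the first 50 words
# Each row of low_dim is an (x, y) point; plotting would scatter these points
# and annotate each one with its word from reverse_dictionary.
```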
TensorFlow's Word2vec Demo analysis