Assume that each word corresponds to a word vector. Suppose:
1) The similarity between two words is proportional to the dot product of their word vectors, i.e. $sim(v_1, v_2) = v_1 \cdot v_2$. This is the dot-product principle.
2) A context consisting of the words $v_1, \dots, v_n$ is represented by $C = \sum_{i=1}^{n} v_i$. This is the additive principle.
3) The probability that a word $A$ appears in context $C$ is proportional to the energy factor $e^{-E(A, C)}$, where $E(A, C) = -A \cdot C$. This is the energy rule (compare the scoring function of a thermal system, as in energy-based models).
Therefore:
\[P(A \mid C) = \frac{e^{-E(A, C)}}{\sum_{i=1}^{V} e^{-E(v_i, C)}} = \frac{e^{A \cdot C}}{\sum_{i=1}^{V} e^{v_i \cdot C}}\]
where $V$ is the size of the vocabulary, so the sum in the denominator runs over every word in it.
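Putting the three principles together, $P(A \mid C)$ is simply a softmax over the dot products of each vocabulary vector with the summed context vector. A minimal NumPy sketch under those assumptions (the toy vocabulary and function names here are illustrative, not part of word2vec itself):

```python
import numpy as np

def context_vector(word_vecs):
    # Additive principle: the context C is the sum of its word vectors.
    return np.sum(word_vecs, axis=0)

def p_word_given_context(vocab_vecs, context):
    # Energy rule: E(A, C) = -A . C, so P(A|C) is a softmax
    # over the dot products A . C for every word A in the vocabulary.
    scores = vocab_vecs @ context
    scores -= scores.max()          # subtract max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

rng = np.random.default_rng(0)
vocab = rng.normal(size=(5, 3))     # toy vocabulary: 5 words, 3-dim vectors
C = context_vector(vocab[[0, 2]])   # context built from words 0 and 2
probs = p_word_given_context(vocab, C)
```

Because the denominator sums over the whole vocabulary, computing this softmax exactly is expensive for large $V$; that cost is what word2vec's hierarchical softmax and negative sampling are designed to avoid.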
How word2vec generates word vectors