Candidate Sampling Introduction
Suppose we have a multi-class or multi-label classification task with training examples (x_i, T_i), where x_i is the context and T_i is the set of target classes (there may be more than one). Word2vec's negative sampling is a familiar example: the CBOW method uses the context x_i to predict the center word (a single target t_i), while the Skip-gram method uses the center word x_i to predict the context words (multiple targets T_i).
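To make the CBOW/Skip-gram distinction concrete, here is a minimal sketch of how the two methods derive (context, target) training pairs from a token sequence. The helper names and the window parameter are illustrative, not from the text:

```python
def cbow_pairs(tokens, window=2):
    """CBOW: the surrounding context x_i predicts the single center word t_i."""
    pairs = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((tuple(context), center))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: the center word x_i predicts each surrounding word, giving
    multiple targets T_i per center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for c in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pairs.append((center, c))
    return pairs
```

Note that Skip-gram emits one pair per (center, context-word) combination, so a single x_i yields several targets, matching the "multiple targets T_i" case above.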
We want to learn a general function F(x, y) that characterizes the relationship between a context x and a target class y. In Word2vec, for example, this function is used to predict the probability of the next word given its context.
Exhaustive training methods, such as full softmax or logistic regression, must compute F(x, y) for every class y ∈ L for each training example. When |L| is very large, training becomes very expensive.
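The cost is easy to see in code. The sketch below (with illustrative sizes, not from the text) computes a full softmax: every example requires a dot product against all |L| rows of the output weight matrix, so the per-step work grows linearly with the vocabulary size:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10_000, 16      # |L| = 10,000 classes (illustrative)
W = rng.normal(size=(vocab_size, embed_dim))  # one weight row per class y
x = rng.normal(size=embed_dim)                # representation of the context

logits = W @ x                          # F(x, y) for every y in L: O(|L|) work
probs = np.exp(logits - logits.max())   # numerically stable softmax
probs /= probs.sum()
```

Candidate sampling avoids this full pass by evaluating F(x, y) only on a small subset of classes per example.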
"Candidate sampling" training methods instead construct, for each training example (x_i, T_i), a small training task in which F(x, y) only has to be evaluated on a reduced candidate set C_i ⊂ L. The candidate set C_i contains the target classes T_i together with some randomly sampled classes S_i ⊂ L: C_i = T_i ∪ S_i.
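A minimal sketch of this candidate-set construction, assuming uniform sampling of negatives (real implementations often use a frequency-based distribution instead); the function name and parameters are illustrative:

```python
import random

def candidate_set(targets, num_classes, num_sampled, rng=random):
    """Build C_i = T_i ∪ S_i for one training example.

    targets:     the true target classes T_i
    num_classes: |L|, the size of the full label set
    num_sampled: how many negative classes S_i to draw
    """
    T = set(targets)
    S = set()
    while len(S) < num_sampled:
        y = rng.randrange(num_classes)
        if y not in T:              # keep sampled negatives disjoint from T_i
            S.add(y)
    return T | S                    # the candidate set C_i
```

The loss for the example is then computed over the |C_i| candidates rather than all |L| classes, which is what makes each training step cheap.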