Word vectors:
Whether it is a sentence or a whole article, the word is the most basic building block.
How can a computer make use of these words?
The key problem is how to convert a word into a vector.
In a two-dimensional embedding space, had, has, and have mean the same thing, so their vectors should lie close together.
Likewise, need and help end up near the same location.
Nearness in the space expresses sameness and relatedness.
Consider the following example:
Which words are closest to frog? Its synonyms.
If we model two different languages, their embedding spaces also turn out to be very close in shape,
so the word vectors we construct are not tied to the language itself; they are modeled from semantic context (the logic of the surrounding words).
Neural Network model:
The input word vectors are concatenated end-to-end (the projection layer) and passed forward through the neural network, whose parameters are optimized during training.
The input word vectors themselves are parameters that also need to be optimized.
Training sample: the vectors of the preceding n-1 words, assuming each word vector has size m.
Projection layer: one concatenated vector of size (n-1)×m.
Output:
given that context, the probability that the next word is exactly the i-th word of the dictionary.
Normalization: a softmax turns the output scores into a probability distribution over the dictionary.
The real goal of the whole exercise is to obtain the word vector of every word.
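Below is a minimal sketch of that forward pass (the dimensions n, m, the hidden size, and the example word ids are illustrative assumptions, not values from these notes):

```python
import numpy as np

rng = np.random.default_rng(0)
V, n, m, h = 5000, 4, 100, 64          # vocab size, n-gram order, word-vector size, hidden units

C = rng.normal(size=(V, m))            # word-vector table: these are trained parameters too
H = rng.normal(size=((n - 1) * m, h))  # projection -> hidden weights
U = rng.normal(size=(h, V))            # hidden -> output weights

context_ids = [12, 7, 301]             # the previous n-1 = 3 words (hypothetical ids)
x = C[context_ids].reshape(-1)         # projection layer: concatenation of size (n-1)*m
z = np.tanh(x @ H)                     # hidden layer
scores = z @ U                         # one score per dictionary word
probs = np.exp(scores - scores.max())  # softmax normalization...
probs /= probs.sum()                   # ...probs[i] = p(next word = i-th dictionary word)
```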
Advantages of the neural network model:
S1 = "I went to internet café today" appears 1000 times in the corpus.
S2 = "I went to the internet café today" appears 10 times.
Under an N-gram model: P(S1) >> P(S2).
Under the neural network model: P(S1) ≈ P(S2).
To the neural network, similar sentences and similar words are effectively the same thing:
as long as one of them appears in the corpus, the probability of the similar sentences rises correspondingly.
Hierarchical Softmax (layered softmax):
CBOW: predicts the current word from its context.
Skip-gram: predicts the context from the current word.
CBOW:
CBOW is short for continuous bag-of-words model; it predicts the probability of the current word from its context words.
Given the context, we want the probability of the target word w appearing to be as large as possible.
First, we need to get to know a structure called the Huffman tree.
Huffman Tree
The cost of a tree equals each weight multiplied by its path length, summed over the leaves, so the largest weight should sit in the shallowest position. In word2vec we take word frequency (probability) as the weight.
Each internal node then performs a binary classification, so the softmax judgment is stratified into a chain of yes/no decisions about whether the target word lies below, with the most important (most frequent) words placed in the 1st position, the 2nd position, and so on.
The construction flow of a Huffman tree: repeatedly merge the two lowest-weight nodes under a new parent until a single tree remains (see the code sketch after the coding example below).
Using Huffman Tree Coding:
a:111
c:110
b:10
d:0
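A runnable sketch of this construction (the frequencies 4, 2, 1, 1 are chosen so the resulting codes match the example above; 0/1 labels may swap depending on how left and right children are assigned):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    # Build a Huffman tree from {symbol: weight}; higher-weight symbols
    # end up closer to the root and therefore get shorter codes.
    tiebreak = count()  # unique counter so equal weights never compare node objects
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # take the two lightest nodes...
        w2, _, right = heapq.heappop(heap)  # ...and merge them under a new parent
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse into children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: record the accumulated code
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

# Word frequency serves as the weight, as in word2vec:
print(huffman_codes({"d": 4, "b": 2, "c": 1, "a": 1}))
# -> {'d': '0', 'b': '10', 'c': '110', 'a': '111'}
```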
At each node in the Huffman tree, how do we decide which direction to take? (That is, how do we make the binary decision?)
We use a familiar tool: logistic regression.
The sigmoid function: σ(x) = 1 / (1 + e^(-x)).
It maps any numeric input to an output in (0, 1), which can then be classified as "go left" or "go right".
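A tiny sketch of that decision rule (the score value is an assumed example, e.g. the dot product of the projection vector with a node's parameter vector):

```python
import numpy as np

def sigmoid(x):
    # Squash any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

score = 0.8                        # assumed example input
go_right = sigmoid(score) > 0.5    # above 0.5: classify "right"; below: "left"
print(sigmoid(score), go_right)    # ~0.69, True
```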
Now back to the CBOW model described earlier.
The input layer takes the word vectors of the context words. When training the CBOW model, the word vectors are really just a byproduct, or, to be exact, parameters of the CBOW model. At the start of training the word vectors hold random values, and they are updated as training progresses.
The projection layer sums them up; the so-called summation is simply element-wise vector addition.
The output layer outputs the most probable word w. Because the vocabulary of the corpus has a fixed size |C|, the above process can be regarded as a multi-class classification problem: given the features, pick one of the |C| categories.
If the word we finally need to obtain is football, then the process is:
start at the root of the Huffman tree and make a binary (sigmoid) decision at each internal node along the path down to the leaf football; p(football | context) is the product of the probabilities of those decisions.
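A hedged sketch of that product-of-sigmoids computation (the function name, the 0/1 convention for the code bits, and the inputs are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def path_probability(x_w, path_thetas, code_bits):
    # x_w:         projection vector (the sum of the context word vectors)
    # path_thetas: one parameter vector per internal node on the root-to-leaf path
    # code_bits:   the target word's Huffman code bits along that same path
    p = 1.0
    for theta, d in zip(path_thetas, code_bits):
        s = sigmoid(x_w @ theta)        # score of one branch at this node
        p *= s if d == 1 else 1.0 - s   # take the branch the code bit dictates
    return p                            # p(target word | context)
```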
How to solve it:
Objective function:
the bigger, the better.
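Reconstructed in the standard word2vec hierarchical-softmax form (the notation x_w, θ, d_j and the path length l^w follow that formulation, not these notes), the objective is

L = Σ_{w∈C} log p(w | Context(w)),
p(w | Context(w)) = Π_{j=2}^{l^w} [σ(x_wᵀθ_{j-1})]^(1-d_j) · [1 - σ(x_wᵀθ_{j-1})]^(d_j),

where x_w is the summed context vector, θ_{j-1} is the parameter vector of the j-th node on w's path, and d_j ∈ {0,1} is the j-th bit of w's Huffman code.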
Finding its maximum is a gradient ascent problem.
Because the projection vector is linear in the context word vectors (it is just their sum), the gradient computed for the summed vector can be applied to each individual context word vector.
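One hedged sketch of a single CBOW + hierarchical-softmax gradient-ascent step (the learning rate, the names, and the 1-d labeling convention are assumptions following the common word2vec formulation, not code from these notes):

```python
import numpy as np

def cbow_hs_step(context_vecs, path_thetas, code_bits, lr=0.025):
    x = np.sum(context_vecs, axis=0)    # projection layer: plain vector sum
    grad_x = np.zeros_like(x)
    for theta, d in zip(path_thetas, code_bits):
        s = 1.0 / (1.0 + np.exp(-x @ theta))
        g = lr * (1 - d - s)            # gradient of log p w.r.t. the score x·theta
        grad_x += g * theta             # accumulate the gradient w.r.t. x ...
        theta += g * x                  # ... and update this node's parameters in place
    for v in context_vecs:              # the same accumulated x-gradient is applied
        v += grad_x                     # to every context word vector
```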
Skip-gram:
One more problem needs considering: if the corpus is very large, then even with a Huffman tree putting common words near the top, there are many rare words deep in the tree, so the computational cost still becomes very large.
One solution is called negative sampling:
we want to maximize the likelihood that the predictions come out right.
The product over all the terms means that every word in the corpus should be predicted correctly.
The quantity we want is the same as before, only described by another method: the former goes through the Huffman tree, while negative sampling divides a line into intervals weighted by word frequency and draws negative words from those intervals.
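A hedged sketch of that interval sampler (the 0.75 power is the frequency weighting used in the word2vec paper; the table size, example frequencies, and function names are illustrative):

```python
import random

def build_sampling_table(freqs, power=0.75, table_size=100_000):
    # Cut a line into intervals proportional to freq**power, then
    # discretize it into a lookup table so each draw is O(1).
    weights = {w: f ** power for w, f in freqs.items()}
    total = sum(weights.values())
    table = []
    for w, wt in weights.items():
        table.extend([w] * max(1, round(table_size * wt / total)))
    return table

def sample_negatives(table, target, k=5):
    # Draw k negative words, skipping the positive target word.
    negs = []
    while len(negs) < k:
        w = random.choice(table)
        if w != target:
            negs.append(w)
    return negs

table = build_sampling_table({"the": 1000, "frog": 10, "football": 5})
print(sample_negatives(table, target="frog"))
```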
Finally, the word vectors are updated with the same kind of gradient step.