Paper: The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models

Introduction
This paper presents a method for learning fixed-size representations of variable-length sequences, applies it to feedforward neural network language models (FNN-LMs), and obtains good experimental results. The authors improve the FNN language model by replacing the one-hot vectors in the original input layer with FOFE-encoded sequences.

Fixed-size Ordinally-Forgetting Encoding (FOFE)
Let K denote the given vocabulary size. FOFE builds on one-hot encoding, which represents each word as a K-dimensional vector. FOFE encodes a variable-length sequence using the following formula:
z_t = α * z_{t-1} + e_t  (1 ≤ t ≤ T)
Here z_t denotes the FOFE encoding of the subsequence from the first word w_1 of the input sequence up to the t-th word w_t (with z_0 = 0), α is the forgetting factor (a constant), and e_t is the one-hot vector of the word w_t.
Then z_t can be regarded as a fixed-size vector representation of the sequence {w_1, w_2, ..., w_t}.
For example, if the vocabulary is
a=[1,0,0]
b=[0,1,0]
c=[0,0,1]
then, by direct computation, one obtains
abc = [α^2, α, 1]
abcbc = [α^4, α + α^3, 1 + α^2]
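As a sanity check on the example above, the FOFE recursion can be implemented in a few lines. This is a minimal sketch; the function name `fofe` and the toy three-word vocabulary are illustrative, not from the paper:

```python
def fofe(sequence, vocab, alpha):
    """Encode a word sequence as z_t = alpha * z_{t-1} + e_t, with z_0 = 0."""
    index = {word: i for i, word in enumerate(vocab)}
    z = [0.0] * len(vocab)
    for word in sequence:
        z = [alpha * zi for zi in z]  # decay every earlier word's weight
        z[index[word]] += 1.0         # add the one-hot vector e_t
    return z

# With alpha = 0.5, the codes above become concrete numbers:
print(fofe("abc", "abc", 0.5))    # [a^2, a, 1]           -> [0.25, 0.5, 1.0]
print(fofe("abcbc", "abc", 0.5))  # [a^4, a+a^3, 1+a^2]   -> [0.0625, 0.625, 1.25]
```

Because α < 1, each earlier word's contribution shrinks geometrically with its distance from the end of the sequence, which is exactly the "forgetting" in the method's name.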
The FOFE code has two useful uniqueness properties:
1. If 0 < α ≤ 0.5, FOFE is unique for any K and T.
2. If 0.5 < α < 1, FOFE is unique for almost all K and T, with only a finite set of α values as exceptions.

Model
The traditional neural probabilistic language model (Bengio et al.) feeds one-hot vectors into the input layer; a word-embedding matrix then maps each of them to a low-dimensional real-valued vector (say of dimension m, for an n-gram model). The embeddings of the preceding n-1 words are concatenated into an m(n-1)-dimensional projection layer, which is then passed through the hidden layers to form the output layer.
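The dimensional contrast between this concatenated projection layer and one fed by a FOFE code of the same history (the replacement the paper makes at the input layer) can be sketched concretely. The vocabulary size K = 4, embedding dimension m = 2, embedding matrix W, and forgetting factor α = 0.7 below are all illustrative:

```python
# Toy contrast between the two input layers: K = 4 words, m = 2 dims,
# n = 4 (so the history is the preceding n-1 = 3 words). All values made up.
K, m, alpha = 4, 2, 0.7
W = [[0.1, 0.2],   # embedding of word 0
     [0.3, 0.4],   # embedding of word 1
     [0.5, 0.6],   # embedding of word 2
     [0.7, 0.8]]   # embedding of word 3

history = [2, 0, 3]  # the preceding n-1 words, most recent last

# Traditional FNN-LM: concatenate the n-1 embeddings -> m*(n-1) = 6 dims.
concat = [x for idx in history for x in W[idx]]

# FOFE FNN-LM: encode the history as z_t = alpha*z_{t-1} + e_t, then
# project: p = z W -> m = 2 dims, independent of the history length.
z = [0.0] * K
for idx in history:
    z = [alpha * zi for zi in z]
    z[idx] += 1.0
fofe_proj = [sum(z[i] * W[i][j] for i in range(K)) for j in range(m)]

print(len(concat), len(fofe_proj))  # 6 2
```

The FOFE projection is just a decayed weighted sum of the history's embeddings, with the most recent word weighted most heavily.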
In this paper, the authors' change is made at the input layer: they replace the one-hot vectors of the original input layer with FOFE encodings. When the preceding n-1 words are FOFE-encoded, the influence of earlier words on the final encoding is gradually attenuated, which means that words closer to the target word have a larger effect on its prediction. Moreover, FOFE encoding can reduce the dimensionality of the projection layer: for a 1st-order FOFE FNN-LM, the projection layer has dimension m (the word-embedding dimension), although this does not reduce the model's complexity.

Experiment
The authors carried out comparative experiments on two datasets:
1. The Penn Treebank (PTB) corpus (about 1,000,000 words, with a vocabulary size of 10,000).
2. The Large Text Compression Benchmark (LTCB), from which the authors use the enwik9 dataset: the first 10^9 bytes of enwiki-20060303-pages-articles.xml. The training set contains 153M words, the validation set 8.9M, and the test set 8.9M; the vocabulary size is 80,000, and words outside the vocabulary are replaced with an <UNK> tag.
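The out-of-vocabulary handling described above can be sketched in a line or two; the toy vocabulary and the function name `apply_unk` here are illustrative, not the authors' preprocessing code:

```python
# Words outside the (here, toy) vocabulary are mapped to an <UNK> tag.
vocab = {"the", "cat", "sat"}

def apply_unk(tokens):
    return [w if w in vocab else "<UNK>" for w in tokens]

print(apply_unk(["the", "dog", "sat"]))  # ['the', '<UNK>', 'sat']
```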
The metric generally used to evaluate a language model is perplexity. The basic idea is that a language model that assigns higher probability to the test-set sentences is better: the test-set sentences are ordinary, well-formed sentences, so the higher the probability a trained model assigns to the test set, the better the model. The specific formula is as follows:
PP = P(w_1, w_2, ..., w_N)^(-1/N)
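Under the usual definition PP = P(w_1 ... w_N)^(-1/N), perplexity is computed in log space for numerical stability. A minimal sketch, where the per-word probabilities are made-up numbers rather than real model outputs:

```python
import math

def perplexity(word_probs):
    """PP = P(w_1..w_N)^(-1/N), via the average negative log-probability."""
    n = len(word_probs)
    avg_neg_log = -sum(math.log(p) for p in word_probs) / n
    return math.exp(avg_neg_log)

# Illustrative per-word probabilities a model assigns on a test sentence:
probs = [0.25, 0.5, 0.125, 0.25]  # joint probability 2^-8 over N = 4 words
print(perplexity(probs))          # 2^(8/4) = 4, so approximately 4.0
```

A lower perplexity means the model assigns higher probability to the test sentences; for reference, a uniform model over the 10,000-word PTB vocabulary would have perplexity 10,000.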