Texygen: Hands-On with a Unified Text Generation Framework

Source: Internet
Author: User
Tags: generative adversarial networks

Text generation is an advanced task in natural language understanding and an important step toward human-like intelligence. Geek.ai presented LeakGAN at AAAI 2018 and later released Texygen, an open-source text generation framework. I wanted to take a closer look at LeakGAN, and Texygen makes that straightforward: it provides ready-to-run implementations of the main GAN-based text generation models of recent years.

The models currently supported are as follows:

Implemented Models and Original Papers

SeqGAN - SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

MaliGAN - Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

RankGAN - Adversarial Ranking for Language Generation

LeakGAN - Long Text Generation via Adversarial Training with Leaked Information

TextGAN - Adversarial Feature Matching for Text Generation

GSGAN - GANs for Sequences of Discrete Elements with the Gumbel-softmax Distribution

SeqGAN, LeakGAN, TextGAN and the others are all included. GANs are an important approach to unsupervised learning and sample generation, and applying them to text generation is a natural next step for NLP. The success of GANs has sparked interest in adversarial training on discrete data such as text. The difficulty is that sampling discrete tokens is not differentiable, so the discriminator's gradient cannot be passed back to the generator directly. SeqGAN was one of the early attempts to solve this: it treats the discriminator's output as a reward and optimizes the generator with the REINFORCE algorithm. Since then, researchers have proposed a number of improvements to SeqGAN, addressing problems such as vanishing gradients (MaliGAN, RankGAN, and the bootstrapped rescaled activation used in LeakGAN) and robustness when generating long text (LeakGAN).
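To make the REINFORCE idea concrete, here is a minimal sketch, not Texygen's code. A context-free softmax "generator" over a 4-token vocabulary and a hypothetical `toy_discriminator` that rewards one particular token stand in for SeqGAN's LSTM generator and CNN discriminator. The gradient of the reward-weighted log-probability is estimated from samples, so nothing has to be differentiated through the discrete sampling step. SeqGAN additionally uses Monte Carlo rollouts to reward partial sequences; this sketch assumes a single reward per complete sequence.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(logits, sequence, reward):
    """REINFORCE estimate of d/d(logits) of reward * sum_t log p(y_t)
    for a context-free (unigram) softmax policy.

    For a softmax policy, d log p(a) / d logit_k = 1[k == a] - p_k.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for token in sequence:
        for k, p in enumerate(probs):
            grad[k] += reward * ((1.0 if k == token else 0.0) - p)
    return grad


def toy_discriminator(sequence, target_token=2):
    """Hypothetical stand-in for D: fraction of tokens equal to target_token."""
    return sum(1 for t in sequence if t == target_token) / len(sequence)


def train(vocab_size=4, seq_len=5, steps=1000, lr=0.1, seed=0):
    """Sample a sequence, score it with the toy discriminator, and take a
    policy-gradient ascent step on the generator's logits."""
    rng = random.Random(seed)
    logits = [0.0] * vocab_size
    for _ in range(steps):
        probs = softmax(logits)
        seq = rng.choices(range(vocab_size), weights=probs, k=seq_len)
        reward = toy_discriminator(seq)
        grad = reinforce_grad(logits, seq, reward)
        logits = [w + lr * g for w, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    print([round(p, 3) for p in train()])
```

After training, most of the probability mass should sit on the token the toy discriminator rewards, even though no gradient ever flowed through the sampled tokens themselves.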

The framework of SeqGAN and similar models is as follows:


The schematic framework of LeakGAN is as follows:


Texygen abstracts the structure that all these GANs share into a common base, from which each individual model is derived.
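The idea of deriving every model from one shared abstraction can be sketched as follows. All names here (`BaseGan`, `ToySeqGan`, `add_metric`, `train_step`) are hypothetical and are not Texygen's actual API. The base class owns the shared machinery (metric registration, the epoch loop, evaluation after each epoch), and a derived model only supplies its own training step.

```python
from abc import ABC, abstractmethod


class BaseGan(ABC):
    """Hypothetical shared base class: owns the metric list and epoch loop
    that every derived model reuses."""

    def __init__(self):
        self.metrics = []  # list of (name, callable) pairs
        self.epoch = 0

    def add_metric(self, name, fn):
        """Register a metric; fn receives the model and returns a score."""
        self.metrics.append((name, fn))

    def evaluate(self):
        """Run every registered metric and return {name: score}."""
        return {name: fn(self) for name, fn in self.metrics}

    @abstractmethod
    def train_step(self):
        """One epoch of model-specific training (generator and discriminator)."""

    def train(self, epochs):
        """Shared loop: train one epoch, then evaluate all metrics."""
        history = []
        for _ in range(epochs):
            self.train_step()
            self.epoch += 1
            history.append(self.evaluate())
        return history


class ToySeqGan(BaseGan):
    """Hypothetical derived model: only the training step is model-specific."""

    def __init__(self):
        super().__init__()
        self.loss = 1.0

    def train_step(self):
        # Placeholder for real pre-training and adversarial updates.
        self.loss *= 0.5
```

For example, after `model = ToySeqGan()` and `model.add_metric("loss", lambda m: m.loss)`, calling `model.train(3)` returns one dictionary of scores per epoch. Adding a new GAN variant then means overriding `train_step` without touching the evaluation code.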


More importantly, Texygen provides a suite of evaluation metrics for text generation. It includes five metrics, grouped as follows:

Metrics based on document similarity. The most intuitive measure of the quality of generated text is how similar it is to natural language, i.e. to the training data:

BLEU: a metric based on the bag-of-words (n-gram) model, using words and phrases as the basic unit.
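As an illustration of how BLEU works, here is a minimal pure-Python sketch of sentence-level BLEU. It is not Texygen's implementation, and several details are assumptions: uniform weights over 1- to 4-grams, n-gram counts clipped by the references, a brevity penalty against the closest reference length, and a simple epsilon smoothing for n-gram orders with no matches (the epsilon value is an arbitrary choice).

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, hypothesis, max_n=4, epsilon=0.1):
    """BLEU of one tokenized hypothesis against a list of tokenized references."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hypothesis, n)
        total = sum(hyp_counts.values())
        if total == 0:  # hypothesis shorter than n tokens
            return 0.0
        # Clip each n-gram by the largest count it has in any single reference
        # ('|' on Counters keeps the per-key maximum).
        max_ref = Counter()
        for ref in references:
            max_ref |= ngram_counts(ref, n)
        matched = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        # Assumed smoothing: replace a zero match count with epsilon.
        log_precision_sum += math.log((matched or epsilon) / total)
    # Brevity penalty against the reference length closest to the hypothesis.
    hyp_len = len(hypothesis)
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in references)[1]
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    # Geometric mean of the n-gram precisions (uniform weights), times the penalty.
    return bp * math.exp(log_precision_sum / max_n)
```

To score a generator, one would typically compute this for each generated sentence with the test set as the reference list and average the results.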

EmbSim: a metric defined through the pairwise similarities of word vectors trained on the model's output sequences, compared with those of word vectors trained on real data. The basic unit is the token.
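A rough sketch of the EmbSim idea, under stated assumptions: the two embedding tables (word to vector) are assumed to come from separately trained word2vec-style models, one on real text and one on generated text, and the final aggregation (a plain mean of cosines) is a simplification; Texygen's exact formula may differ. Comparing each word's similarity profile instead of the raw vectors matters because two independently trained embeddings live in different coordinate systems.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_profile(emb, vocab):
    """For each word, the vector of its cosine similarities to every word in vocab."""
    return {w: [cosine(emb[w], emb[o]) for o in vocab] for w in vocab}


def emb_sim(real_emb, gen_emb):
    """Mean cosine between each shared word's similarity profile under the
    real-data embedding and under the generated-data embedding."""
    vocab = sorted(set(real_emb) & set(gen_emb))
    if not vocab:
        return 0.0
    real_prof = similarity_profile(real_emb, vocab)
    gen_prof = similarity_profile(gen_emb, vocab)
    return sum(cosine(real_prof[w], gen_prof[w]) for w in vocab) / len(vocab)
```

Because only similarities are compared, a rotated copy of the real embedding scores 1.0, while an embedding with a different similarity structure scores lower.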

Metrics based on likelihood:

NLL-oracle: a likelihood estimate based on synthetic data. A constructed language model (the "oracle") produces the artificial training data, and the metric is the negative log-likelihood of the evaluated model's outputs as measured by this oracle.
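For reference, NLL-oracle in the form used by the SeqGAN paper's synthetic-data experiments can be written as below, where G_theta is the generator being evaluated, G_oracle is the oracle model, and Y_{1:T} is a generated sequence of length T. Texygen may normalize differently (for example per token), so treat this as the general shape of the metric:

```latex
\mathrm{NLL}_{\text{oracle}}
  = -\,\mathbb{E}_{Y_{1:T} \sim G_\theta}
    \left[ \sum_{t=1}^{T} \log G_{\text{oracle}}\!\left(y_t \mid Y_{1:t-1}\right) \right]
```

Lower is better: samples the oracle finds likely are, by construction, closer to the synthetic "true" distribution.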

NLL-test: a likelihood estimate based on test data, namely the negative log-likelihood of the test data under the language model being evaluated.
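Both likelihood metrics come down to the same computation with the roles swapped: NLL-oracle scores the generator's samples under the oracle, while NLL-test scores held-out test sentences under the generator. The sketch below assumes a hypothetical toy bigram language model (`make_bigram_model`) in place of a neural language model, and averages the summed negative log-likelihood per sequence.

```python
import math


def make_bigram_model(table, floor=1e-6):
    """Hypothetical toy language model: p(token | prev) looked up in a dict,
    with a small floor probability for unseen bigrams."""
    return lambda prev, tok: table.get((prev, tok), floor)


def sequence_nll(cond_prob, sequences, bos="<s>"):
    """Average over sequences of -sum_t log p(y_t | y_{t-1})."""
    total = 0.0
    for seq in sequences:
        prev = bos
        for tok in seq:
            total -= math.log(cond_prob(prev, tok))
            prev = tok
    return total / len(sequences)


# NLL-oracle: the oracle scores samples produced by the generator.
oracle = make_bigram_model({("<s>", "a"): 0.5, ("<s>", "b"): 0.5,
                            ("a", "b"): 1.0, ("b", "a"): 1.0})
generated = [["a", "b"], ["b", "a"], ["a", "a"]]
nll_oracle = sequence_nll(oracle, generated)

# NLL-test: the generator scores held-out test sentences.
generator = make_bigram_model({("<s>", "a"): 0.9, ("<s>", "b"): 0.1,
                               ("a", "b"): 0.6, ("a", "a"): 0.4,
                               ("b", "a"): 1.0})
test_data = [["a", "b"], ["b", "a"]]
nll_test = sequence_nll(generator, test_data)
```

In this toy run the generated sample `["a", "a"]` contains a bigram the oracle never produces, which pushes NLL-oracle up sharply; that is exactly the kind of low-quality output the metric is meant to penalize.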

Metrics based on diversity:

Self-BLEU: also based on the bag-of-words model, with words and phrases as the basic unit. It measures how similar each output of a model is to the model's other outputs; a higher Self-BLEU means lower diversity.
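The Self-BLEU procedure can be sketched as follows: each generated sample is scored with BLEU against all other samples as references, and the scores are averaged. The compact `bleu` helper is an assumed simplification (uniform 1- to 4-gram weights, clipped counts, epsilon smoothing for zero matches, brevity penalty), not Texygen's code.

```python
import math
from collections import Counter


def _ngrams(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(references, hypothesis, max_n=4, epsilon=0.1):
    """Compact sentence BLEU with assumed epsilon smoothing for zero matches."""
    log_p = 0.0
    for n in range(1, max_n + 1):
        hyp = _ngrams(hypothesis, n)
        total = sum(hyp.values())
        if total == 0:
            return 0.0
        max_ref = Counter()
        for ref in references:
            max_ref |= _ngrams(ref, n)  # '|' keeps the max count per n-gram
        matched = sum(min(c, max_ref[g]) for g, c in hyp.items())
        log_p += math.log((matched or epsilon) / total)
    h = len(hypothesis)
    r = min((abs(len(ref) - h), len(ref)) for ref in references)[1]
    bp = 1.0 if h > r else math.exp(1 - r / h)
    return bp * math.exp(log_p / max_n)


def self_bleu(samples, max_n=4):
    """Score each sample against all *other* samples and average.
    Higher values mean the samples resemble each other (less diversity)."""
    scores = []
    for i, hyp in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        scores.append(bleu(others, hyp, max_n))
    return sum(scores) / len(scores)
```

A generator that keeps emitting the same sentence gets a Self-BLEU close to 1.0, which is how this metric exposes mode collapse even when BLEU against the test set looks good.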


2. Training in Practice

Here I trained only LeakGAN.


You can see that the evaluation metrics are computed at every epoch.


