Sentence generation algorithm based on context-free grammar


Objective

The algorithm comes from a blog post by a well-known foreign programmer.
The algorithm does not involve any artificial-intelligence techniques; it is simply an idea for generating sentences from a context-free grammar.

Context-free grammar

A context-free grammar describes only the structure of a sentence; it does not depend on the surrounding context or on semantics.

Non-terminal | Productions
--|--
S | NP VP
NP | Det N
NP | I / he / she / Joe
VP | V NP / VP
Det | a / the / my / his
N | elephant / cat / jeans / suit
V | kicked / followed / shot

The above is an example of a context-free grammar. S represents a sentence; starting from S and recursively expanding each non-terminal into one of its productions until only words remain, you can generate a sentence.
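
For example, a hand-worked derivation starting from S (an illustration, not output of the code below) might look like this:

    S
    -> NP VP
    -> he VP
    -> he V NP
    -> he kicked NP
    -> he kicked Det N
    -> he kicked the cat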

Basic implementation

    import random
    from collections import defaultdict


    class CFG(object):
        def __init__(self):
            # defaultdict(list): a missing key maps to an empty list of productions
            self.prod = defaultdict(list)

        def add_prod(self, lhs, rhs):
            """Add production to the grammar. 'rhs' can
            be several productions separated by '|'.
            Each production is a sequence of symbols
            separated by whitespace.

            Usage:
                grammar.add_prod('NT', 'VP PP')
                grammar.add_prod('Digit', '1|2|3|4')
            """
            prods = rhs.split('|')  # split the alternatives on '|'
            for prod in prods:
                # split() without arguments splits on whitespace; each
                # production is stored as a tuple of symbols
                self.prod[lhs].append(tuple(prod.split()))

        def gen_random(self, symbol):
            """Generate a random sentence from the
            grammar, starting with the given symbol.
            """
            sentence = ''

            # Select one production of this symbol randomly
            rand_prod = random.choice(self.prod[symbol])

            for sym in rand_prod:  # walk through the symbols of the chosen production
                if sym in self.prod:
                    # Not a concrete word but a non-terminal: recurse to expand it
                    sentence += self.gen_random(sym)
                else:
                    # A concrete word: append it directly to the sentence
                    sentence += sym + ' '

            return sentence


    cfg1 = CFG()
    cfg1.add_prod('S', 'NP VP')
    cfg1.add_prod('NP', 'Det N')
    cfg1.add_prod('NP', 'I | he | she | Joe')
    cfg1.add_prod('VP', 'V NP | VP')
    cfg1.add_prod('Det', 'a | the | my | his')
    cfg1.add_prod('N', 'elephant | cat | jeans | suit')
    cfg1.add_prod('V', 'kicked | followed | shot')

    for i in range(10):
        print(cfg1.gen_random('S'))

Above is a basic Python implementation: starting from the given symbol, it recursively expands non-terminals until only concrete words remain.
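
The output is random, so every run differs, but with the grammar above a run might produce sentences roughly like the following (illustrative only, not captured from an actual run):

    the cat kicked my jeans
    I followed his suit
    he shot a elephant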

The generation algorithm can fail to terminate

The algorithm above is simple and looks great, but it has a real problem: for some grammars the recursion can easily fail to terminate.

Non-terminal | Productions
--|--
EXPR | TERM + EXPR
EXPR | TERM - EXPR
EXPR | TERM
TERM | FACTOR * TERM
TERM | FACTOR / TERM
TERM | FACTOR
FACTOR | ID / NUM / ( EXPR )
ID | x / y / z / w
NUM | 0 / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
The table above is a grammar for arithmetic expressions, and its rules match ordinary mathematical notation. However, the generation process can fail to terminate: a derivation such as EXPR -> TERM + EXPR -> TERM + TERM + EXPR -> ... can keep choosing the recursive production and loop forever.
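
As a quick way to see the problem, here is a small sketch (assuming the CFG class and gen_random method from the basic implementation above); runs of it frequently exceed Python's recursion limit or return extremely long expressions:

    cfg2 = CFG()
    cfg2.add_prod('EXPR', 'TERM + EXPR')
    cfg2.add_prod('EXPR', 'TERM - EXPR')
    cfg2.add_prod('EXPR', 'TERM')
    cfg2.add_prod('TERM', 'FACTOR * TERM')
    cfg2.add_prod('TERM', 'FACTOR / TERM')
    cfg2.add_prod('TERM', 'FACTOR')
    cfg2.add_prod('FACTOR', 'ID | NUM | ( EXPR )')
    cfg2.add_prod('ID', 'x | y | z | w')
    cfg2.add_prod('NUM', '0|1|2|3|4|5|6|7|8|9')

    for i in range(10):
        try:
            print(cfg2.gen_random('EXPR'))
        except RecursionError:
            # the derivation kept choosing the recursive productions
            print('<derivation did not terminate>')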

Solving the non-termination problem

To solve the non-termination problem, a probabilistic (convergent) generation algorithm can be used.

The original post illustrates the idea with a diagram: if a production such as TERM + EXPR has already been used by an ancestor on the current branch of the derivation, the probability of selecting that production again is reduced. With a reduction factor of 0.5, for example, each use halves the probability of the production being chosen the next time.
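
As a small numeric sketch (the names cfactor and count simply mirror the implementation below), the weight of a production decays geometrically with the number of times it has already been used on the current branch:

    cfactor = 0.5
    for count in range(5):
        # weight of a production that has already been used 'count' times
        print(count, cfactor ** count)
    # prints: 0 1.0, 1 0.5, 2 0.25, 3 0.125, 4 0.0625
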
This algorithm is implemented in code as follows:

    import random
    from collections import defaultdict


    def weighted_choice(weights):
        # Probabilistic selection: return index i with probability
        # proportional to weights[i]
        rnd = random.random() * sum(weights)
        for i, w in enumerate(weights):
            rnd -= w
            if rnd < 0:
                return i


    class CFG(object):
        def __init__(self):
            # defaultdict(list): a missing key maps to an empty list of productions
            self.prod = defaultdict(list)

        def add_prod(self, lhs, rhs):
            """Add production to the grammar. 'rhs' can
            be several productions separated by '|'.
            Each production is a sequence of symbols
            separated by whitespace.

            Usage:
                grammar.add_prod('NT', 'VP PP')
                grammar.add_prod('Digit', '1|2|3|4')
            """
            prods = rhs.split('|')
            for prod in prods:
                self.prod[lhs].append(tuple(prod.split()))

        def gen_random_convergent(self, symbol, cfactor=0.25,
                                  pcount=defaultdict(int)):
            """Generate a random sentence from the
            grammar, starting with the given symbol.

            Uses a convergent algorithm - productions
            that have already appeared in the
            derivation on each branch have a smaller
            chance to be selected.

            cfactor - controls how tight the
            convergence is. 0 < cfactor < 1.0

            pcount is used internally by the
            recursive calls to pass on the
            productions that have been used in the
            branch.
            """
            sentence = ''

            # The possible productions of this symbol are weighted by their
            # appearance in the branch that has led to this symbol in the
            # derivation.
            weights = []
            for prod in self.prod[symbol]:
                if prod in pcount:
                    # A production already used by an ancestor gets its weight
                    # reduced by cfactor for every previous use
                    weights.append(cfactor ** (pcount[prod]))
                else:
                    weights.append(1.0)

            # Select a production according to the computed weights
            rand_prod = self.prod[symbol][weighted_choice(weights)]

            # pcount is a single object (created on the first call to this
            # method) that is passed into the recursive calls to count how
            # many times productions have been used. Before the recursive
            # calls the count is updated, and after the sentence for this
            # branch is ready it is rolled back, to avoid modifying the
            # parent's pcount.
            pcount[rand_prod] += 1

            for sym in rand_prod:
                if sym in self.prod:
                    # Not a concrete word: recursively expand the non-terminal
                    sentence += self.gen_random_convergent(
                        sym, cfactor=cfactor, pcount=pcount)
                else:
                    # A concrete word: append it directly to the sentence
                    sentence += sym + ' '

            # Backtracking: clear the modification to pcount, since pcount is
            # passed by reference and must be restored for the caller
            pcount[rand_prod] -= 1

            return sentence


    cfg1 = CFG()
    cfg1.add_prod('S', 'NP VP')
    cfg1.add_prod('NP', 'Det N')
    cfg1.add_prod('NP', 'I | he | she | Joe')
    cfg1.add_prod('VP', 'V NP | VP')
    cfg1.add_prod('Det', 'a | the | my | his')
    cfg1.add_prod('N', 'elephant | cat | jeans | suit')
    cfg1.add_prod('V', 'kicked | followed | shot')

    for i in range(10):
        print(cfg1.gen_random_convergent('S'))

    cfg2 = CFG()
    cfg2.add_prod('EXPR', 'TERM + EXPR')
    cfg2.add_prod('EXPR', 'TERM - EXPR')
    cfg2.add_prod('EXPR', 'TERM')
    cfg2.add_prod('TERM', 'FACTOR * TERM')
    cfg2.add_prod('TERM', 'FACTOR / TERM')
    cfg2.add_prod('TERM', 'FACTOR')
    cfg2.add_prod('FACTOR', 'ID | NUM | ( EXPR )')
    cfg2.add_prod('ID', 'x | y | z | w')
    cfg2.add_prod('NUM', '0|1|2|3|4|5|6|7|8|9')

    for i in range(10):
        print(cfg2.gen_random_convergent('EXPR'))
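
Because each reuse of a recursive production on a branch shrinks its weight, the derivations now converge quickly. The output is still random; for the expression grammar a run might produce short expressions along these lines (illustrative only, not captured from an actual run):

    x + 3 * y
    ( z / 8 ) - w
    5
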
Summary

Recursion makes it easy to implement sentence generation based on a context-free grammar. However, the naive algorithm can fail to terminate for some grammars; to address this, a probability-based (convergent) generation algorithm has been proposed, which solves the non-termination problem very well.
