Objective
The algorithm comes from a blog post by an expert abroad.
The algorithm requires no knowledge of artificial intelligence; it is simply a way of generating sentences from a context-free grammar.
Context-free grammar
A context-free grammar describes only the structure of a sentence; it says nothing about contextual semantics.
Symbol | Expansion
--|--
S | NP VP
NP | Det N / Det N
NP | I / he / she / Joe
VP | V NP / VP
Det | a / the / my / his
N | elephant / cat / jeans / suit
V | kicked / followed / shot
The table above is an example of a set of context-free grammar rules, in which S stands for a sentence and / separates alternative expansions. Starting from S and recursively replacing each symbol with one of its expansions, until only words remain, produces a sentence.
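To make the recursive filling concrete, here is a minimal sketch that expands the leftmost unexpanded symbol one step at a time. The dict encoding `GRAMMAR`, the helper `leftmost_derivation`, and the hand-picked list of choices are all assumptions made for illustration; they are not part of the original post.

```python
# Hypothetical dict encoding of the grammar table: each non-terminal maps to
# a list of alternatives, and each alternative is a tuple of symbols.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("I",), ("he",), ("she",), ("Joe",)],
    "VP": [("V", "NP"), ("VP",)],
    "Det": [("a",), ("the",), ("my",), ("his",)],
    "N": [("elephant",), ("cat",), ("jeans",), ("suit",)],
    "V": [("kicked",), ("followed",), ("shot",)],
}


def leftmost_derivation(start, choices):
    """Expand the leftmost non-terminal once per entry in `choices`.

    Each entry is the index of the alternative to use for that expansion.
    Returns every intermediate form, from the start symbol onward.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost symbol that still has productions.
        idx = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[idx:idx + 1] = GRAMMAR[form[idx]][choice]
        steps.append(" ".join(form))
    return steps


# Fixed choices instead of random ones, so the trace is repeatable.
for step in leftmost_derivation("S", [0, 0, 1, 1, 0, 0, 4]):
    print(step)
```

With these choices the trace runs from `S` through `NP VP` and `the cat VP` to the finished sentence `the cat kicked Joe`. A random generator does the same thing, except that each index is drawn at random.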
Basic implementation
    import random
    from collections import defaultdict


    class CFG(object):
        def __init__(self):
            # Map each left-hand symbol to its list of productions;
            # a missing key defaults to an empty list.
            self.prod = defaultdict(list)

        def add_prod(self, lhs, rhs):
            """ Add production to the grammar. 'rhs' can
                be several productions separated by '|'.
                Each production is a sequence of symbols
                separated by whitespace.

                Usage:
                    grammar.add_prod('NT', 'VP PP')
                    grammar.add_prod('Digit', '1|2|3|4')
            """
            prods = rhs.split('|')  # split the alternatives on '|'
            for prod in prods:
                # split() with no argument splits on whitespace; each
                # alternative is stored as a tuple of symbols.
                self.prod[lhs].append(tuple(prod.split()))

        def gen_random(self, symbol):
            """ Generate a random sentence from the
                grammar, starting with the given
                symbol.
            """
            sentence = ''

            # Select one production of this symbol randomly.
            rand_prod = random.choice(self.prod[symbol])

            for sym in rand_prod:  # walk the symbols of the production
                if sym in self.prod:
                    # Non-terminal: not a concrete word but a grammatical
                    # category, so recurse to fill it in.
                    sentence += self.gen_random(sym)
                else:
                    # Terminal: a concrete word, append it directly.
                    sentence += sym + ' '

            return sentence


    cfg1 = CFG()
    cfg1.add_prod('S', 'NP VP')
    cfg1.add_prod('NP', 'Det N | Det N')
    cfg1.add_prod('NP', 'I | he | she | Joe')
    cfg1.add_prod('VP', 'V NP | VP')
    cfg1.add_prod('Det', 'a | the | my | his')
    cfg1.add_prod('N', 'elephant | cat | jeans | suit')
    cfg1.add_prod('V', 'kicked | followed | shot')

    for i in range(10):
        print(cfg1.gen_random('S'))
The code above is a basic Python implementation: it expands each symbol recursively until only words remain.
Non-termination with context-free grammars
The algorithm above is pleasingly simple, but it has a flaw: the recursion can easily fail to terminate.
Symbol | Expansion
--|--
EXPR | TERM + EXPR
EXPR | TERM - EXPR
EXPR | TERM
TERM | FACTOR * TERM
TERM | FACTOR / TERM
TERM | FACTOR
FACTOR | ID // NUM // ( EXPR )
ID | x // y // z // w
NUM | 0 // 1 // 2 // 3 // 4 // 5 // 6 // 7 // 8 // 9
The table above, in which // separates alternatives, is a grammar for arithmetic expressions. The rules match ordinary mathematical conventions, yet generating an expression from them may never terminate: EXPR can expand to TERM + EXPR, whose EXPR can again expand to TERM + EXPR, and so on without end.
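The runaway expansion can be observed with a small experiment. The sketch below expands EXPR with uniformly random choices and counts how many runs nest deeper than a fixed limit. The dict encoding, the names `expand_uniform` and `count_runaways`, the `DepthExceeded` exception, and the limit of 200 are assumptions chosen for illustration.

```python
import random

# Hypothetical dict encoding of the arithmetic grammar table.
GRAMMAR = {
    "EXPR": [("TERM", "+", "EXPR"), ("TERM", "-", "EXPR"), ("TERM",)],
    "TERM": [("FACTOR", "*", "TERM"), ("FACTOR", "/", "TERM"), ("FACTOR",)],
    "FACTOR": [("ID",), ("NUM",), ("(", "EXPR", ")")],
    "ID": [("x",), ("y",), ("z",), ("w",)],
    "NUM": [(str(d),) for d in range(10)],
}


class DepthExceeded(Exception):
    """Raised when a derivation nests deeper than the allowed limit."""


def expand_uniform(symbol, rng, depth=0, max_depth=200):
    """Expand `symbol`, giving every production the same probability."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: nothing left to expand
    if depth > max_depth:
        raise DepthExceeded(symbol)
    tokens = []
    for sym in rng.choice(GRAMMAR[symbol]):
        tokens.extend(expand_uniform(sym, rng, depth + 1, max_depth))
    return tokens


def count_runaways(trials=200, seed=0, max_depth=200):
    """Count the runs that hit the depth limit instead of finishing."""
    rng = random.Random(seed)  # seeded only to make the experiment repeatable
    runaways = 0
    for _ in range(trials):
        try:
            expand_uniform("EXPR", rng, max_depth=max_depth)
        except DepthExceeded:
            runaways += 1
    return runaways


print(count_runaways(), "of 200 runs exceeded the depth limit")
```

The depth limit stands in for the `RecursionError` that the unguarded generator would eventually raise. In this setup a large share of the runs typically hit the limit, because two of the three EXPR productions and two of the three TERM productions are recursive.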
Solving the non-termination problem
To solve the non-termination problem, productions can be chosen with weighted probabilities instead of uniformly.
The idea, as illustrated in the original author's post, is this: if an ancestor of the current node has already used a production such as TERM - EXPR, the weight of choosing that production again is reduced. With a reduction factor of 0.5, each earlier use on the branch halves the production's weight, so after one use it carries 50% of its original weight relative to the unused alternatives.
The algorithm above can be implemented as follows:
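The effect of the reduction factor can be worked through numerically. The sketch below assumes that a production used n times by ancestors gets weight factor ** n and that the weights are then normalized. The helper name `selection_probabilities` is hypothetical.

```python
def selection_probabilities(use_counts, cfactor=0.5):
    """Turn per-production use counts on the current branch into probabilities.

    A production used n times by ancestors gets weight cfactor ** n;
    the weights are then normalized so that they sum to 1.
    """
    weights = [cfactor ** n for n in use_counts]
    total = sum(weights)
    return [w / total for w in weights]


# EXPR has three productions: TERM + EXPR, TERM - EXPR, TERM.
print(selection_probabilities([0, 0, 0]))  # nothing used yet: uniform
print(selection_probabilities([0, 1, 0]))  # TERM - EXPR used once by an ancestor
print(selection_probabilities([0, 2, 0]))  # TERM - EXPR used twice
```

Note that it is the weight that halves, not the final probability: after one use, TERM - EXPR drops from 1/3 to 0.5 / 2.5 = 0.2, and after two uses to 0.25 / 2.25, about 0.11.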
    import random
    from collections import defaultdict


    # Weighted selection: return index i with probability
    # weights[i] / sum(weights).
    def weighted_choice(weights):
        rnd = random.random() * sum(weights)
        for i, w in enumerate(weights):
            rnd -= w
            if rnd < 0:
                return i
        # Only reachable through floating-point round-off.
        return len(weights) - 1


    class CFG(object):
        def __init__(self):
            # Map each left-hand symbol to its list of productions;
            # a missing key defaults to an empty list.
            self.prod = defaultdict(list)

        def add_prod(self, lhs, rhs):
            """ Add production to the grammar. 'rhs' can
                be several productions separated by '|'.
                Each production is a sequence of symbols
                separated by whitespace.

                Usage:
                    grammar.add_prod('NT', 'VP PP')
                    grammar.add_prod('Digit', '1|2|3|4')
            """
            prods = rhs.split('|')  # split the alternatives on '|'
            for prod in prods:
                # Each alternative is stored as a tuple of symbols.
                self.prod[lhs].append(tuple(prod.split()))

        def gen_random_convergent(self, symbol, cfactor=0.25, pcount=None):
            """ Generate a random sentence from the
                grammar, starting with the given symbol.

                Uses a convergent algorithm: productions
                that have already appeared in the
                derivation on each branch have a smaller
                chance to be selected.

                cfactor - controls how tight the
                convergence is. 0 < cfactor < 1.0

                pcount is used internally by the
                recursive calls to pass on the
                productions that have been used in the
                branch.
            """
            if pcount is None:
                # Create the counter on the first call rather than in a
                # mutable default argument.
                pcount = defaultdict(int)

            sentence = ''

            # The possible productions of this symbol are weighted
            # by their appearance in the branch that has led to this
            # symbol in the derivation.
            weights = []
            for prod in self.prod[symbol]:
                if prod in pcount:
                    # Already used by an ancestor: reduce the weight
                    # by the factor once per use.
                    weights.append(cfactor ** (pcount[prod]))
                else:
                    weights.append(1.0)

            # Pick a production according to the weights.
            rand_prod = self.prod[symbol][weighted_choice(weights)]

            # pcount is a single object (created in the first call to
            # this method) that is passed into the recursive calls to
            # count how many times productions have been used.
            # Before the recursive calls the count is updated, and after
            # the sentence for this call is ready it is rolled back,
            # to avoid modifying the parent's pcount.
            pcount[rand_prod] += 1

            for sym in rand_prod:
                if sym in self.prod:
                    # Non-terminal: recurse to fill in the expression.
                    sentence += self.gen_random_convergent(
                        sym, cfactor=cfactor, pcount=pcount)
                else:
                    # Terminal: append it to the sentence.
                    sentence += sym + ' '

            # Backtracking: pcount is shared by reference, so undo the
            # modification before returning.
            pcount[rand_prod] -= 1
            return sentence


    cfg1 = CFG()
    cfg1.add_prod('S', 'NP VP')
    cfg1.add_prod('NP', 'Det N | Det N')
    cfg1.add_prod('NP', 'I | he | she | Joe')
    cfg1.add_prod('VP', 'V NP | VP')
    cfg1.add_prod('Det', 'a | the | my | his')
    cfg1.add_prod('N', 'elephant | cat | jeans | suit')
    cfg1.add_prod('V', 'kicked | followed | shot')

    for i in range(10):
        print(cfg1.gen_random_convergent('S'))

    cfg2 = CFG()
    cfg2.add_prod('EXPR', 'TERM + EXPR')
    cfg2.add_prod('EXPR', 'TERM - EXPR')
    cfg2.add_prod('EXPR', 'TERM')
    cfg2.add_prod('TERM', 'FACTOR * TERM')
    cfg2.add_prod('TERM', 'FACTOR / TERM')
    cfg2.add_prod('TERM', 'FACTOR')
    cfg2.add_prod('FACTOR', 'ID | NUM | ( EXPR )')
    cfg2.add_prod('ID', 'x | y | z | w')
    cfg2.add_prod('NUM', '0|1|2|3|4|5|6|7|8|9')

    for i in range(10):
        print(cfg2.gen_random_convergent('EXPR'))
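The listing above shares one counter across all recursive calls and rolls each change back afterwards. As an alternative design, the sketch below hands every branch its own copy of the counts, so no rollback is needed. This is an illustrative variant and not the original author's code; `GRAMMAR`, `weighted_index`, `gen_convergent`, and the fixed seed are all assumptions.

```python
import random

# Hypothetical dict encoding of the arithmetic grammar.
GRAMMAR = {
    "EXPR": [("TERM", "+", "EXPR"), ("TERM", "-", "EXPR"), ("TERM",)],
    "TERM": [("FACTOR", "*", "TERM"), ("FACTOR", "/", "TERM"), ("FACTOR",)],
    "FACTOR": [("ID",), ("NUM",), ("(", "EXPR", ")")],
    "ID": [("x",), ("y",), ("z",), ("w",)],
    "NUM": [(str(d),) for d in range(10)],
}


def weighted_index(weights, rng):
    """Return index i with probability weights[i] / sum(weights)."""
    rnd = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        rnd -= w
        if rnd < 0:
            return i
    return len(weights) - 1  # only reachable through float round-off


def gen_convergent(symbol, rng, cfactor=0.25, used=None):
    """Expand `symbol`, down-weighting productions already used by ancestors."""
    used = {} if used is None else used
    prods = GRAMMAR[symbol]
    weights = [cfactor ** used.get(prod, 0) for prod in prods]
    prod = prods[weighted_index(weights, rng)]
    # Copy rather than mutate, so sibling branches never see this choice.
    branch_used = dict(used)
    branch_used[prod] = branch_used.get(prod, 0) + 1
    tokens = []
    for sym in prod:
        if sym in GRAMMAR:
            tokens.extend(gen_convergent(sym, rng, cfactor, branch_used))
        else:
            tokens.append(sym)
    return tokens


rng = random.Random(0)  # seeded only to make the sketch repeatable
for _ in range(5):
    print(" ".join(gen_convergent("EXPR", rng)))
```

Copying costs a little extra work per call, but each call sees exactly the productions used by its ancestors, which is the same view the backtracking version maintains by incrementing and then decrementing the shared counter.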
Summary
Recursion makes it easy to generate sentences from a context-free grammar. Note, however, that the naive algorithm may fail to terminate. The probability-weighted generation algorithm handles this problem well by making productions already used on a branch progressively less likely to be chosen again.
Sentence generation algorithm based on context-free grammar