AC Automaton (Aho-Corasick)

Source: Internet
Author: User

Learning the AC automaton requires knowing KMP and the trie first: KMP | Trie

I have finally gotten started on this after studying it for a day, so here are my notes, tidied up a bit.

Introduction to the algorithm:

The AC automaton is mainly used to solve the multi-pattern string matching problem.

For example, HDU 2222:

Given N (N <= 10000) pattern strings, each shorter than 50 characters, and a target string of length L (L <= 10^6), find how many of the pattern strings appear in the target string.

Pattern strings: she he say shr her. Target string: yasherhs. A total of 3 pattern strings appear in the target string, namely she, he and her.
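To make the expected answer concrete, here is a minimal brute-force sketch (this is not the AC automaton) that counts how many of the pattern strings occur in the target using std::string::find. The function name count_patterns_naive is hypothetical and does not come from the original code. It takes roughly O(N * L) time, far too slow for the real limits, and is only meant as a reference to check the automaton against.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of the original program.
// Counts how many entries of `patterns` occur at least once in `target`.
// A pattern that is listed twice is counted twice, which matches the
// cnt++ bookkeeping of the trie used later in this article.
int count_patterns_naive(const std::vector<std::string>& patterns,
                         const std::string& target) {
    int found = 0;
    for (const std::string& p : patterns) {
        if (target.find(p) != std::string::npos) {
            ++found;
        }
    }
    return found;
}
```

Calling it with the five patterns above and the target yasherhs gives 3 (she, he and her occur; say and shr do not).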

Algorithm steps:

1. Build a dictionary tree (trie) from the pattern strings

2. Construct a failure (mismatch) pointer for every node of the dictionary tree, which turns it into a trie graph

3. Query with the target string

Building the dictionary tree

There is not much to say here; see the trie article.

void CreatTrie(Trie *root, char *str)
{
    Trie *p = root;
    for (int i = 0; str[i]; i++)
    {
        int id = CharToNum(str[i]);
        if (id == -1) { puts("Error"); return; }
        if (p->next[id] == NULL) p->next[id] = new Trie();
        p = p->next[id];
        p->v++;     // one more word has this prefix
    }
    p->cnt++;       // one more word ends at this node
}
Failure (mismatch) pointers

Meaning of a failure pointer

When a character fails to match at the current node, the automaton jumps to the node that the failure pointer points to and continues matching from there, repeating this until the character matches. The failure pointer therefore has to satisfy one property: it must point to a prefix of some pattern string, and that prefix must be the longest proper suffix of the string represented by the current node. Take this sentence apart. First, the target must be a prefix of a pattern string. That is obvious, because the trie is a prefix tree and every one of its nodes is a prefix of some pattern string. Second, for the current character to have a chance of matching, some suffix of the current node's string must equal a prefix of a pattern string, which is the same idea as the next array in KMP.
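The property can be checked directly from the definition. The sketch below is a deliberately slow illustration, not how the automaton is actually built. The names is_pattern_prefix and failure_target are hypothetical. Given the string that a trie node represents, it returns the string represented by the node its failure pointer should point to, with the empty string standing for root.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if `s` is a prefix of at least one pattern string.
bool is_pattern_prefix(const std::vector<std::string>& patterns,
                       const std::string& s) {
    for (const std::string& p : patterns) {
        if (p.compare(0, s.size(), s) == 0) return true;
    }
    return false;
}

// Hypothetical helper: longest proper suffix of `node` that is also a
// prefix of some pattern. Returns "" (root) when no such suffix exists.
std::string failure_target(const std::vector<std::string>& patterns,
                           const std::string& node) {
    // start = 1 skips the whole string, so only proper suffixes are tried,
    // from the longest one down to the shortest one.
    for (std::size_t start = 1; start < node.size(); ++start) {
        std::string suffix = node.substr(start);
        if (is_pattern_prefix(patterns, suffix)) return suffix;
    }
    return "";
}
```

With the patterns she, he, say, shr and her, the node she fails to he, the node sh fails to h, and the node her fails to root, because neither er nor r starts any pattern.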

How to use BFS to find the failure pointers for all nodes

1) The failure pointer of the root is set to NULL. The failure pointers of all children directly under the root must point to root, because when that single character cannot be matched, no shorter prefix is left to match against;

2) Every node whose failure pointer has been set is inserted into the queue;

3) Each time a node now is popped from the queue, examine its child for every character i, written now->next[i]:

a) If now->next[i] is NULL, point now->next[i] at child i of the node that now's failure pointer points to, i.e. now->next[i] = now->fail->next[i];

b) If now->next[i] is not NULL, its failure pointer has to be constructed. The node now->fail is shallower than now, so BFS has already processed it, and because of operation a) it is guaranteed to have a child i, namely now->fail->next[i]. We point the failure pointer of now->next[i] at it, i.e. now->next[i]->fail = now->fail->next[i];

4) Repeat from step 2) until the queue is empty;

void Build_AC(Trie *root)
{
    Trie *p, *tmp;
    int head = 0, tail = 0;
    q[tail++] = root;
    while (head != tail)
    {
        p = q[head++];
        for (int i = 0; i < MAXN; i++)
        {
            // where the failure pointer of p leads on character i
            tmp = (p == root) ? root : p->fail->next[i];
            if (p->next[i] == NULL)
            {
                p->next[i] = tmp;           // case a)
            }
            else
            {
                p->next[i]->fail = tmp;     // case b)
                q[tail++] = p->next[i];
            }
        }
    }
}
Target string matching

Matching also requires scanning the target string. Since the trie graph has been built, every node can move to a next state on any input character, so we only need to walk through the characters of the target string one by one. After each step we check whether the current node p is the end node of a pattern string, and also check the node that p's failure pointer points to, then the node that one's failure pointer points to, and so on. Accumulating all the cnt values gives the number of pattern strings that appear.

int Query_AC(Trie *root, char *str)
{
    int cnt = 0;
    Trie *p = root, *tmp = NULL;
    for (int i = 0; str[i]; i++)
    {
        int id = CharToNum(str[i]);
        if (id == -1) { p = root; continue; }
        p = p->next[id];
        tmp = p;
        // follow failure pointers so that no word ending here is missed
        while (tmp != root && tmp->cnt != -1)
        {
            if (tmp->cnt) { cnt += tmp->cnt; tmp->cnt = -1; }
            tmp = tmp->fail;
        }
    }
    return cnt;
}

HDU 2222

1. Building the dictionary tree

2. Constructing the failure pointers

The failure pointers are constructed with BFS, in a way similar to the KMP algorithm.

First, root is put into the queue. The 1st iteration of the loop handles the characters attached directly to root, that is, the first characters h and s of the words. A mismatch on the first character means matching must start over, so the failure pointers of these first-character nodes point to root (root is only the entrance to the trie and carries no character). These correspond to the two dashed lines (1) and (2) in Figure 2;

In the 2nd iteration, node h is popped from the queue. p then points to the node that h's fail pointer points to, that is, root. Root has no child e, so the fail pointer of node e is set to root, indicating that there is no matching sequence. This corresponds to (3) in Figure 2. Node e then enters the queue;

In the 3rd iteration, the first node handled, a, is treated the same way as node e in the previous step: the fail pointer of a points to root, corresponding to (4) in Figure 2, and a enters the queue;

The 4th step handles node h (the one on the left in the figure), and the operation is slightly different. Here p->next[i] != NULL (root has a child h, the one on the right in the figure), so the failure pointer of the left h node points to the right h node, the child of root. This corresponds to (5) in Figure 2, and h then enters the queue. The remaining nodes are handled in the same way. When the loop ends, all the failure pointers are as shown in Figure 2.

3. Scanning

After the trie and the failure pointers are constructed, we can scan the main string. This process is similar to the KMP algorithm but differs in one respect: the AC automaton matches multiple pattern strings at once, and to avoid missing any word a temp pointer is introduced.

The matching process falls into two cases: (1) The current character matches, meaning there is an edge from the current node along the tree to the target character. In this case just follow that edge to the next node, and move the target string pointer to the next character to continue matching. (2) The current character does not match. In this case go to the node that the current node's failure pointer points to and continue matching from there; this fallback ends when the pointer reaches root. Repeat whichever of these two cases applies until the target string has been read to the end.
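The two cases can be sketched with explicit failure jumps. Note that this is the classic form, which leaves the trie edges untouched, rather than the trie-graph form in which missing edges are filled in ahead of time. Everything here is a hypothetical illustration under the assumption that patterns and text contain only the letters a to z: the names ClassicAC, insert, build and query are not from the original program, and nodes are stored by index in a std::vector instead of being allocated with new.

```cpp
#include <queue>
#include <string>
#include <vector>

// Sketch only. Assumption: patterns and text contain only 'a'..'z'.
struct ClassicAC {
    struct Node {
        int next[26];  // real trie edges only, -1 where no child exists
        int fail = 0;  // index of the failure node, 0 is root
        int cnt = 0;   // number of patterns ending at this node
        Node() { for (int& e : next) e = -1; }
    };
    std::vector<Node> nodes;

    ClassicAC() : nodes(1) {}  // node 0 is root

    void insert(const std::string& word) {
        int p = 0;
        for (char ch : word) {
            int id = ch - 'a';
            if (nodes[p].next[id] == -1) {
                int fresh = static_cast<int>(nodes.size());
                nodes.emplace_back();
                nodes[p].next[id] = fresh;
            }
            p = nodes[p].next[id];
        }
        ++nodes[p].cnt;
    }

    // BFS over the trie; the children of root keep fail = 0.
    void build() {
        std::queue<int> bfs;
        for (int c = 0; c < 26; ++c)
            if (nodes[0].next[c] != -1) bfs.push(nodes[0].next[c]);
        while (!bfs.empty()) {
            int u = bfs.front();
            bfs.pop();
            for (int c = 0; c < 26; ++c) {
                int v = nodes[u].next[c];
                if (v == -1) continue;
                int f = nodes[u].fail;
                while (f != 0 && nodes[f].next[c] == -1) f = nodes[f].fail;
                if (nodes[f].next[c] != -1) f = nodes[f].next[c];
                nodes[v].fail = f;
                bfs.push(v);
            }
        }
    }

    // Counts the inserted patterns that occur in `text`. Each end node is
    // counted once and then marked with -1, as in the article's code.
    int query(const std::string& text) {
        int p = 0, total = 0;
        for (char ch : text) {
            int id = ch - 'a';
            // Case (2): no edge for this character, fall back via fail.
            while (p != 0 && nodes[p].next[id] == -1) p = nodes[p].fail;
            // Case (1): an edge exists, follow it.
            if (nodes[p].next[id] != -1) p = nodes[p].next[id];
            // temp pointer: visit p and every node on its failure chain.
            for (int tmp = p; tmp != 0 && nodes[tmp].cnt != -1;
                 tmp = nodes[tmp].fail) {
                if (nodes[tmp].cnt) {
                    total += nodes[tmp].cnt;
                    nodes[tmp].cnt = -1;
                }
            }
        }
        return total;
    }
};
```

After inserting she, he, say, shr and her and calling build(), query("yasherhs") returns 3. Because counted end nodes are marked with -1, the query is destructive: running the same query a second time on the same object returns 0.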

To make this concrete, walk through the matching process in detail, with the target string yasherhs. For i=0,1 there is no corresponding path in the trie, so nothing is done. For i=2,3,4 the pointer p walks down to the lower-left node e. Because the count of node e is 1, cnt is increased by 1 and the count of node e is set to -1, which marks the word as already seen and prevents it from being counted twice. temp then moves to the node that e's failure pointer points to and continues the search, and so on, until temp points to root and the while loop exits. During this process cnt increases by 2, meaning the 2 words she and he were found. When i=5, p falls back to the node its failure pointer points to, which is the e node on the right, and then moves on to node r. Node r has a count of 1, so cnt is increased by 1, and the loop again runs until temp points to root. Finally, for i=6,7 no match is found and the matching process ends.

Note: Building the AC automaton changes the original next[] pointers of the dictionary tree, so the query and memory-release routines written for a plain dictionary tree can no longer be used on it.
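One possible way around the memory-release problem in the note above is to keep all nodes in a std::vector and refer to them by index instead of by pointer, so the whole automaton is released when the vector is destroyed. The sketch below mirrors the trie-graph construction under that assumption. The names IndexedAC, add, build and count are hypothetical, input is assumed to contain only the letters a to z, and all calls to add must come before build.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch only. Assumption: patterns and text contain only 'a'..'z'.
struct IndexedAC {
    struct Node {
        int next[26] = {};  // index 0 is root, so 0 also means "no child yet"
        int fail = 0;       // index of the failure node
        int cnt = 0;        // number of patterns ending at this node
    };
    std::vector<Node> nodes;  // owns every node; freed automatically

    IndexedAC() : nodes(1) {}

    void add(const std::string& word) {
        int p = 0;
        for (char ch : word) {
            int id = ch - 'a';
            if (nodes[p].next[id] == 0) {
                int fresh = static_cast<int>(nodes.size());
                nodes.emplace_back();
                nodes[p].next[id] = fresh;
            }
            p = nodes[p].next[id];
        }
        ++nodes[p].cnt;
    }

    // Same idea as Build_AC: missing edges borrow the edge of the failure
    // node, existing children get their failure pointer set.
    void build() {
        std::vector<int> order;  // BFS queue kept as a growing list
        for (int c = 0; c < 26; ++c)
            if (nodes[0].next[c] != 0) order.push_back(nodes[0].next[c]);
        for (std::size_t head = 0; head < order.size(); ++head) {
            int u = order[head];
            for (int c = 0; c < 26; ++c) {
                int v = nodes[u].next[c];
                int viaFail = nodes[nodes[u].fail].next[c];
                if (v == 0) {
                    nodes[u].next[c] = viaFail;
                } else {
                    nodes[v].fail = viaFail;
                    order.push_back(v);
                }
            }
        }
    }

    // Same counting rule as Query_AC: each end node is counted once.
    int count(const std::string& text) {
        int p = 0, total = 0;
        for (char ch : text) {
            p = nodes[p].next[ch - 'a'];
            for (int t = p; t != 0 && nodes[t].cnt != -1; t = nodes[t].fail) {
                if (nodes[t].cnt) {
                    total += nodes[t].cnt;
                    nodes[t].cnt = -1;
                }
            }
        }
        return total;
    }
};
```

Using 0 both as the root index and as the "no child" marker works because root can never be the child of another node; a missing edge at root then already points back to root without any special case. An IndexedAC created inside the loop over test cases is released at the end of each iteration without a Delete routine.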

#include <cstdio>
#include <cstring>
#include <string>
#include <iostream>
#include <algorithm>
using namespace std;

#define MAXN 26
#define N 500010   // queue size

struct Trie
{
    Trie *next[MAXN];
    Trie *fail;
    int cnt;   // end tag: number of words ending at this node
    int v;     // prefix tag: number of words that have this node's string as a prefix
    Trie()
    {
        fail = NULL;
        cnt = 0;
        v = 0;
        for (int i = 0; i < MAXN; i++) next[i] = NULL;
    }
} *q[N];

int CharToNum(char s)
{
    if (s >= 'a' && s <= 'z') return s - 'a';
    return -1;
}

void CreatTrie(Trie *root, char *str)
{
    Trie *p = root;
    for (int i = 0; str[i]; i++)
    {
        int id = CharToNum(str[i]);
        if (id == -1) { puts("Error"); return; }
        if (p->next[id] == NULL) p->next[id] = new Trie();
        p = p->next[id];
        p->v++;
    }
    p->cnt++;
}

void Build_AC(Trie *root)
{
    Trie *p, *tmp;
    int head = 0, tail = 0;
    q[tail++] = root;
    while (head != tail)
    {
        p = q[head++];
        for (int i = 0; i < MAXN; i++)
        {
            tmp = (p == root) ? root : p->fail->next[i];
            if (p->next[i] == NULL)
            {
                p->next[i] = tmp;
            }
            else
            {
                p->next[i]->fail = tmp;
                q[tail++] = p->next[i];
            }
        }
    }
}

int Query_AC(Trie *root, char *str)
{
    int cnt = 0;
    Trie *p = root, *tmp = NULL;
    for (int i = 0; str[i]; i++)
    {
        int id = CharToNum(str[i]);
        if (id == -1) { p = root; continue; }
        p = p->next[id];
        tmp = p;
        while (tmp != root && tmp->cnt != -1)
        {
            if (tmp->cnt) { cnt += tmp->cnt; tmp->cnt = -1; }
            tmp = tmp->fail;
        }
    }
    return cnt;
}

char qu[1000010];

int main()
{
    int t, n;
    scanf("%d", &t);
    char str[55];   // assumed size: pattern strings are shorter than 50 characters
    while (t--)
    {
        Trie *root = new Trie();
        scanf("%d", &n);
        while (n--)
        {
            scanf("%s", str);
            CreatTrie(root, str);
        }
        Build_AC(root);
        scanf("%s", qu);
        printf("%d\n", Query_AC(root, qu));
        // Delete(root);  // the AC automaton cannot be released this way
    }
    return 0;
}

Reference:

http://www.cppblog.com/menjitianya/archive/2014/07/10/207604.html

http://blog.csdn.net/niushuai666/article/details/7002823

http://blog.csdn.net/niushuai666/article/details/7002736
