"AC Automaton" "String" "Dictionary Tree" ac Automata learning Notes

Source: Internet
Author: User

Blog: www.wjyyy.top

The AC automaton is a convenient algorithm for multi-pattern string matching. It is built on a dictionary tree (trie) and borrows the mismatch idea from KMP.

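Unlike KMP, the AC automaton can match multiple pattern strings at the same time without the complexity growing too high. If the text is instead matched against each pattern separately with KMP, the complexity is \(O(k(n+m))\), where \(k\) is the number of pattern strings and \(n+m\) is the cost of one KMP run over the text and one pattern.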

A dictionary tree is the most convenient way to match the beginning of a string exactly against a set of other strings. However, if the current node has no child for the next character during matching, a mismatch occurs. In KMP, a mismatch jumps to the preprocessed nxt value of the current position, and matching continues from there.
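To make the notion of a mismatch concrete, here is a minimal sketch of a dictionary tree, assuming lowercase letters only. The names `Trie`, `insert` and `matched_prefix_length` are illustrative and do not come from the original post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical array-based dictionary tree over 'a'..'z'; node 0 is the empty root.
struct Trie {
    std::vector<std::vector<int>> ch;  // ch[u][k] = index of child k of node u, or -1
    std::vector<bool> is_end;          // is_end[u] is true if a word ends at node u

    Trie() { new_node(); }

    int new_node() {
        ch.push_back(std::vector<int>(26, -1));
        is_end.push_back(false);
        return static_cast<int>(ch.size()) - 1;
    }

    void insert(const std::string& word) {
        int u = 0;
        for (char c : word) {
            assert(c >= 'a' && c <= 'z');  // assumption: lowercase input only
            int k = c - 'a';
            if (ch[u][k] == -1) {
                int v = new_node();  // create first: new_node() may reallocate ch
                ch[u][k] = v;
            }
            u = ch[u][k];
        }
        is_end[u] = true;
    }

    // Number of leading characters of text that can be followed from the root
    // before the current node has no child for the next character (a mismatch).
    int matched_prefix_length(const std::string& text) const {
        int u = 0, len = 0;
        for (char c : text) {
            if (c < 'a' || c > 'z' || ch[u][c - 'a'] == -1) break;  // mismatch
            u = ch[u][c - 'a'];
            ++len;
        }
        return len;
    }
};
```

With the words "she" and "he" inserted, walking the text "shers" stops after three characters, because the node for "she" has no 'r' child. A plain dictionary tree has nowhere to go at that point, which is the gap the mismatch pointer fills.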

So in an AC automaton built on a dictionary tree, how do we find where each node should go on a mismatch? As in KMP, the mismatch target is uniquely determined. In the dictionary tree, each path from the root corresponds to exactly one string, so the target is unique here as well.

In KMP, nxt for the current position is derived from nxt of the previous position through the variable j; in the AC automaton we likewise derive a node's mismatch pointer from its parent. For the current node, reached from its parent by the character 'c', we start at the parent's mismatch pointer and keep following mismatch pointers until we find the first node that has a 'c' child; the current node's mismatch pointer is then set to that 'c' child. Once mismatch pointers are added, the tree becomes a graph. But what if this backtracking reaches the root without finding such a node?
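The backtracking rule just described can be sketched as follows. This is a minimal illustration under the assumption of lowercase input; `NaiveAC`, `insert` and `build_fail` are hypothetical names. When the climb reaches the root without finding a 'c' child, the sketch falls back to the root.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical sketch: fail pointers built directly from the definition.
struct NaiveAC {
    std::vector<std::array<int, 26>> ch;  // -1 means "no such child"
    std::vector<int> fail;                // fail[u] = mismatch pointer of node u

    NaiveAC() { new_node(); }             // node 0 is the root

    int new_node() {
        std::array<int, 26> empty;
        empty.fill(-1);
        ch.push_back(empty);
        fail.push_back(0);                // default: point at the root
        return static_cast<int>(ch.size()) - 1;
    }

    // Inserts a lowercase word and returns the node where it ends.
    int insert(const std::string& word) {
        int u = 0;
        for (char c : word) {
            assert(c >= 'a' && c <= 'z');
            int k = c - 'a';
            if (ch[u][k] == -1) {
                int v = new_node();
                ch[u][k] = v;
            }
            u = ch[u][k];
        }
        return u;
    }

    void build_fail() {
        std::queue<int> q;
        for (int k = 0; k < 26; ++k)      // children of the root fail to the root
            if (ch[0][k] != -1) q.push(ch[0][k]);
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (int k = 0; k < 26; ++k) {
                int v = ch[u][k];
                if (v == -1) continue;
                int f = fail[u];          // start from the parent's mismatch pointer
                while (f != 0 && ch[f][k] == -1) f = fail[f];  // climb the chain
                fail[v] = (ch[f][k] != -1) ? ch[f][k] : 0;     // root if none found
                q.push(v);
            }
        }
    }
};
```

For the patterns "she", "he" and "hers", the node for "she" fails to the node for "he", while the node for "he" fails to the root because no pattern starts with "e".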

Normally the root of the dictionary tree carries no character: it is an empty node and also the point where matching restarts. If the very first character mismatches, that is, the root has no such child, we simply stay at the root and carry on, so the root's mismatch pointer points to itself (which also prevents out-of-bounds access). Similarly, the mismatch pointers of the root's children all point to the root, because after such a mismatch there are only two cases. First, the root has no child for the character, and we are back in the general root case. Second, the root does have this child, so we follow the current node's mismatch pointer to the root and then move to that child.

So our trie becomes a trie graph.

The two figures above make the AC automaton easier to understand; the red edges are the fail edges. An interesting observation is that the fail pointers form a directed tree. Each chain taken on its own has no branches, and the letters along a chain are always the same, so this structure may come up in later problems or optimizations (just like KMP's nxt).

In practice, we do not build the mismatch pointers this way. To reduce the constant factor (perhaps that is the reason), when the current node has no 'c' child, we point its 'c' child pointer at the 'c' child of the node its mismatch pointer points to. Because the node a mismatch pointer points to is always shallower than the node itself, we process the nodes by BFS, so shallower nodes are always visited before deeper ones. Therefore the 'c' child of the mismatch target always refers to some node: even if it is not a real child, it was filled in through mismatch pointers. In the worst case, when no such child exists anywhere along the mismatch chain, the current node's 'c' child simply points to the root.
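A minimal sketch of this BFS construction follows, again assuming lowercase input. `TrieGraph`, `build` and `run` are illustrative names, and the sketch uses index 0 both for the root and for "missing child", which works because the root is never anyone's child.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical sketch of the trie graph: after build(), every node has a
// transition for every letter, so matching never climbs a fail chain.
struct TrieGraph {
    std::vector<std::array<int, 26>> ch;  // before build(): 0 means "missing"
    std::vector<int> fail;

    TrieGraph() { new_node(); }           // node 0 is the root

    int new_node() {
        std::array<int, 26> empty;
        empty.fill(0);
        ch.push_back(empty);
        fail.push_back(0);
        return static_cast<int>(ch.size()) - 1;
    }

    void insert(const std::string& word) {
        int u = 0;
        for (char c : word) {
            assert(c >= 'a' && c <= 'z');
            int k = c - 'a';
            if (ch[u][k] == 0) {
                int v = new_node();
                ch[u][k] = v;
            }
            u = ch[u][k];
        }
    }

    void build() {
        std::queue<int> q;
        for (int k = 0; k < 26; ++k)      // missing root children stay 0 (the root)
            if (ch[0][k] != 0) q.push(ch[0][k]);
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (int k = 0; k < 26; ++k) {
                int v = ch[u][k];
                if (v != 0) {
                    fail[v] = ch[fail[u]][k];   // fail[u] is shallower, already complete
                    q.push(v);
                } else {
                    ch[u][k] = ch[fail[u]][k];  // borrow the transition of the fail node
                }
            }
        }
    }

    // State reached after feeding text from the root.
    int run(const std::string& text) const {
        int state = 0;
        for (char c : text) {
            assert(c >= 'a' && c <= 'z');
            state = ch[state][c - 'a'];
        }
        return state;
    }
};
```

With the patterns "she", "he" and "hers", feeding "sher" ends in the same state as feeding "her": the missing 'r' transition of "she" was copied from its fail node "he". The trade-off is that the child arrays no longer describe the tree alone, so anything that needs the original tree edges has to be recorded before `build()` runs.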

As in a dictionary tree, a successful match in the AC automaton means reaching the end of a word, so we mark the end of each pattern string when building the dictionary tree. But what if one pattern string contains another? There are two approaches. The first is to jump along fail pointers by brute force at every visited node until the root is reached; the contribution to the answer is the number of marks on that path. The second is to build the fail tree, since every jump follows the fail tree: we only need to precompute, for each node, the number of marks on its path to the root in the fail tree, and the answer can then be read directly at the current node. The second approach appears to have better complexity, but it has a limitation: when the number of occurrences of each individual pattern string must be counted exactly, this precomputed fail-tree count is not applicable.
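The second approach can be sketched as below, assuming lowercase input. The function name `count_total_matches` and the arrays `marks` and `path_marks` are hypothetical. It returns only the total number of matches over all patterns, which illustrates the limitation just mentioned: the per-pattern counts are not recoverable from these path sums.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical sketch: total number of pattern occurrences in text, using the
// number of end marks on each node's fail path to the root.
long long count_total_matches(const std::vector<std::string>& patterns,
                              const std::string& text) {
    std::vector<std::array<int, 26>> ch(1);
    ch[0].fill(0);                          // 0 = root, also "missing child"
    std::vector<int> fail(1, 0);
    std::vector<long long> marks(1, 0);     // patterns ending exactly at each node

    for (const std::string& p : patterns) {
        int u = 0;
        for (char c : p) {
            assert(c >= 'a' && c <= 'z');
            int k = c - 'a';
            if (ch[u][k] == 0) {
                std::array<int, 26> empty;
                empty.fill(0);
                ch.push_back(empty);
                fail.push_back(0);
                marks.push_back(0);
                ch[u][k] = static_cast<int>(ch.size()) - 1;
            }
            u = ch[u][k];
        }
        ++marks[u];                         // duplicates are counted separately
    }

    // path_marks[u] = sum of marks over u, fail[u], fail[fail[u]], ..., root.
    std::vector<long long> path_marks(marks);
    std::queue<int> q;
    for (int k = 0; k < 26; ++k)
        if (ch[0][k] != 0) q.push(ch[0][k]);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        path_marks[u] = marks[u] + path_marks[fail[u]];  // fail[u] was processed earlier
        for (int k = 0; k < 26; ++k) {
            int v = ch[u][k];
            if (v != 0) {
                fail[v] = ch[fail[u]][k];
                q.push(v);
            } else {
                ch[u][k] = ch[fail[u]][k];
            }
        }
    }

    long long total = 0;
    int state = 0;
    for (char c : text) {
        assert(c >= 'a' && c <= 'z');
        state = ch[state][c - 'a'];
        total += path_marks[state];         // all patterns ending at this position
    }
    return total;
}
```

For the patterns "she", "he" and "hers" and the text "ushers" this yields 3, and for the patterns "a" and "aa" with the text "aaa" it yields 5.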

Code for Luogu P3796:

Note that this problem may contain repeated pattern strings, so take care to count them correctly.

#include <cstdio>
#include <cstring>
#include <vector>
using std::vector;

vector<int> same[155]; // same[x]: numbers of all pattern strings identical to pattern x

struct node
{
    int end, num; // end: whether a pattern ends here; num: number of the first pattern ending here
    node *ch[26];
    node *fail;
    node()
    {
        memset(ch, 0, sizeof(ch));
        fail = NULL;
        end = 0;
        num = 0;
    }
    void build(char *c, int i) // build the dictionary tree
    {
        if (*c == '\0')
        {
            end = 1;
            if (!num)
                num = i;
            // if a word already ends here, this one is a repeat: record its number under the first one
            same[num].push_back(i);
            return;
        }
        if (!ch[*c - 'a'])
            ch[*c - 'a'] = new node();
        ch[*c - 'a']->build(c + 1, i);
    }
} *root = new node();

char t[200][200];
node *q[1000011]; // queue used for the BFS
int l = 0, r = 0;

void fail() // build the fail pointers
{
    l = r = 0;
    root->fail = root; // seems to work without this line, kept to be safe against out-of-bounds access
    for (int i = 0; i < 26; ++i) // mismatch pointers of the root's children all point to the root
        if (!root->ch[i])
            root->ch[i] = root; // no such child: point it at the root itself
        else
        {
            root->ch[i]->fail = root; // set the mismatch pointer
            q[++r] = root->ch[i];
        }
    while (l < r)
    {
        node *p = q[++l];
        for (int i = 0; i < 26; ++i)
            if (p->ch[i])
            {
                // child exists: its mismatch pointer is the same child of p's mismatch target,
                // which has already been fully processed
                p->ch[i]->fail = p->fail->ch[i];
                q[++r] = p->ch[i];
            }
            else
                p->ch[i] = p->fail->ch[i];
    }
}

char s[1000010];
int cnt[155];

void match()
{
    scanf("%s", s);
    node *now = root;
    for (int i = 0; s[i] != '\0'; ++i) // start matching
    {
        now = now->ch[s[i] - 'a'];
        for (node *p = now; p != root; p = p->fail) // brute-force jumps along fail
            cnt[p->num] += p->end;
    }
}

int main()
{
    int n;
    scanf("%d", &n);
    while (n)
    {
        root = new node();
        memset(cnt, 0, sizeof(cnt));
        for (int i = 0; i < 155; ++i) // clear repeat lists left over from the previous test case
            same[i].clear();
        for (int i = 1; i <= n; ++i)
        {
            scanf("%s", t[i]);
            root->build(t[i], i);
        }
        fail();
        match();
        int mx = 0;
        for (int i = 1; i <= n; ++i)
        {
            for (vector<int>::iterator it = same[i].begin(); it != same[i].end(); ++it)
                cnt[*it] = cnt[i]; // identical pattern strings share the same count
            if (mx < cnt[i])
            {
                cnt[0] = 1;
                mx = cnt[i];
            }
            else if (mx == cnt[i])
                ++cnt[0];
        }
        printf("%d\n", mx);
        for (int i = 1; i <= n; ++i)
            if (cnt[i] == mx)
                printf("%s\n", t[i]);
        scanf("%d", &n);
    }
    return 0;
}
