Reprinted from: http://hi.baidu.com/nialv7/item/ce1ce015d44a6ba7feded52d
The AC automaton is used for multi-pattern string matching: you are given a lot of strings and then an article, and you have to find out whether these strings appear in the article, and where. Perhaps you have wondered what the name "AC automaton" means; I once had the same thought. You already know KMP, which is called KMP because the algorithm was invented by Knuth, Morris, and Pratt, taking the first letter of each of the three names. The AC automaton is the same: it stands for Aho-Corasick. So don't daydream that the AC automaton is an "AC(cept) automatic machine", although it really can help you get an AC or two.
... But I digress ...
To learn the AC automaton, you must first know what a trie (a letter tree) is. If you already do, please skip this section.
A trie is a tree made up of letters.
Look at the picture first:
This is a trie. The nodes marked in green indicate the end of a word (why mark them? Read on and you'll find out). A "word" is formed by the letters on the tree path from the root to a green node.
/* Maybe after reading this far you already know how to build a trie; if so, please skip the following paragraphs. */
So how do we build a trie? Let's start with an empty tree and build it one step at a time.
At first, we have only a root:
Now insert the first word, she. This is equivalent to inserting a chain into the tree. The process is simple. After inserting, we add a green marker to the last letter 'e'. The result:
One more word, shr (what word is that? ... shift right, of course). Since root already has a child 's', we do not insert it again; similarly, since 's' already has a child 'h', we skip that too, insert 'r' directly under 'h', and mark the 'r' green. The result:
In the same way, we continue to insert the remaining words into the tree.
Final result:
This is the case:
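The insertion steps above can be sketched in a few lines of Python. This is only a minimal sketch: the names TrieNode, insert and contains are my own, and the word list contains just the words named in this article (she, shr, say, her), which may not be everything in the original figure.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # the "green" marker: a word ends here


def insert(root, word):
    """Insert word as a chain, reusing letters that already exist."""
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.is_word = True  # mark the last letter green


def contains(root, word):
    """Return True only if the path for word exists and ends on a green node."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


# Assumed word list: only the words mentioned in the text.
root = TrieNode()
for w in ["she", "shr", "say", "her"]:
    insert(root, w)
```

Note that "sh" is a path in this trie but not a word, because its last node is not green; that is exactly what the marker is for.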
Well, now we have a trie, but that's not enough. We are going to introduce something very powerful on the trie: the failure pointer, or shift array, or next function ... whatever you call it, it is the essence of KMP, and that's why I asked you to read about KMP first.
In KMP we use two pointers i and j to indicate that A[i-j+1..i] and B[1..j] are exactly equal. That is, i keeps increasing and j changes along with it, such that the length-j substring of A ending at A[i] exactly matches the first j characters of B. When A[i+1] <> B[j+1], KMP's strategy is to adjust j (decrease its value) so that A[i-j+1..i] still matches B[1..j] and the new B[j+1] matches A[i+1] exactly (so that i and j can continue to increase).
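As a reminder, here is a rough Python sketch of that idea. The function names build_next and kmp_search are mine, and the code uses 0-based indices, so j here is the number of characters of B matched so far rather than a 1-based position.

```python
def build_next(b):
    """nxt[j] = length of the longest proper prefix of b[:j] that is also its suffix."""
    nxt = [0] * (len(b) + 1)
    j = 0
    for i in range(1, len(b)):
        while j > 0 and b[i] != b[j]:
            j = nxt[j]          # decrease j until b[j] can match b[i]
        if b[i] == b[j]:
            j += 1
        nxt[i + 1] = j
    return nxt


def kmp_search(a, b):
    """Return the 0-based start positions of every occurrence of b in a."""
    if not b:
        return []
    nxt = build_next(b)
    hits = []
    j = 0                        # characters of b matched so far
    for i, ch in enumerate(a):
        while j > 0 and ch != b[j]:
            j = nxt[j]          # mismatch: fall back, i never moves backwards
        if ch == b[j]:
            j += 1
        if j == len(b):          # full match ending at a[i]
            hits.append(i - j + 1)
            j = nxt[j]
    return hits


print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```

The nxt array plays the same role here that the failure pointer will play on the trie.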
The failure pointer on the trie is similar to this.
Suppose there is a node k whose failure pointer points to j. Then k and j satisfy this property: if the distance from root to j is n, then the length-n word formed by the last n letters on the path down to k is the same as the word from root to j.
In the figure, the failure pointer of the 'e' in she should point to the 'e' in her. Because:
The red-framed parts of the figure are exactly the same.
So how do we build this thing? In fact, a simple BFS takes care of all of it.
For each node, we do this: let the letter on this node be c. Walk along its parent's failure pointers until we reach a node that has a child with the letter c, then point the current node's failure pointer to that child, whose letter is also c. If we reach root and still have not found one, point the failure pointer to root.
At first, we add root to the queue (root's failure pointer obviously points to itself). After that, each time we finish processing a node, we add all of its children to the queue, until the queue is empty.
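Here is one way the BFS above might look in Python. Treat it as a sketch under my own assumptions: the names Node, build_trie, build_fail and path are made up for illustration, and the word list is only the words named in this article. One detail the description glosses over: the children of root are given root as their failure pointer directly, because otherwise the walk would find the node itself among root's children.

```python
from collections import deque


class Node:
    def __init__(self, letter="", parent=None):
        self.letter = letter
        self.parent = parent
        self.children = {}    # letter -> Node
        self.is_word = False  # green marker
        self.fail = None      # failure pointer


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Node(ch, node)
            node = node.children[ch]
        node.is_word = True
    return root


def build_fail(root):
    root.fail = root                      # root fails to itself
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node is not root:
            if node.parent is root:
                node.fail = root          # children of root: special case
            else:
                f = node.parent.fail      # start from the parent's failure pointer
                while f is not root and node.letter not in f.children:
                    f = f.fail            # keep walking up the failure chain
                node.fail = f.children.get(node.letter, root)
        queue.extend(node.children.values())


def path(node):
    """Spell the word from root down to node."""
    letters = []
    while node.parent is not None:
        letters.append(node.letter)
        node = node.parent
    return "".join(reversed(letters))


root = build_trie(["she", "shr", "say", "her"])  # assumed word list
build_fail(root)
e_of_she = root.children["s"].children["h"].children["e"]
print(path(e_of_she.fail))  # he
```

The printed path "he" is the 'e' on the her branch, which agrees with the example given earlier.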
As for why the pointers are determined this way, we will see when we come back to it later.
OK, now we have a trie with failure pointers, and this article has already passed a thousand words. Next, let's talk about how the AC automaton runs.
The AC automaton does multi-string matching, that is, there are a lot of strings for you to find. We first insert these strings into a trie, then compute the failure pointers, and then we can start running the AC automaton.
At first, we have a pointer t1 on the trie pointing to root, and a pointer t2 on the string (i.e. the "article") pointing to its first character.
The next steps are similar to KMP: if the letter t2 points to is a child of the node t1 points to, then t2 advances by one and t1 moves to that child; otherwise t1 follows the current node's failure pointer upward until the letter at t2 is a child of t1, or t1 points to root (if even root has no such child, t2 simply advances by one). If t1 passes through a green node, then the word ending at that node has appeared. Also, if the node where t1 currently sits can follow failure pointers to a green node, then the word ending at that green node has appeared as well.
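Putting everything together, the matching loop might be sketched like this in Python. Again this is only an illustration: build_automaton and search are names I made up, nodes are stored as integer ids with 0 as root, and I added the word "he" to the list (it is not mentioned in the text) so that the second rule, reaching a green node through failure pointers, actually fires.

```python
from collections import deque


def build_automaton(words):
    # Node 0 is root. children[i]: letter -> node id; word[i]: word ending at i, or None.
    children, fail, word = [{}], [0], [None]
    for w in words:
        node = 0
        for ch in w:
            if ch not in children[node]:
                children.append({})
                fail.append(0)
                word.append(None)
                children[node][ch] = len(children) - 1
            node = children[node][ch]
        word[node] = w                       # green marker
    queue = deque(children[0].values())      # children of root keep fail = root
    while queue:
        node = queue.popleft()
        for ch, child in children[node].items():
            f = fail[node]
            while f != 0 and ch not in children[f]:
                f = fail[f]
            fail[child] = children[f].get(ch, 0)
            queue.append(child)
    return children, fail, word


def search(text, automaton):
    """Return (start position, word) for every occurrence, in order of end position."""
    children, fail, word = automaton
    hits = []
    t1 = 0                                   # pointer on the trie
    for t2, ch in enumerate(text):           # pointer on the article
        while t1 != 0 and ch not in children[t1]:
            t1 = fail[t1]                    # mismatch: go up the failure pointers
        t1 = children[t1].get(ch, 0)         # take the child, or stay at root
        node = t1
        while node != 0:                     # t1 itself, then everything reachable by fail
            if word[node] is not None:
                hits.append((t2 - len(word[node]) + 1, word[node]))
            node = fail[node]
    return hits


# Assumed word list: the words from the text plus "he".
ac = build_automaton(["say", "she", "shr", "he", "her"])
print(search("ushers", ac))  # [(1, 'she'), (2, 'he'), (2, 'her')]
```

When t1 reaches the 'e' of she, the word he is reported too, because that node's failure pointer leads to a green node. Walking the whole failure chain at every step is the simplest way to do this, not the fastest.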
Now let's come back to the failure pointer. The process of finding failure pointers is actually a self-matching process.
Suppose we have already determined the failure pointers of all nodes with a depth of less than 2 (root has depth 1), and now we want to determine the one for 'e'. That is equivalent to having a trie:
And the article is 'she', and we want to find where 'e' appears. We then match 'say', and the failure pointer of 'y' is determined.
Well, think about it. The BFS mentioned above is really a self-matching process, which is similar to KMP.
OK, that's all I'll write. If anything is unclear, you can leave a comment or email me ([email protected]), or follow me on Twitter (@sdraven) ....
[Repost] The AC automaton explained in detail