String Matching: the AC Automaton (Aho-Corasick)

Source: Internet
Author: User
/*******************************************************************************
Algorithm introduction:

The AC automaton (Aho-Corasick automaton) is a multi-pattern string matching
algorithm. A common example: given n words and an article of m characters,
find how many of the words appear in the article.

Algorithm steps:

(1) Construct a Trie.
    The root node holds no character, and every other node holds exactly one
    character. Concatenating the characters on the path from the root to a
    node gives the string that the node represents. The children of any one
    node all hold different characters.

(2) Construct the failure pointers.
    Let C be the letter on the current node. Follow the parent's failure
    pointer, then that node's failure pointer, and so on, until reaching a
    node that has a child with the letter C; point the current node's failure
    pointer at that child. If the walk passes the root without finding such a
    child, point the failure pointer at the root.
    In practice this is a breadth-first search: add root to the queue first
    (root's failure pointer points to itself or to NULL), and each time a
    node is processed, add all of its children to the queue.

(3) Pattern matching.
    (a) The current character matches: an edge from the current node is
        labeled with that character. Move along that edge to the child and
        advance the text pointer to the next character.
    (b) The current character does not match: no edge of the current node is
        labeled with that character, so the existing path cannot be extended.
        Fall back along the failure pointer to the node for the longest
        suffix that is still in the Trie; if no suffix matches, fall back to
        the root. Matching then resumes from the node reached by the
        fallback.
    Repeat these two cases until the text is exhausted.

    Because all the pattern strings are known in advance, the fallback paths
    can be stored in the Trie itself as failure pointers. Once the Trie is
    built, the text can be matched along its paths efficiently.

Algorithm supplement:

KMP uses two pointers, i and j, such that A[i-j+1 .. i] and B[1 .. j] are
exactly equal. In other words, i only increases, and as it increases j changes
so that the substring of A of length j ending at A[i] exactly matches the
first j characters of B. When A[i+1] differs from B[j+1], KMP adjusts j
(reduces it) so that A[i-j+1 .. i] still equals B[1 .. j] and the new B[j+1]
matches A[i+1]. The next function records the position to which j should be
adjusted.

The failure pointer of the AC automaton plays the same role: while the text is
being matched on the Trie, if matching cannot continue from the current node,
it continues from the node that the current node's failure pointer points to.
*******************************************************************************/

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;

const int K = 26;       // alphabet size (lowercase letters)
const int C = 55;       // buffer size for one keyword
const int N = 500010;   // capacity of the BFS queue
const int M = 1000010;  // buffer size for the text

struct node {
    node *fail;      // failure pointer
    node *next[K];   // the 26 children of each Trie node (one per letter)
    int count;       // number of words ending at this node (-1 once counted)
    node() {         // constructor: initialize an empty node
        fail = NULL;
        count = 0;
        memset(next, 0, sizeof(next));
    }
} *q[N];             // queue for the BFS that builds the failure pointers

char keyword[C];     // one input word
char str[M];         // the text to be searched
int head, tail;      // queue indices: head is the write end, tail the read end

void insert(char *str, node *root)  // add one word to the Trie
{
    node *p = root;
    int len = strlen(str);
    for (int i = 0; i < len; ++i) {
        int temp = str[i] - 'a';
        if (p->next[temp] == NULL)
            p->next[temp] = new node();
        p = p->next[temp];
    }
    p->count++;
}

void build_ac_automation(node *root)  // initialize the fail pointers by BFS
{
    root->fail = NULL;
    q[head++] = root;
    while (head != tail) {
        node *temp = q[tail++];
        node *p = NULL;
        for (int i = 0; i < K; i++) {
            if (temp->next[i]) {
                if (temp == root) {
                    // children of the root always fail back to the root
                    temp->next[i]->fail = root;
                } else {
                    p = temp->fail;
                    // the loop ends in one of two ways:
                    // p becomes NULL, or a matching child is found
                    while (p) {
                        if (p->next[i]) {  // found a matching child
                            temp->next[i]->fail = p->next[i];
                            break;
                        }
                        p = p->fail;
                    }
                    if (p == NULL)  // nothing matched: fail to the root
                        temp->next[i]->fail = root;
                }
                q[head++] = temp->next[i];  // enqueue the child
            }
        }
    }
}

int query(node *root)  // scan the text and count the words that occur in it
{
    node *p = root;  // start at the Trie entry
    int len = strlen(str);
    int cnt = 0;
    for (int i = 0; i < len; i++) {
        int index = str[i] - 'a';
        while (p->next[index] == NULL && p != root)  // follow failure pointers
            p = p->fail;
        p = p->next[index];
        p = (p == NULL) ? root : p;
        node *temp = p;  // p stays put; temp walks the chain of suffixes
        while (temp != root && temp->count != -1) {
            cnt += temp->count;
            temp->count = -1;  // mark as counted so each word counts once
            temp = temp->fail;
        }
    }
    return cnt;
}

int main()
{
    // freopen("C:\\Users\\Administrator\\Desktop\\kd.txt", "r", stdin);
    int t;
    scanf("%d", &t);
    while (t--) {
        head = tail = 0;
        node *root = new node();
        int n;
        scanf("%d", &n);
        while (n--) {
            scanf("%54s", keyword);  // replaces gets(), which is unsafe
            insert(keyword, root);
        }
        build_ac_automation(root);
        scanf("%1000009s", str);
        printf("%d\n", query(root));
    }
    return 0;
}
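
To make step (2) more concrete, here is a small sketch that builds the failure links for a handful of words and lets you look up where a given node falls back to. It is not the listing above: it stores nodes in a vector and uses indices instead of pointers, and the names `AcNode`, `build_automaton`, and `fail_of` are made up for this illustration. It assumes the words contain only lowercase letters.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node type for illustration; index 0 is always the root.
struct AcNode {
    std::array<int, 26> next;  // child index per letter, -1 if absent
    int fail = 0;              // index of the failure target
    std::string path;          // string spelled from the root to this node
    AcNode() { next.fill(-1); }
};

// Builds the Trie for lowercase words, then fills in failure links by BFS.
std::vector<AcNode> build_automaton(const std::vector<std::string>& words) {
    std::vector<AcNode> t(1);
    for (const std::string& w : words) {
        int cur = 0;
        for (char ch : w) {
            assert(ch >= 'a' && ch <= 'z');  // assumption: lowercase only
            int c = ch - 'a';
            if (t[cur].next[c] == -1) {
                AcNode child;
                child.path = t[cur].path + ch;
                t[cur].next[c] = static_cast<int>(t.size());
                t.push_back(child);
            }
            cur = t[cur].next[c];
        }
    }
    std::queue<int> q;
    q.push(0);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int c = 0; c < 26; ++c) {
            int v = t[u].next[c];
            if (v == -1) continue;
            int target = 0;  // children of the root fail to the root
            if (u != 0) {
                // Walk the parent's failure chain until some node has a
                // child on the same letter, or the root has been tried.
                int f = t[u].fail;
                while (true) {
                    if (t[f].next[c] != -1) { target = t[f].next[c]; break; }
                    if (f == 0) break;
                    f = t[f].fail;
                }
            }
            t[v].fail = target;
            q.push(v);
        }
    }
    return t;
}

// Returns the string of the node that the node spelled by s falls back to.
std::string fail_of(const std::vector<AcNode>& t, const std::string& s) {
    int cur = 0;
    for (char ch : s) {
        cur = t[cur].next[ch - 'a'];
        assert(cur != -1);  // s must be a path in the Trie
    }
    return t[t[cur].fail].path;
}
```

For the words he, she, his, and hers, calling `fail_of(t, "she")` on the built automaton gives "he", the longest suffix of "she" that is also a path in the Trie, while `fail_of(t, "he")` gives the empty string, meaning the root.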

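The algorithm supplement compares the failure pointer to KMP's next function without showing that function. The following is a minimal sketch of it, under a few assumptions: it uses 0-based indexing rather than the 1-based A[1..] and B[1..] of the description, it indexes the table by matched length, and the names `build_next` and `kmp_search` are chosen here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// nxt[len] is the length of the longest proper prefix of b[0..len) that is
// also a suffix of it: the value j falls back to after matching len chars.
std::vector<int> build_next(const std::string& b) {
    std::vector<int> nxt(b.size() + 1, 0);
    int j = 0;
    for (std::size_t i = 1; i < b.size(); ++i) {
        while (j > 0 && b[i] != b[j]) j = nxt[j];  // shrink the border
        if (b[i] == b[j]) ++j;
        nxt[i + 1] = j;
    }
    return nxt;
}

// Returns the 0-based start positions of every occurrence of b in a.
std::vector<int> kmp_search(const std::string& a, const std::string& b) {
    assert(!b.empty());  // assumption: the pattern is non-empty
    std::vector<int> hits;
    const std::vector<int> nxt = build_next(b);
    const int m = static_cast<int>(b.size());
    int j = 0;  // number of pattern characters currently matched
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Mismatch: reduce j, just as the automaton follows a failure pointer.
        while (j > 0 && a[i] != b[j]) j = nxt[j];
        if (a[i] == b[j]) ++j;
        if (j == m) {
            hits.push_back(static_cast<int>(i) + 1 - m);
            j = nxt[j];  // keep going to find overlapping occurrences
        }
    }
    return hits;
}
```

The inner `while` loop here corresponds to the `while` loop in `query` that follows `fail` pointers: in both cases the text index never moves backward, only the match state does.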