The next Array in KMP String Pattern Matching: Algorithm and C++ Implementation

Source: Internet
Author: User

I. Problem Description

Given two strings s and t, find the starting position at which t first occurs in s, or -1 if t does not occur in s.

II. Input Description

Two strings, s and t.

III. Output Description

The starting position (0-based) of the first occurrence of t in s, or -1 if t does not occur in s.

IV. Sample Input

ababaababcb
ababc

V. Sample Output

5
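Before implementing KMP it helps to have a reference answer to test against. The sketch below is not part of the original article: it assumes the 0-based convention implied by the sample above and simply wraps std::string::find, which returns std::string::npos when the pattern is absent, mapping that case to -1. The function name firstOccurrence is hypothetical.

```cpp
#include <string>

// Hypothetical reference oracle (not from the original article):
// returns the 0-based index of the first occurrence of t in s, or -1.
int firstOccurrence(const std::string& s, const std::string& t)
{
    std::string::size_type pos = s.find(t);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

For the sample input, firstOccurrence("ababaababcb", "ababc") gives 5, matching the sample output.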

VI. Analysis of the KMP Algorithm

The KMP algorithm has two steps: first compute the next array, then compare the two strings, using the next array to decide where to resume in t after a mismatch so that fewer comparisons are needed.

The meaning of the next array's indices differs slightly from article to article online; here we follow the definition of next given in the textbook Data Structures (C++ Version) by Wang et al. [1].

Let s be the long string and t the short string (the pattern). The next array has the same length as t. For j ≥ 1, next[j] is the largest k with 0 < k < j such that t[0]~t[k-1] = t[j-k]~t[j-1], and next[j] = 0 if no such k exists. By convention, next[0] = -1.

For t = "ababc", next = [-1, 0, 0, 1, 2].
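The definition can be turned directly into code, which is handy for checking hand-computed values such as the one above. This is a brute-force sketch, not the book's algorithm: the name nextByDefinition is hypothetical, and it tries every k from j-1 down to 1, so it takes roughly cubic time in the length of t.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: computes next[] straight from the definition.
// next[0] = -1; for j >= 1, next[j] is the largest k with 0 < k < j and
// t[0..k-1] == t[j-k..j-1], or 0 if there is none. Cubic time, so it is
// only meant for checking small examples.
std::vector<int> nextByDefinition(const std::string& t)
{
    std::vector<int> next(t.size(), 0);
    if (t.empty())
        return next;
    next[0] = -1;
    for (std::size_t j = 1; j < t.size(); ++j)
    {
        for (std::size_t k = j - 1; k > 0; --k)   // try the longest k first
        {
            // compare(pos1, count1, str, pos2, count2) == 0 means equal substrings
            if (t.compare(0, k, t, j - k, k) == 0)
            {
                next[j] = static_cast<int>(k);
                break;
            }
        }
    }
    return next;
}
```

Applied to "ababc" it yields [-1, 0, 0, 1, 2], and applied to the longer string "ababaababcb" used later it yields [-1, 0, 0, 1, 2, 3, 1, 2, 3, 4, 0].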

Informally, next[j] is the largest k for which the first k characters of t, read in order, are identical to the k characters ending at t[j-1]. Knowing this value shortens how far the algorithm has to backtrack in t after a mismatch, which reduces the number of comparisons.
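To make "shorter backtracking" concrete: when t[j] mismatches s[i], KMP keeps i where it is and continues with j = next[j], which is the same as sliding t to the right by j - next[j] positions (by one position when next[j] = -1, because both i and j are then incremented). The helper below is a hypothetical illustration, not from the original article, that converts a next array into those shift distances; a naive matcher always shifts by 1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration: after a mismatch at pattern index j, KMP keeps
// the text index i and continues with j = next[j]; equivalently, the pattern
// slides right by j - next[j] positions. next[0] = -1 gives a shift of 1.
std::vector<int> shiftDistances(const std::vector<int>& next)
{
    std::vector<int> shift(next.size());
    for (std::size_t j = 0; j < next.size(); ++j)
        shift[j] = static_cast<int>(j) - next[j];
    return shift;
}
```

For t = "ababc", next = [-1, 0, 0, 1, 2] gives shifts [1, 1, 2, 2, 2].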

Following Data Structures (C++ Version) [1], the KMP algorithm can be described by the following pseudocode:

1. Set the starting indices i and j for the comparison in s and t, respectively.
2. Repeat the following until all characters of s or of t have been compared:
    2.1 If s[i] equals t[j], increment i and j and go on to compare the next pair of characters;
    2.2 Otherwise, move j back to next[j], i.e. j = next[j];
    2.3 If j equals -1, increment both i and j to prepare for the next comparison.
3. If all characters of t have been compared, a match was found and i - j is returned; otherwise return -1.

The C++ code for the KMP algorithm is as follows:

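```cpp
int KMP(const string& s, const string& t)
{
    vector<int> next = GetNext(t);
    int n = static_cast<int>(s.size());
    int m = static_cast<int>(t.size());
    int i = 0, j = 0;
    while (i < n && j < m)      // until all of s or all of t has been compared
    {
        if (s[i] == t[j])       // 2.1: characters match, advance in both strings
        {
            ++i;
            ++j;
        }
        else                    // 2.2: mismatch, move j back to next[j]; i never moves back
        {
            j = next[j];
        }
        if (j == -1)            // 2.3: no prefix of t can be reused, restart t after s[i]
        {
            ++i;
            ++j;
        }
    }
    if (j == m)                 // 3: every character of t matched
        return i - j;
    else
        return -1;
}
```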
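The claim that the next array reduces the number of comparisons can be checked on the sample input. The sketch below is an instrumented copy of the KMP loop above plus a naive matcher for contrast; the function names and the count parameter are hypothetical. Traced by hand on s = "ababaababcb", t = "ababc", KMP performs 13 character comparisons and the naive method 18.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instrumentation (not from the original article): both
// matchers search for t in s and add every character comparison to count.

// next[] as defined in the article; assumes t is non-empty.
static std::vector<int> buildNext(const std::string& t)
{
    std::vector<int> next(t.size(), 0);
    next[0] = -1;
    int k = 0;
    for (std::size_t j = 2; j < t.size(); ++j)
    {
        while (k > 0 && t[j - 1] != t[k])
            k = next[k];
        if (t[j - 1] == t[k])
            ++k;
        next[j] = k;
    }
    return next;
}

// KMP, same loop as above: the text index i never moves backwards.
int kmpCounted(const std::string& s, const std::string& t, int& count)
{
    std::vector<int> next = buildNext(t);
    int n = static_cast<int>(s.size()), m = static_cast<int>(t.size());
    int i = 0, j = 0;
    while (i < n && j < m)
    {
        ++count;                           // one comparison s[i] vs t[j]
        if (s[i] == t[j]) { ++i; ++j; }
        else j = next[j];
        if (j == -1) { ++i; ++j; }
    }
    return j == m ? i - j : -1;
}

// Naive matching: try every start position p, restarting t from 0 each time.
int naiveCounted(const std::string& s, const std::string& t, int& count)
{
    int n = static_cast<int>(s.size()), m = static_cast<int>(t.size());
    for (int p = 0; p + m <= n; ++p)
    {
        int k = 0;
        while (k < m)
        {
            ++count;                       // one comparison s[p+k] vs t[k]
            if (s[p + k] != t[k])
                break;
            ++k;
        }
        if (k == m)
            return p;
    }
    return -1;
}
```

Both return 5 for the sample; the difference is that after the mismatch at s[4] the naive matcher restarts t at s[1], whereas KMP keeps i = 4 and resumes at t[2].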

The book gives only the definition of the next array and leaves its computation to the reader, so we work it out here.

Using the book's definition, for t = "ababc":

j=0, next[0] =-1;

j=1, next[1] = 0;

The remaining entries have to be computed:

j=2, t[0] ≠ t[1], so next[2] = 0;

j=3, we already know t[0] ≠ t[1], so t[0~1] and t[1~2] cannot be equal and need not be compared; we compare t[0] with t[2] directly, they are equal, so next[3] = 1;

j=4, next[3] = 1 tells us that t[0] = t[2], so we only need to compare t[1] with t[3]; they are equal, which gives t[0~1] = t[2~3], so next[4] = 2;
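The walkthrough above can be replayed mechanically. The hypothetical tracer below runs the incremental method (the same recurrence as the GetNext code later in this article) and records each distinct comparison as an index pair (j-1, k). For t = "ababc" it should log exactly the three comparisons described: t[1] with t[0], t[2] with t[0], and t[3] with t[1].

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tracer: runs the incremental next[] computation and records
// each distinct comparison t[j-1] vs t[k] as the index pair (j-1, k).
std::vector<std::pair<int, int>> traceNextComparisons(const std::string& t)
{
    std::vector<std::pair<int, int>> log;
    if (t.size() < 2)
        return log;                     // next[0] = -1 and next[1] = 0 need no comparisons
    std::vector<int> next(t.size(), 0);
    next[0] = -1;
    int k = 0;                          // k holds next[j-1] at the top of each iteration
    for (std::size_t j = 2; j < t.size(); ++j)
    {
        int last = static_cast<int>(j) - 1;
        while (k > 0 && t[last] != t[k])
        {
            log.push_back({last, k});   // failed comparison: fall back to next[k]
            k = next[k];
        }
        log.push_back({last, k});       // the comparison that decides next[j]
        if (t[last] == t[k])
            ++k;
        next[j] = k;
    }
    return log;
}
```

Each j needs only one comparison here, because every previously computed next value tells the method where to resume.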

There is one more situation in which work can be saved when computing next. We illustrate it with a longer string, t = "ababaababcb":

j=4, we calculated next[4] = 2 (ab=ab);

j=9, we calculated next[9] = 4 (abab=abab);

j=10, next[9] = 4 means t[0~3] = t[5~8]. We compare t[9] = 'c' with t[4] = 'a' directly, and they are not equal. Since next[4] = 2, we know t[0~1] = t[2~3]. From t[0~3] = t[5~8] we also know t[2~3] = t[7~8], so t[0~1] = t[7~8] follows without any comparison. We can therefore compare t[9] directly with t[next[4]], that is, t[9] with t[2], skipping the redundant check of t[0~1] against t[7~8]. Since t[2] = 'a' also differs from 'c', the same step falls back to t[next[2]] = t[0], which differs as well, so next[10] = 0.
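The fallback chain for j=10 can be listed by code. The hypothetical helper below (not from the original article) builds next[0..j-1] incrementally and then replays the search for next[j], returning every k whose character t[k] is compared with t[j-1], starting from k = next[j-1] and following k = next[k].

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: lists the values of k whose character t[k] is compared
// with t[j-1] while next[j] is computed. The search starts at k = next[j-1]
// and falls back through k = next[k]. Assumes 2 <= j < t.size().
std::vector<int> candidateChain(const std::string& t, std::size_t j)
{
    // Build next[0..j-1] with the incremental method.
    std::vector<int> next(j, 0);
    next[0] = -1;
    int k = 0;
    for (std::size_t p = 2; p < j; ++p)
    {
        while (k > 0 && t[p - 1] != t[k])
            k = next[k];
        if (t[p - 1] == t[k])
            ++k;
        next[p] = k;
    }

    // Replay the search for next[j], recording every k that is tried.
    std::vector<int> chain;
    k = next[j - 1];
    chain.push_back(k);
    while (k > 0 && t[j - 1] != t[k])
    {
        k = next[k];
        chain.push_back(k);
    }
    return chain;
}
```

For t = "ababaababcb" and j = 10 the chain is 4, 2, 0: t[9] = 'c' is compared only with t[4], t[2] and t[0], never with t[1] or t[3].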

The C++ code that computes the next array is as follows:

```cpp
vector<int> GetNext(const string& t)
{
    vector<int> next(t.size(), 0);   // next array; meaning as in Wang's Data Structures (C++ Version), p. 84
    if (!t.empty()) next[0] = -1;    // next[0] is -1 by definition (guard avoids indexing an empty vector)
    int k = 0;                       // k holds next[j-1] at the start of each iteration
    for (int j = 2; j < static_cast<int>(t.size()); ++j)   // next[1] is always 0, so start at index 2
    {
        while (k > 0 && t[j - 1] != t[k])
            k = next[k];
        if (t[j - 1] == t[k])
            k++;
        next[j] = k;
    }
    return next;                     // return the next array
}
```

Lines 8 and 9 (the while loop) handle the situation described above for t = "ababaababcb" at j=10. Lines 10 and 11 handle a successful comparison, such as t[0] = t[2], and line 12 records the result, as in the assignment next[3] = 1.
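Because the incremental code is easy to get subtly wrong, one way to gain confidence in it is to compare it with a direct implementation of the definition on every short pattern. This exhaustive check is a sketch, not part of the original article: it assumes a two-letter alphabet {a, b} and a maximum length chosen by the caller, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Incremental computation, same logic as GetNext above.
std::vector<int> getNextIncremental(const std::string& t)
{
    std::vector<int> next(t.size(), 0);
    if (t.empty())
        return next;
    next[0] = -1;
    int k = 0;
    for (std::size_t j = 2; j < t.size(); ++j)
    {
        while (k > 0 && t[j - 1] != t[k])
            k = next[k];
        if (t[j - 1] == t[k])
            ++k;
        next[j] = k;
    }
    return next;
}

// Direct use of the definition: largest k < j with t[0..k-1] == t[j-k..j-1].
std::vector<int> getNextByDefinition(const std::string& t)
{
    std::vector<int> next(t.size(), 0);
    if (t.empty())
        return next;
    next[0] = -1;
    for (std::size_t j = 1; j < t.size(); ++j)
        for (std::size_t k = j - 1; k > 0; --k)
            if (t.compare(0, k, t, j - k, k) == 0)
            {
                next[j] = static_cast<int>(k);
                break;
            }
    return next;
}

// Checks every pattern over {a, b} of length 1..maxLen (maxLen < 32);
// returns true if both computations agree on all of them.
bool nextAgreesOnAllPatterns(int maxLen)
{
    for (int len = 1; len <= maxLen; ++len)
    {
        for (unsigned long mask = 0; mask < (1UL << len); ++mask)
        {
            std::string t(len, 'a');
            for (int b = 0; b < len; ++b)
                if (mask & (1UL << b))
                    t[b] = 'b';         // bit b of mask selects the character at position b
            if (getNextIncremental(t) != getNextByDefinition(t))
                return false;
        }
    }
    return true;
}
```

A two-letter alphabet is enough to produce many overlapping prefixes and suffixes, which is exactly what exercises the fallback loop.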

VII. Complete Program

```cpp
#include <iostream>
#include <vector>
#include <string>

using namespace std;

vector<int> GetNext(const string& t)
{
    vector<int> next(t.size(), 0);   // next array; meaning as in Wang's Data Structures (C++ Version), p. 84
    if (!t.empty()) next[0] = -1;    // next[0] is -1 by definition
    int k = 0;                       // k holds next[j-1] at the start of each iteration
    for (int j = 2; j < static_cast<int>(t.size()); ++j)   // next[1] is always 0, so start at index 2
    {
        while (k > 0 && t[j - 1] != t[k])
            k = next[k];
        if (t[j - 1] == t[k])
            k++;
        next[j] = k;
    }
    return next;                     // return the next array
}

int KMP(const string& s, const string& t)
{
    vector<int> next = GetNext(t);
    int n = static_cast<int>(s.size());
    int m = static_cast<int>(t.size());
    int i = 0, j = 0;
    while (i < n && j < m)           // until all of s or all of t has been compared
    {
        if (s[i] == t[j])            // characters match: advance in both strings
        {
            ++i;
            ++j;
        }
        else                         // mismatch: move j back to next[j]
        {
            j = next[j];
        }
        if (j == -1)                 // restart t after s[i]
        {
            ++i;
            ++j;
        }
    }
    if (j == m)                      // every character of t matched
        return i - j;
    else
        return -1;
}

int main()
{
    string s = "ababaababcb";
    string t = "ababc";
    int num = KMP(s, t);
    cout << num << endl;             // prints 5
    return 0;
}
```
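A similar exhaustive test can be applied to the whole matcher by comparing it with std::string::find. The sketch below is an assumption-laden illustration, not the author's test: it copies the GetNext and KMP logic under hypothetical names, and checks every text over {a, b} up to a given length against every non-empty pattern up to a given length.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same next[] construction as GetNext in the program above.
static std::vector<int> nextArray(const std::string& t)
{
    std::vector<int> next(t.size(), 0);
    if (t.empty())
        return next;
    next[0] = -1;
    int k = 0;
    for (std::size_t j = 2; j < t.size(); ++j)
    {
        while (k > 0 && t[j - 1] != t[k])
            k = next[k];
        if (t[j - 1] == t[k])
            ++k;
        next[j] = k;
    }
    return next;
}

// Same matching loop as KMP in the program above.
static int kmpSearch(const std::string& s, const std::string& t)
{
    std::vector<int> next = nextArray(t);
    int n = static_cast<int>(s.size()), m = static_cast<int>(t.size());
    int i = 0, j = 0;
    while (i < n && j < m)
    {
        if (s[i] == t[j]) { ++i; ++j; }
        else j = next[j];
        if (j == -1) { ++i; ++j; }
    }
    return j == m ? i - j : -1;
}

// Builds the string over {a, b} whose bit b selects the character at position b.
static std::string fromMask(unsigned mask, int len)
{
    std::string r(len, 'a');
    for (int b = 0; b < len; ++b)
        if (mask & (1u << b))
            r[b] = 'b';
    return r;
}

// True if kmpSearch agrees with std::string::find on every pair tested.
bool kmpAgreesWithFind(int maxTextLen, int maxPatternLen)
{
    for (int n = 0; n <= maxTextLen; ++n)
        for (unsigned ms = 0; ms < (1u << n); ++ms)
            for (int m = 1; m <= maxPatternLen; ++m)
                for (unsigned mt = 0; mt < (1u << m); ++mt)
                {
                    std::string s = fromMask(ms, n), t = fromMask(mt, m);
                    std::string::size_type pos = s.find(t);
                    int expected = pos == std::string::npos ? -1 : static_cast<int>(pos);
                    if (kmpSearch(s, t) != expected)
                        return false;
                }
    return true;
}
```

With limits of 8 and 4 this runs about fifteen thousand searches, which is fast and covers many overlapping cases.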

References:

[1] Wang, Hu Ming, Wang Tao. Data Structures (C++ Version) [M]. Beijing: Tsinghua University Press, 2011: 83-85.

[2] Nowcoder. String pattern matching [DB/OL]. https://www.nowcoder.com/practice/084b6cb2ca934d7daad55355b4445f8a?tpId=49&&tqId=29363&rp=1&ru=/activity/oj&qru=/ta/2016test/question-ranking
