KMP algorithm: matching principle and C++ implementation

Source: Internet
Author: User

Original work; please credit the source when reproducing it.
Let A be the target string, A = "abababaababacb", and let B be the pattern, B = "ababacb". KMP keeps two pointers, i and j, such that A[i-j+1..i] is exactly equal to B[1..j]. In other words, i keeps increasing and j changes along with it, so that the j characters of A ending at A[i] are exactly the first j characters of B (the larger j is, the better). What has to be checked next is the relationship between A[i+1] and B[j+1].

When A[i+1] = B[j+1], i and j are each incremented by one. When j = m, where m is the length of B, the whole of B has been matched, so B is a substring of A, and the position of the match can be computed from the value of i. When A[i+1] != B[j+1], the KMP strategy is to adjust j (decrease it) so that A[i-j+1..i] still matches B[1..j] and the new B[j+1] matches A[i+1].

i =  1  2  3  4  5  6  7  8  9 10 11 12 13 14
A =  a  b  a  b  a  b  a  a  b  a  b  a  c  b
B =  a  b  a  b  a  c  b
j =  1  2  3  4  5  6  7

When i and j are both 5, A[i+1] is not equal to B[j+1] (A[6] is b, B[6] is c), so j has to be reduced to some j' (which amounts to moving B to the right). j' must be chosen so that the first j' letters of B[1..j] are exactly equal to its last j' letters; only then does the property of i and j described above still hold after j becomes j'. Again, the larger j' is, the better. Here j' = 3 meets the requirement, because B[1..5] = "ababa" both begins and ends with "aba". Note that the new j has nothing to do with A; it depends only on B. We can therefore precompute an array P[j]: the largest j' < j such that the first j' letters of B[1..j] equal its last j' letters. This is the value j falls back to when the first j letters of B have matched but letter j+1 does not.
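The definition of P[j] above can be turned into code almost word for word. The following is only a minimal sketch, not the article's implementation: buildP is a hypothetical helper name, the pattern is written in lowercase, and index 0 of the returned vector is left unused so that P[j] keeps the 1-based numbering of the text. It tries every candidate length from the longest down, so it is simple rather than fast.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the article): builds P straight from its
// definition. p[j] is the largest k < j such that the first k letters of
// B[1..j] equal its last k letters. p[0] is unused so that p[j] lines up
// with the 1-based notation of the text.
std::vector<int> buildP(const std::string& b) {
    std::vector<int> p(b.size() + 1, 0);
    for (std::size_t j = 1; j <= b.size(); ++j) {
        // Try the longest candidate first; stop at the first one that fits.
        for (std::size_t k = j - 1; k > 0; --k) {
            // Compare the prefix b[0..k) with the last k letters of b[0..j).
            if (b.compare(0, k, b, j - k, k) == 0) {
                p[j] = static_cast<int>(k);
                break;
            }
        }
    }
    return p;
}
```

For B = "ababacb", buildP returns P[1..7] = 0 0 1 2 3 0 0, so P[5] = 3 is exactly the j' = 3 found by hand for j = 5.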
Taking B = "ababacb" as an example, the array P[j] works out as follows:

j    =  1  2  3  4  5  6  7
B    =  a  b  a  b  a  c  b
P[j] =  0  0  1  2  3  0  0

1. "a": a single character is always given 0, that is, P[1] = 0.
2. "ab": the first character is a and the last character is b; they are not equal, so the length is 0, that is, P[2] = 0.
3. "aba": the first two characters are "ab" and the last two are "ba", which differ; the first character is a and the last is a, which are the same, so the length is 1, that is, P[3] = 1.
4. "abab": the first two characters are "ab" and the last two are "ab", which are the same; the first three are "aba" and the last three are "bab", which differ; so the maximum length is 2, that is, P[4] = 2.
5. "ababa": the first three characters are "aba" and the last three are also "aba"; the first four are "abab" and the last four are "baba", which differ; so the maximum length is 3, that is, P[5] = 3.

Continuing in the same way gives the whole array P[j].

Once P[j] has been computed, matching proceeds according to it. Using the same A and B as above, the matching process is described with the following variables: pattern stands for B and target stands for A; headindex points to the first character of the substring of A that is currently lined up against B; targetindex is the index of the character of A being compared; patternindex is the index of the character of B being compared. targetindex equals the amount B has been moved to the right plus patternindex, that is, targetindex = headindex - 1 + patternindex.

At the start, patternindex = 1, targetindex = 1 and headindex = 1. Here pattern[patternindex] == target[targetindex], so patternindex and targetindex are each incremented and compared again, until targetindex and patternindex are both 6, where pattern[patternindex] != target[targetindex]. B now has to be moved to the right for the next attempt. By how much?

This is calculated from P[j]. At this point the "ababa" in front of patternindex has already been matched; its length is 5 and P[5] = 3, so the pattern moves right by 5 - 3 = 2. The new headindex = headindex + 2 = 3, and the new patternindex = P[5] + 1 = 4, pointing to the fourth character of B. targetindex = headindex + patternindex - 1 = 3 + 4 - 1 = 6.

After the move, pattern[patternindex] == target[targetindex], so patternindex and targetindex are again incremented and compared. When patternindex equals 6 and targetindex equals 8, pattern[patternindex] != target[targetindex], and B has to move right again. Once more "ababa" has been matched, with length 5 and P[5] = 3, so the shift is 5 - 3 = 2. Now headindex = headindex + 2 = 3 + 2 = 5 and patternindex = P[5] + 1 = 4, pointing to the fourth character of B; targetindex = headindex + patternindex - 1 = 5 + 4 - 1 = 8, pointing to the eighth character of A.

Here pattern[patternindex] != target[targetindex], so B has to move right again. This time "aba" has been matched, with length 3 and P[3] = 1, so the shift is 3 - 1 = 2. Now headindex = headindex + 2 = 5 + 2 = 7 and patternindex = P[3] + 1 = 1 + 1 = 2, pointing to the second character of B; targetindex = headindex + patternindex - 1 = 7 + 2 - 1 = 8, still pointing to the eighth character of A.

Again pattern[patternindex] != target[targetindex], so B moves right once more. The matched string in front is "a", with length 1 and P[1] = 0, so the shift is 1 - P[1] = 1 - 0 = 1. Now headindex = headindex + 1 = 7 + 1 = 8 and patternindex = P[1] + 1 = 1, pointing to the first character of B; targetindex = headindex + patternindex - 1 = 8 + 1 - 1 = 8, pointing to the eighth character of A. Matching from here succeeds: B occurs in A starting at position 8.

Below is a C++ implementation of the KMP algorithm; it is a simple version and may still have minor problems.
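Before turning to that class, the walk-through above can be replayed mechanically. The sketch below is an illustration only: traceHeads is a hypothetical helper that keeps the 1-based headindex, patternindex and targetindex of the text, takes the table P as an argument (with index 0 unused), and records every value that headindex takes. The strings are written in lowercase.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the article): replays the walk-through with
// 1-based indices and returns the successive values of headindex.
// p[j] must hold P[j] from the text; p[0] is unused.
std::vector<int> traceHeads(const std::string& target,
                            const std::string& pattern,
                            const std::vector<int>& p) {
    const int n = static_cast<int>(target.size());
    const int m = static_cast<int>(pattern.size());
    int headindex = 1;
    int patternindex = 1;
    std::vector<int> heads{headindex};
    while (patternindex <= m) {
        const int targetindex = headindex - 1 + patternindex;
        if (targetindex > n) break;                // target exhausted, no match
        if (pattern[patternindex - 1] == target[targetindex - 1]) {
            ++patternindex;                        // characters agree, advance
        } else if (patternindex == 1) {
            ++headindex;                           // nothing matched: move right by one
            heads.push_back(headindex);
        } else {
            const int matched = patternindex - 1;  // length of the matched prefix
            headindex += matched - p[matched];     // move right by matched - P[matched]
            patternindex = p[matched] + 1;         // resume after the prefix that is kept
            heads.push_back(headindex);
        }
    }
    return heads;
}
```

With target "abababaababacb", pattern "ababacb" and P = 0 0 1 2 3 0 0, the recorded values are 1, 3, 5, 7, 8, the same sequence of headindex values as in the text.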
#ifndef KMP_H
#define KMP_H
#include <string>
#include <vector>

class KMP {
public:
    // Returns the zero-based index of the first match in the target, or -1 if there is none.
    int kmp();
    KMP() {}
    KMP(const std::string &target, const std::string &pattern) : mTarget(target), mPattern(pattern) {}
    void setTarget(const std::string &target);
    void setPattern(const std::string &pattern);
private:
    std::vector<int> mVec;   // prefix table built by getNext()
    std::string mTarget;
    std::string mPattern;
    void getNext();
};
#endif

Here is the corresponding source file:

#include "KMP.h"
#include <cstddef>
#include <string>
#include <vector>

// For every prefix of mPattern, compute the maximum length of a leading part
// that is identical to the trailing part, and store the results in mVec.
// For example, for the string ababacb the prefixes are a, ab, aba, abab,
// ababa, ababac and ababacb:
//   a:     a single character, so the value is 0
//   aba:   the first a and the last a are the same, so the value is 1
//   abab:  the first two (ab) and the last two (ab) are the same, so the value is 2
//   ababa: the first three (aba) and the last three (aba) are the same, so the value is 3
void KMP::getNext()
{
    mVec.clear(); // clear the old table
    if (mPattern.empty())
        return;
    mVec.push_back(0); // the value for the first character is always 0, e.g. the first a of "ababacb"
    for (std::size_t len = 2; len <= mPattern.size(); ++len)
    {
        // Try the longest candidate first and shrink until prefix == postfix.
        std::size_t strLen = len - 1;
        while (strLen > 0 && mPattern.compare(0, strLen, mPattern, len - strLen, strLen) != 0)
            --strLen;
        mVec.push_back(static_cast<int>(strLen)); // 0 if no equal prefix and postfix exist
    }
}

void KMP::setPattern(const std::string &pattern)
{
    mPattern = pattern;
}

void KMP::setTarget(const std::string &target)
{
    mTarget = target;
}

int KMP::kmp()
{
    getNext(); // build the table first
    const int targetSize = static_cast<int>(mTarget.size());
    const int patternSize = static_cast<int>(mPattern.size());
    int targetIndex = 0;
    int patternIndex = 0;
    int headIndex = 0; // index in the target of the first character lined up against the pattern
    while (patternIndex != patternSize && targetIndex != targetSize)
    {
        if (mPattern[patternIndex] == mTarget[targetIndex])
        {
            ++patternIndex;
            ++targetIndex;
        }
        else
        {
            if (0 == patternIndex) // the first character does not match: move the pattern right by one
                ++headIndex;
            else
            {
                headIndex += patternIndex - mVec[patternIndex - 1]; // the vector index is zero-based, so subtract one
                patternIndex = mVec[patternIndex - 1];              // update patternIndex
            }
            targetIndex = headIndex + patternIndex; // recompute targetIndex
        }
    }
    // The loop also stops when the target runs out, so report -1 if the pattern was not fully matched.
    return patternIndex == patternSize ? headIndex : -1;
}
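The table construction in getNext follows the definition directly and compares substrings again and again. For comparison, here is a sketch of the commonly used linear-time formulation, in which the table is built by reusing its own earlier entries. It is not the code of the class above: prefixTable and kmpSearch are hypothetical free functions, indices are zero-based (fail[i] is what the text calls P[i+1]), and -1 is an assumed "not found" value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: fail[i] is the length of the longest proper prefix of
// pattern[0..i] that is also a suffix of it (P[i+1] in the text's notation).
std::vector<int> prefixTable(const std::string& pattern) {
    std::vector<int> fail(pattern.size(), 0);
    int k = 0; // length of the prefix currently matched
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k])
            k = fail[k - 1];          // fall back to the next shorter candidate
        if (pattern[i] == pattern[k])
            ++k;
        fail[i] = k;
    }
    return fail;
}

// Hypothetical helper: zero-based index of the first occurrence of pattern
// in target, or -1 if there is none. An empty pattern matches at index 0.
int kmpSearch(const std::string& target, const std::string& pattern) {
    if (pattern.empty())
        return 0;
    const std::vector<int> fail = prefixTable(pattern);
    int k = 0; // number of pattern characters matched so far
    for (std::size_t i = 0; i < target.size(); ++i) {
        while (k > 0 && target[i] != pattern[k])
            k = fail[k - 1];          // reduce j without moving back in the target
        if (target[i] == pattern[k])
            ++k;
        if (k == static_cast<int>(pattern.size()))
            return static_cast<int>(i) + 1 - k;
    }
    return -1;
}
```

For the example strings, kmpSearch("abababaababacb", "ababacb") returns 7, which is position 8 in the 1-based numbering of the walk-through. The target index i never moves backwards, which is the property that makes the search linear in the length of the target.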
