Original work. If you repost it, please credit the source.
Suppose A is the target string, A = "abababaababacb", and B is the pattern to match, B = "ababacb". We use two pointers i and j and maintain the property that A[i-j+1 .. i] is exactly equal to B[1 .. j]. In other words, i keeps increasing, j changes along with it, and j is the length of the string ending at A[i] that matches the first j characters of B (the larger j is, the better). What we need to examine next is the relationship between A[i+1] and B[j+1].

When A[i+1] = B[j+1], i and j are each incremented by one. When j = m (m being the length of B), the whole of B has been matched, so B is a substring of A, and the position of the match can be computed from i (it starts at i-m+1). When A[i+1] <> B[j+1], the KMP strategy is to adjust j (decrease it) so that A[i-j+1 .. i] still matches B[1 .. j] and the new B[j+1] matches A[i+1].

i =  1  2  3  4  5  6  7  8  9 10 11 12 13 14
A =  a  b  a  b  a  b  a  a  b  a  b  a  c  b
B =  a  b  a  b  a  c  b
j =  1  2  3  4  5  6  7

When i = j = 5, A[i+1] is not equal to B[j+1] (A[6] is b, B[6] is c), so j has to be reduced to some j' (which amounts to moving B to the right). The new j' must be such that the first j' letters of B[1 .. j] are exactly equal to its last j' letters; only then does the property of i and j still hold after j becomes j'. Again, the larger j' is, the better. Note that j' = 3 meets this requirement here.

As we can see, the new value of j has nothing to do with A; it depends only on B. We can therefore precompute an array P, where P[j] is the largest value j can fall back to when the first j letters of B have matched but letter j+1 has not.
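The fall-back array P can also be built in linear time by applying the same fall-back idea to B itself. The following is only a minimal sketch, not the author's code: the function name buildFailureTable is made up for illustration, and it uses 0-based indices, so table[k] corresponds to P[k+1] in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this sketch): table[k] is the length of
// the longest proper prefix of b[0..k] that is also a suffix of b[0..k].
std::vector<int> buildFailureTable(const std::string &b)
{
    std::vector<int> table(b.size(), 0);
    int j = 0; // border length of the prefix processed so far
    for (std::size_t i = 1; i < b.size(); ++i)
    {
        // Fall back to shorter and shorter borders until b[i] extends one,
        // or until no border is left.
        while (j > 0 && b[i] != b[j])
            j = table[j - 1];
        if (b[i] == b[j])
            ++j;
        table[i] = j;
    }
    return table;
}
```

For B = "ababacb" this should give 0 0 1 2 3 0 0, the same values that come out of comparing heads and tails by hand.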
Taking B = "ababacb" as an example, here is how the values of the array P[j] come out:

B    = a b a b a c b
P[j] = 0 0 1 2 3 0 0

1. The first character, a, is a single character, so the value is 0: P[1] = 0.
2. "ab": the first character is a and the last is b. They differ, so the length is 0: P[2] = 0.
3. "aba": the first two characters are "ab" and the last two are "ba", which differ. The first character is a and the last is a, which are the same, so the length is 1: P[3] = 1.
4. "abab": the first two are "ab" and the last two are "ab", which are the same. The first three are "aba" and the last three are "bab", which differ. So the maximum length is 2: P[4] = 2.
5. "ababa": the first three are "aba" and the last three are also "aba". The first four are "abab" and the last four are "baba", which differ. So the maximum length is 3: P[5] = 3.

The rest of the array P[j] is obtained in the same way.

Once P[j] has been computed, matching proceeds according to it. Take the same A and B as before. The matching process uses a few variables: pattern stands for B and target stands for A. headindex points to the first character of the substring of A that is currently lined up with B. targetindex is the index of the character of A being compared, and patternindex is the index of the character of B being compared. targetindex equals headindex moved right by patternindex-1 positions, that is, targetindex = headindex - 1 + patternindex.

Step one: patternindex = 1, targetindex = 1, headindex = 1. Here pattern[patternindex] == target[targetindex], so patternindex and targetindex are both incremented and the comparison is repeated, until targetindex and patternindex both reach 6, where pattern[patternindex] != target[targetindex]. Now B has to be moved to the right for the next attempt. How far should it move? That is computed from P[j]. The "ababa" in front of patternindex has already been matched, its length is 5 and P[5] = 3, so pattern moves right by 5-3 = 2 positions. The new headindex = headindex + 2 = 3, the new patternindex = P[5] + 1 = 4, pointing to the fourth character of B, and targetindex = headindex + patternindex - 1 = 3 + 4 - 1 = 6. After the move, pattern[patternindex] == target[targetindex], so patternindex and targetindex are incremented and compared again.

When patternindex equals 6 and targetindex equals 8, pattern[patternindex] != target[targetindex], and B is moved right again. P[5] = 3 and the "ababa" in front has been matched, with length 5, so the shift is 5-3 = 2. Now headindex = headindex + 2 = 3 + 2 = 5, patternindex = P[5] + 1 = 4, pointing to the fourth character of B, and targetindex = headindex + patternindex - 1 = 5 + 4 - 1 = 8, so targetindex points to the eighth character of A.

Again pattern[patternindex] != target[targetindex], so B moves right once more. This time "aba" has been matched, with length 3, and P[3] = 1, so the shift is 3-1 = 2. Now headindex = headindex + 2 = 5 + 2 = 7, patternindex = P[3] + 1 = 1 + 1 = 2, pointing to the second character of B, and targetindex = headindex + patternindex - 1 = 7 + 2 - 1 = 8, still the eighth character of A.

Once more pattern[patternindex] != target[targetindex], so B moves right again. Only "a" has been matched, with length 1, and P[1] = 0, so the shift is 1 - P[1] = 1 - 0 = 1. Now headindex = headindex + 1 = 7 + 1 = 8, patternindex = P[1] + 1 = 1, pointing to the first character of B, and targetindex = headindex + patternindex - 1 = 8 + 1 - 1 = 8, the eighth character of A. Matching from here succeeds.

Here is a C++ implementation of the KMP algorithm. It still has a few small problems.
    #ifndef KMP_H
    #define KMP_H

    #include <string>
    #include <vector>

    class KMP {
    public:
        KMP() {}
        KMP(const std::string &target, const std::string &pattern)
            : mTarget(target), mPattern(pattern) {}

        // Searches mTarget for mPattern and returns the 0-based head index
        // of the match.
        int kmp();
        void setTarget(const std::string &target);
        void setPattern(const std::string &pattern);

    private:
        std::vector<int> mVec;    // fall-back table built from mPattern
        std::string mTarget;
        std::string mPattern;
        void getNext();           // fills mVec
    };

    #endif
Here is the source code implementation
    #include "KMP.h"
    #include <string>
    #include <vector>
    using namespace std;

    // For every prefix of mPattern, store the maximum length of a head that is
    // identical to the tail of that prefix. For the string "ababacb" the prefixes
    // a, ab, aba, abab, ababa, ababac, ababacb are examined in turn, for example:
    //   a:     a single character, so the value is 0
    //   aba:   the first a and the last a are the same, so the value is 1
    //   abab:  the first two (ab) and the last two (ab) are the same, so the value is 2
    //   ababa: the first three (aba) and the last three (aba) are the same, so the value is 3
    // This is a brute-force construction: each candidate length is compared directly.
    void KMP::getNext()
    {
        mVec.clear();
        for (string::size_type end = 1; end <= mPattern.size(); ++end)
        {
            // Try candidate lengths from longest to shortest; length 0 always fits,
            // so the first character always gets 0.
            string::size_type len = end - 1;
            while (len > 0 && mPattern.compare(0, len, mPattern, end - len, len) != 0)
                --len;
            mVec.push_back(static_cast<int>(len));
        }
    }

    void KMP::setPattern(const string &pattern)
    {
        mPattern = pattern;
    }

    void KMP::setTarget(const string &target)
    {
        mTarget = target;
    }

    int KMP::kmp()
    {
        if (mPattern.empty())
            return 0;                 // convention: an empty pattern matches at 0
        getNext();                    // first build the fall-back table

        const int n = static_cast<int>(mTarget.size());
        const int m = static_cast<int>(mPattern.size());
        int targetIndex = 0;          // character of the target being compared
        int patternIndex = 0;         // character of the pattern being compared
        int headIndex = 0;            // first target character lined up with the pattern

        // Invariant: targetIndex == headIndex + patternIndex.
        while (targetIndex < n)
        {
            if (mPattern[patternIndex] == mTarget[targetIndex])
            {
                ++patternIndex;
                ++targetIndex;
                if (patternIndex == m)    // the whole pattern has matched
                    return headIndex;
            }
            else if (patternIndex == 0)
            {
                // The first character does not match: move the pattern right by one.
                ++headIndex;
                ++targetIndex;
            }
            else
            {
                // The vector index is zero-based, hence the -1.
                headIndex += patternIndex - mVec[patternIndex - 1];
                patternIndex = mVec[patternIndex - 1];
            }
        }
        return -1;                    // the pattern does not occur in the target
    }
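To check the hand trace of headindex against running code, the search can record every position the pattern is moved to. This is a sketch under assumptions rather than the author's class: traceAlignments and borderTable are hypothetical names, indices are 0-based (so the 1-based headindex values 1, 3, 5, 7, 8 from the walkthrough become 0, 2, 4, 6, 7), and an empty pattern simply yields an empty trace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: border[k] is the longest proper prefix of p[0..k] that
// is also its suffix (the array P from the text, shifted to 0-based indices).
static std::vector<int> borderTable(const std::string &p)
{
    std::vector<int> border(p.size(), 0);
    int j = 0;
    for (std::size_t i = 1; i < p.size(); ++i)
    {
        while (j > 0 && p[i] != p[j]) j = border[j - 1];
        if (p[i] == p[j]) ++j;
        border[i] = j;
    }
    return border;
}

// Returns, in order, the 0-based start position of each alignment the pattern
// is moved to. `found` reports whether the pattern occurs; if it does, the
// last entry is where the first match begins.
std::vector<int> traceAlignments(const std::string &target,
                                 const std::string &pattern, bool &found)
{
    std::vector<int> heads;
    found = false;
    if (pattern.empty()) return heads;   // assumption: nothing to trace
    const std::vector<int> border = borderTable(pattern);
    const int n = static_cast<int>(target.size());
    const int m = static_cast<int>(pattern.size());
    int head = 0, pi = 0;                // the target index is always head + pi
    heads.push_back(head);
    while (head + pi < n)
    {
        if (pattern[pi] == target[head + pi])
        {
            if (++pi == m) { found = true; break; }
        }
        else
        {
            // Shift by (matched length - border length), or by 1 if nothing matched.
            if (pi == 0) ++head;
            else { head += pi - border[pi - 1]; pi = border[pi - 1]; }
            heads.push_back(head);
        }
    }
    return heads;
}
```

Calling it with target "abababaababacb" and pattern "ababacb" should reproduce the shifts of 2, 2, 2 and 1 worked out by hand earlier.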
KMP algorithm matching principle and C++ implementation