The theory behind KMP is fairly involved; for a detailed explanation, see this article: http://blog.csdn.net/v_july_v/article/details/7041827
This article starts directly from the conclusions, which is enough for exams and competitions.
Let T be the target string ("aaabbbaabbabcabcabbaba") and Pat be the pattern string ("aabbabc").
The next array of the pattern string is:
j (subscript) | 0 | 1 | 2 | 3 | 4 | 5 | 6
Pat[j] | a | a | b | b | a | b | c
next[j] | -1 | 0 | 1 | 0 | 0 | 1 | 0
KMP algorithm:
In the following, j is the subscript in the pattern string where the mismatch occurred, and posT is the subscript in T where it occurred. After a mismatch, character next[j] of the pattern string is aligned with T[posT], so the pattern string moves to start at position posT-next[j] of T.
j=0, next[j]=-1. In the next comparison, character -1 of the pattern string is aligned with the mismatch position in the target string. In effect, character 0 of the pattern string is aligned with the position just after the mismatch. The pattern string moves to position posT-next[j] = posT+1.
j=1, next[j]=0. In the next comparison, character 0 of the pattern string is aligned with the mismatch position in the target string. The pattern string moves to position posT-next[j] = posT.
j=2, next[j]=1. In the next comparison, character 1 of the pattern string is aligned with the mismatch position in the target string. The pattern string moves to position posT-next[j] = posT-1.
And so on.
So everything comes down to the next array. How is the next array built?
For each subscript j from 0 to lengthP-1 (lengthP is the length of the pattern string), take the substring before subscript j and find the maximum length of a prefix of it that equals a suffix of it. The prefix may not be the entire preceding substring, that is, the prefix and the suffix may not start and end at the same positions; the examples below illustrate this.
j=0: the character a has no character before it, so next[0] is set to -1.
j=1: the character a is preceded by the string "a", but because the prefix may not be the entire preceding string, there is no matching prefix, so next[1] is 0.
j=2: the character b is preceded by the string "aa"; its suffix "a" equals its prefix "a", so the length is 1 and next[2] is 1.
And so on. (Note: the prefix is read from left to right.)
The description above is only meant to make the next array easier to understand. In practice, building the next array is itself an application of KMP: the pattern string is matched against itself, with its suffixes matched against its prefixes.
The code is as follows:
#include <iostream>
#include <string>
#include <vector>
using namespace std;

string T;    // target string
string Pat;  // pattern string

// Fills Next[0..lengthP-1] for Pat; lengthP is the length of the pattern string.
void GetNext(vector<int>& Next, int lengthP) {
    int j = 0, k = -1;  // j is the subscript into Pat; k is the next value for subscript j
    Next[0] = -1;       // the next value at subscript 0 is initialized to -1
    while (j < lengthP - 1) {  // scan the pattern; stop at lengthP-1 so Next[j] stays in bounds
        if (k == -1 || Pat[j] == Pat[k]) {  // no matching prefix is left, or the characters at j and k are equal
            j++; k++;
            Next[j] = k;  // set the next value at subscript j to k
        } else {
            k = Next[k];  // fall back to a shorter prefix and compare again
        }
    }
}

// Searches for Pat in T starting at subscript k of T.
// Returns the subscript of the first match, or -1 if there is none.
int KMP(int k, const vector<int>& next) {
    int posP = 0, posT = k;    // posP and posT are the subscripts into Pat and T, set to their starting positions
    int lengthP = Pat.length();  // lengthP is the length of the pattern string Pat
    int lengthT = T.length();    // lengthT is the length of the target string T
    while (posP < lengthP && posT < lengthT) {  // scan both strings
        if (posP == -1 || Pat[posP] == T[posT]) {  // the characters match (or the pattern restarts)
            posP++; posT++;
        } else {
            posP = next[posP];  // on a mismatch, the next array gives the position to resume from
        }
    }
    if (posP < lengthP) return -1;  // no match
    else return posT - lengthP;     // match succeeded
}

int main() {
    T = "aaabbbaabbabcabcabbaba";
    Pat = "aabbabc";
    int lengthP = Pat.length();
    vector<int> next(lengthP, 0);  // a vector replaces the non-standard variable-length array
    GetNext(next, lengthP);
    int pos = KMP(0, next);
    cout << pos << endl;
    cout << "next[]:";
    for (int i = 0; i < lengthP; i++) {
        cout << next[i] << " ";
    }
    return 0;
}
KMP string pattern matching algorithm (C++ implementation)