Computing and analyzing the next function of the KMP algorithm

Source: Internet
Author: User

Reposted from wang0606120221: http://blog.csdn.net/wang0606120221/article/details/7402688

Let the pattern string in the KMP algorithm be p and the main string be s. The core of the algorithm is computing the next function of the pattern string p.

KMP matching is carried out using the already-known next values of the pattern string.

This post discusses only how next is evaluated, so the mathematical derivation of the KMP algorithm itself is not covered here.

That derivation shows that the next function depends only on the pattern string itself and has nothing to do with the main string.

By convention next[1]=0. The meaning of next[j]=k is: when the j-th character of the pattern string fails to match the current character of the main string, the k-th character of the pattern string is the one to compare with that main-string character next. So next[1]=0 says that when the current main-string character does not match the 1st pattern character, the 0th pattern character should be compared with it next. Pattern subscripts start at 1, so there is no 0th character; the next step is therefore to advance both the main-string position and the pattern-string position by one and continue matching.

Example:

Main string:    a c a b a a b a a b n a c
Pattern string: a b a a b

Main string:    a c a b a a b a a b n a c
Pattern string:   a b a a b

Main string:    a c a b a a b a a b n a c
Pattern string:     a b a a b

When the 1st character of the pattern string is compared with the 2nd character of the main string, they do not match, and next[1]=0, so the 0th character of the pattern string would be compared with the 2nd character of the main string. The pattern string has no 0th character; it can be thought of as an empty character. In other words, the pattern string moves one position to the right and the main string goes on being matched against it, with the 3rd character now the current character of the main string. Overall: when the current main-string character does not match the 1st pattern character, both the main-string position and the pattern-string position advance by one, and matching continues.
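To make the role of next[1]=0 concrete, here is a minimal Python sketch of the matching loop; it is an illustration, not the original author's code. The function name kmp_search and the unused slot 0 that pads the table are assumptions made here, and the next values 0 1 1 2 2 of the pattern a b a a b are hard-coded. The j == 0 branch is the "0th character" case in which both positions advance by one.

```python
def kmp_search(s, p, nxt):
    """Return the 1-based position of the first occurrence of p in s, or 0.

    nxt is the 1-indexed next table of p; nxt[0] is unused padding.
    """
    i, j = 1, 1  # 1-based positions in s and p
    while i <= len(s) and j <= len(p):
        if j == 0 or s[i - 1] == p[j - 1]:
            # j == 0: there is no 0th pattern character, so advance both
            i += 1
            j += 1
        else:
            j = nxt[j]  # slide the pattern right; i never moves back
    return i - len(p) if j > len(p) else 0


main_string = "acabaabaabnac"
pattern = "abaab"
next_table = [None, 0, 1, 1, 2, 2]  # hard-coded next[1..5] for "abaab"
print(kmp_search(main_string, pattern, next_table))  # prints 3
```

The match starts at the 3rd character of the main string, reached after one fallback through next[2]=1 and one through next[1]=0.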

Next, we explain how the next values are computed in the general case.

Let next[j]=k. By the definition used in the KMP algorithm, p[1..k-1] = p[j-k+1..j-1], where p[a..b] denotes the substring from the a-th to the b-th character of the pattern string, k satisfies 1<k<j, and no k'>k satisfies the same equation. Can next[j+1] be computed from next[j]=k? It can, and that is the point of this post.

The solution process is discussed below.

Given next[j]=k, there are two cases for next[j+1]: p[k]=p[j] and p[k]!=p[j].

1. When p[k]=p[j], the equation p[1..k] = p[j-k+1..j] holds in the pattern string, and no k'>k can satisfy p[1..k'] = p[j-k'+1..j], because k is already the largest such value by the definition of next.

Therefore next[j+1]=k+1=next[j]+1. This means that when the current main-string character does not match the (j+1)-th pattern character, the (k+1)-th pattern character is compared with that same main-string character: the main-string position stays where it is and only the pattern position changes, which amounts to sliding the pattern string right by (j+1)-(k+1) = j-k positions.

2. When p[k]!=p[j], the equation p[1..k] = p[j-k+1..j] does not hold in the pattern string, so the (k+1)-th pattern character cannot simply be compared with the main string. The pattern string has to slide further right until its first m characters match the m characters of the main string that end just before the current position. If no such m exists, matching against the main string restarts from the 1st pattern character.

Main string:    a c a b a a c a b a a b a a b n a c
Pattern string:     a b a a b

Main string:    a c a b a a c a b a a b a a b n a c
Pattern string:         a b a a b

The next values of this pattern string are listed here to make it easier to show why the p[k]=p[j] rule gives a wrong result when p[k]!=p[j].

next[1]=0,next[2]=1,next[3]=1,next[4]=2,next[5]=2.

At this moment i=7 and j=5, with s[7]!=p[5]. If we simply followed the p[k]=p[j] rule and compared pattern character k+1=2+1=3 with the 7th character of the main string, the result would clearly be wrong, because s[6]=a != b=p[2]: the characters in front already fail to match, so comparing the characters after them is pointless. The pattern string therefore has to slide further right, until a suitable character is found or until none exists, in which case matching against the main string restarts from the 1st pattern character.

Since next[j]=k is known, p[1]=p[j-k+1], p[2]=p[j-k+2], ..., p[k-1]=p[j-1] hold in the pattern string. The pattern string should keep sliding right until it reaches a character m+1 such that p[1..m] = p[j-m+1..j] (1<=m<k<j) and no m'>m also satisfies this equation. The (m+1)-th pattern character can then be compared with the current main-string character (assume its index in the main string is i+1).

In the main string the relation s[i-m+1..i] = p[j-m+1..j] = p[1..m] must hold, so comparing the (m+1)-th pattern character with the current main-string character is correct. In this case next[j+1]=m+1=next[...next[next[k]]...]+1.

To see why m=next[...next[next[k]]...]: because p[k]!=p[j], the pattern string has to slide right.

First slide it so that character next[k] is compared with the j-th character. Let next[k]=h. If p[h]=p[j], the equation p[1..h] = p[j-h+1..j] holds in the pattern string (1<=h<k<j). Assuming the main-string character currently being compared with the pattern string is at index i+1, the main string must satisfy s[i-h+1..i] = p[j-h+1..j] = p[1..h].

That is, next[j+1]=next[k]+1.

Similarly, if p[h]!=p[j], the pattern string has to keep sliding right: character next[h] is compared with the j-th character, and so on, until the j-th character matches some character of the pattern string, or until no u (1<=u<j) satisfies p[1..u] = p[j-u+1..j]. If a matching u is found, then next[j+1]=u+1; otherwise next[j+1]=1.
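The two cases collapse into one loop: start from k=next[j] and keep replacing k by next[k] until p[k]=p[j] or k reaches 0, then add 1. A minimal Python sketch of that single step follows. The function name next_of_successor and the padded, 1-indexed table layout are assumptions made for illustration; they are not part of the original post.

```python
def next_of_successor(p, nxt, j):
    """Given nxt[1..j] for the pattern p (1-based positions), return next[j+1].

    nxt[0] is unused padding so that nxt[j] lines up with the text.
    """
    k = nxt[j]
    # Walk the chain k, next[k], next[next[k]], ... until p[k] == p[j]
    # (1-based) or until k falls to 0, meaning no suitable u exists.
    while k != 0 and p[k - 1] != p[j - 1]:
        k = nxt[k]
    return k + 1  # u + 1 if a match was found, 1 if k reached 0


pattern = "abaabcac"
known = [None, 0, 1, 1, 2, 2, 3]  # next[1..6], assumed already computed
print(next_of_successor(pattern, known, 6))  # prints 1 (chain 3 -> 1 -> 0)
print(next_of_successor(pattern, known, 4))  # prints 2 (chain 2 -> 1, p[1] = p[4])
```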

The following example illustrates the process. The next values depend only on the pattern string itself.

Index j:        1 2 3 4 5 6 7 8
Pattern string: a b a a b c a c
next[j]:        0 1 1 2 2 3 1 2

By convention, next[1]=0.

next[2]: there is no index k with 1<k<2, so next[2]=1.

next[3]: next[2]=1 and p[2]=b != a=p[1]; next[1]=0, so no u satisfies the equation above, and next[3]=1.

next[4]: next[3]=1 and p[3]=a=p[1], so next[4]=next[3]+1=1+1=2.

next[5]: next[4]=2 and p[4]=a != b=p[2]; next[2]=1 and p[1]=a=p[4], so next[5]=next[2]+1=1+1=2.

next[6]: next[5]=2 and p[5]=b=p[2], so next[6]=next[5]+1=2+1=3.

next[7]: next[6]=3 and p[6]=c != a=p[3]; next[3]=1 and p[1]=a != c=p[6]; next[1]=0, so no u exists and next[7]=1.

next[8]: next[7]=1 and p[7]=a=p[1], so next[8]=next[7]+1=1+1=2.
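The hand calculation can be replayed mechanically. The Python sketch below builds the whole table for the pattern a b a a b c a c and records, for each position, the candidates k that were tried along the chain next[j], next[next[j]], and so on. The function name next_table_with_trace and the trace dictionary are assumptions introduced for illustration.

```python
def next_table_with_trace(p):
    """Return (next[1..n] as a list, {position: candidates k tried})."""
    n = len(p)
    nxt = [0] * (n + 1)  # nxt[0] unused; nxt[1] = 0 by convention
    trace = {}
    for j in range(1, n):
        k = nxt[j]
        tried = [k]
        # Fall back along the chain until p[k] == p[j] (1-based) or k == 0.
        while k != 0 and p[k - 1] != p[j - 1]:
            k = nxt[k]
            tried.append(k)
        nxt[j + 1] = k + 1
        trace[j + 1] = tried
    return nxt[1:], trace


values, trace = next_table_with_trace("abaabcac")
print(values)  # [0, 1, 1, 2, 2, 3, 1, 2]
for position, tried in trace.items():
    print(f"next[{position}]: tried k = {tried}, result {values[position - 1]}")
```

For next[7] the recorded candidates are 3, 1, 0, matching the chain next[6]=3, next[3]=1, next[1]=0 in the calculation above.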

Why is character next[k] compared directly with the j-th character when the j-th pattern character fails to match, rather than starting from the (j-1)-th character?

My proof is as follows.

Proof: the precondition is next[j]=k, so p[1..k-1] = p[j-k+1..j-1], where k satisfies 1<k<j and no h with k<h<j satisfies the same equation.

Assume the main-string index is i+1 and the pattern-string index is j+1. Then the equation p[1..j-1] = p[2..j] cannot hold: if it did, p[1..j-2] = p[2..j-1] would hold as well, whereas the precondition next[j]=k already rules out any such h with k<h<j.

Therefore no character position between next[k] and j can satisfy the equation the algorithm is looking for. So when p[k]!=p[j], the character to compare with the j-th character must be taken from the positions between 0 and next[k], which is why the algorithm first compares character next[k] with the j-th character.

If my proof is wrong, I would be grateful if an expert could point out the correct one so that I can fix it. Thank you!

Finally, the next function can be written as follows:

next[j] = 0, if j=1;
next[j] = max{ k | 1<k<j and p[1..k-1] = p[j-k+1..j-1] }, if this set is not empty;
next[j] = 1, otherwise.
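This definition can be checked directly by brute force. The Python sketch below tries every k from j-1 down to 2 and keeps the first one whose prefix and suffix agree. It is roughly cubic in the pattern length and is meant only as a cross-check of the definition, not as the method this post recommends; the function name next_by_definition is an assumption.

```python
def next_by_definition(p):
    """Compute next[1..n] straight from the max{k | ...} definition."""
    n = len(p)
    nxt = [0] * (n + 1)  # 1-based table; nxt[0] unused, nxt[1] = 0
    for j in range(2, n + 1):
        nxt[j] = 1  # value used when the set is empty
        for k in range(j - 1, 1, -1):  # largest k first, 1 < k < j
            # 1-based p[1..k-1] and p[j-k+1..j-1] as 0-based slices
            if p[:k - 1] == p[j - k:j - 1]:
                nxt[j] = k
                break
    return nxt[1:]


print(next_by_definition("abaabcac"))  # [0, 1, 1, 2, 2, 3, 1, 2]
```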

Suppose next[j]=k. Then:

next[j+1] = next[j]+1 = k+1, if p[k]=p[j];
next[j+1] = 1, if p[k]!=p[j] and no u with 1<=u<k satisfies p[1..u] = p[j-u+1..j];
next[j+1] = next[...next[next[k]]...]+1, if p[k]!=p[j] and such a u with 1<=u<k is found, where u=next[...next[next[k]]...].

This post is mainly about computing all next values incrementally: instead of rescanning the entire pattern string for every next value, the recurrence derives next[j+1] from the values already computed up to next[j], which is much more efficient.

The code for computing the next values of the KMP algorithm is given below, with the explanation of each step written as comments.

// p is treated as 1-indexed: p[1..n] with n = p.length; next[1..n] receives the result.
void GetNext (String p, int[] next)
{
    // next[1] = 0 by convention. The pattern string plays both roles here: index i scans it
    // as the "main string" and index j scans it as the "pattern string", for example
    //   main string:    a b a a b c a c, i = 1
    //   pattern string: a b a a b c a c, j = 0
    int i = 1, j = 0;
    next[1] = 0;

    // Linear scan: i never moves back, it only increases, while j may keep falling back
    // (the pattern slides right) in search of a u with p[u] = p[i], which yields next[i+1].
    // The bound is i < p.length because the loop body writes next[i+1].
    while (i < p.length)
    {
        // j == 0: there is no 0th pattern character, either at the very start or because
        // no u with p[u] = p[i] was found; both indices advance and next[i] becomes 1.
        // p[i] == p[j]: the characters match, so p[1..j] = p[i-j+1..i]; both indices
        // advance by one and next[i] = j.
        if (j == 0 || p[i] == p[j]) { ++i; ++j; next[i] = j; }
        // j != 0 and p[i] != p[j]: slide the pattern right so that the next iteration
        // compares character next[j] of the pattern with character i of the main string.
        else j = next[j];
    }
    // When the loop ends, next[1..n] holds every next value of the pattern string: for each
    // pattern position, which pattern character to compare with the current main-string
    // character after a mismatch.
}
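For readers who want to run the routine, here is a hedged Python port. It keeps the 1-based indices i and j of the listing and subtracts 1 only when reading characters from Python's 0-based string. The function name get_next, the padded table, and returning next[1..n] as a list are choices made for this sketch, not part of the original code. The loop stops at i < n so that the write to nxt[i] after the increment stays inside the table.

```python
def get_next(p):
    """Return next[1..n] of the pattern p as a plain list."""
    n = len(p)
    nxt = [0] * (n + 1)  # nxt[1..n]; nxt[0] is unused padding
    i, j = 1, 0          # i scans p as "main string", j as "pattern string"
    while i < n:
        if j == 0 or p[i - 1] == p[j - 1]:
            i += 1
            j += 1
            nxt[i] = j   # p[1..j-1] == p[i-j+1..i-1] holds at this point
        else:
            j = nxt[j]   # fall back; i is never decreased
    return nxt[1:]


print(get_next("abaabcac"))  # [0, 1, 1, 2, 2, 3, 1, 2]
print(get_next("abaab"))     # [0, 1, 1, 2, 2]
```

Because i only increases and every fallback strictly decreases j, the loop body runs a number of times linear in the pattern length.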
