(a) Computing the next array of the pattern string T
1. Review: what the next array does in the KMP algorithm
next[j] tells us where to resume after a mismatch: when the character at subscript j of the pattern string T does not match the character at subscript i of the target string S, we reset the pattern subscript to next[j] and compare again against the same subscript i of the target string. The subscript i never moves backward.
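To make the role of next[j] concrete, here is a minimal sketch of the matching loop that consumes the array. The function name kmp_index and the use of plain 0-terminated C strings (instead of the String type used later in this post) are assumptions made for illustration; subscripts are still 1-based, as in the text.

```c
#include <string.h>

/* Hypothetical helper: returns the 1-based position of the first occurrence
 * of pattern t in target s, or 0 if there is none. next[1..strlen(t)] must
 * already hold the next array of t (next[0] is unused). */
int kmp_index(const char *s, const char *t, const int *next)
{
    int n = (int)strlen(s);
    int m = (int)strlen(t);
    int i = 1;  /* subscript in the target string, never moves backward */
    int j = 1;  /* subscript in the pattern string */

    while (i <= n && j <= m) {
        if (j == 0 || s[i - 1] == t[j - 1]) {
            ++i;            /* characters match (or j fell back to 0): advance both */
            ++j;
        } else {
            j = next[j];    /* mismatch: only the pattern subscript falls back */
        }
    }
    return (j > m) ? i - m : 0;
}
```

Note that on a mismatch only j changes; i stays where it is, which is the whole point of the next array.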
We also need the evaluation function for next in the KMP algorithm.
It has three cases. When j = 1, next[1] is 0. Otherwise, next[j] comes from comparing the prefixes and suffixes of the substring p1...pj-1: if the longest prefix that equals a suffix (and is shorter than the substring itself) has length m, the "similarity", then next[j] = m + 1. The third case is simply a similarity of 0, where next[j] = 0 + 1 = 1.
In short, the next array measures how similar the prefix and the suffix are, and the next value equals the similarity plus one.
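The figure with the evaluation function is not reproduced in this text. Assuming it is the usual textbook form, which agrees with the three cases described here, it can be written as:

```latex
next[j] =
\begin{cases}
0, & j = 1 \\[4pt]
\max\{\, k \mid 1 < k < j,\ p_1 p_2 \cdots p_{k-1} = p_{j-k+1} \cdots p_{j-1} \,\}, & \text{if this set is not empty} \\[4pt]
1, & \text{otherwise}
\end{cases}
```

Here k - 1 is the similarity, so the second case is "similarity plus one" and the third case is the same rule with similarity 0.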
2. Thinking
We know the next value comes from the similarity between a prefix and a suffix, so once we can locate the right prefix we have the next value. The next value of pj depends on how long a prefix p1p2...pk-1 matches the suffix pj-k+1...pj-1, and it equals that similarity plus one.
Let k - 1 = m, where m is the similarity and k, the largest such value (max{k}), is the next value. Then the next value of pj depends on the prefix p1p2...pm matching the suffix pj-m...pj-1, and it equals m + 1.
So our task now is to find k - 1, that is, to find the similarity m.
For example, we can see directly that the similarity of 'abab' is 2, and we could write a function that computes it. When we ask for the following next value, the string becomes 'ababa', whose similarity is 3, and the same function would find that too. But such a function has to scan from the beginning or the end of the substring every time. Handing it each longer substring and starting over is not cost-effective. Is there a better way that improves the performance of the program?
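The straightforward function mentioned above might look like the following sketch. The names similarity and get_next_naive are made up for illustration, and plain 0-terminated C strings are assumed. It rescans every substring from scratch, which is exactly the cost we would like to avoid.

```c
#include <string.h>

/* Hypothetical helper: length of the longest prefix of sub[0..len-1] that
 * is also a suffix of it, excluding the whole substring itself. */
int similarity(const char *sub, int len)
{
    for (int m = len - 1; m > 0; --m) {
        /* compare the first m characters with the last m characters */
        if (memcmp(sub, sub + len - m, (size_t)m) == 0)
            return m;
    }
    return 0;
}

/* Fills next[1..n] for pattern t straight from the definition:
 * next[1] = 0, next[j] = similarity(p1...pj-1) + 1. */
void get_next_naive(const char *t, int *next)
{
    int n = (int)strlen(t);
    if (n == 0)
        return;
    next[1] = 0;
    for (int j = 2; j <= n; ++j)
        next[j] = similarity(t, j - 1) + 1;
}
```

Each call to similarity can take time quadratic in the substring length, so the whole table is far more expensive than the incremental method developed in the next section.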
3. Below we compute all the next values of the string T and look for the relationship between them
Step one: from the previous post we know that the first two next values, for j = 1 and j = 2, are fixed at 0 and 1.
Step two: get next when j = 3. The substring is only 'ab', so the only prefix we can choose is 'a' and the only suffix is 'b'. We compare the prefix with the suffix: they differ, so the similarity is 0 and next[3] = 0 + 1 = 1.
Note: after each comparison is complete, the end of the suffix moves forward by one.
Step three: get the next value when j = 4. The substring is 'aba'; its prefix is p1...pm and its suffix is pj-m...pj-1. If m = 1, the prefix is p1 and the suffix is p3; if m = 2, the prefix is p1p2 and the suffix is p2p3. So how do we choose m?
Emphasis: m depends on the next[] value at the last mismatch. The comparison at j = 3 was a mismatch, so m = next[3] = 1, and we choose the prefix p1 = 'a' and the suffix pj-1 = p3 = 'a'. They match, so next[4] = m + 1 = 2.
Step four: get the next value when j = 5. The substring is 'abab'; its prefix is p1...pm and its suffix is pj-m...pj-1. If m = 1, the prefix is p1 and the suffix is p4; if m = 2, the prefix is p1p2 and the suffix is p3p4; if m = 3, the prefix is p1p2p3 and the suffix is p2p3p4. So how do we choose m?
Focus: if the last comparison succeeded rather than mismatched, m is the previous value plus 1. So this time m = 2, and we choose the prefix p1p2 and the suffix p3p4. They match ('ab' = 'ab'), so next[5] = 3.
Step five: get the next value when j = 6. The substring is 'ababa'. Because the previous comparison succeeded, m++ gives m = 3, so the prefix is p1p2p3 and the suffix is p3p4p5. They match ('aba' = 'aba'), so next[6] = 4.
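Step six: get the next value when j = 7. The substring is 'ababaa'. Because the previous comparison succeeded, m++ gives m = 4, so the prefix is p1p2p3p4 and the suffix is p3p4p5p6. This time they do not match ('abab' against 'abaa'), so m has to fall back: m = next[4] = 2 still fails (p2 = 'b' against p6 = 'a'), then m = next[2] = 1 succeeds (p1 = 'a' equals p6 = 'a'), so next[7] = 2.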
Step seven: get the next value when j = 8. The substring is 'ababaaa'. Because of the mismatch above, m = next[7] = 2, so we compare the prefix p1p2 with the suffix p6p7. They do not match ('ab' against 'aa'), so m falls back to next[2] = 1, where p1 = 'a' equals p7 = 'a', and next[8] = 2.
Step eight: get the next value when j = 9. The substring is 'ababaaab'. Because of the mismatch above, m = next[8] = 2, so we compare the prefix p1p2 with the suffix p7p8. They match ('ab' = 'ab'), so next[9] = 3.
Note: the pattern string may match in only one character. Even then, as with next[2] = 1 mentioned earlier, we still have to carry out the comparison again rather than take the result directly.
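The rule that the steps above keep applying (start from the previous next value, and fall back through next[] on a mismatch) can be sketched as a single step function. The name next_step and the use of a plain 0-terminated C string are assumptions for illustration; subscripts are 1-based as in the text.

```c
/* Hypothetical helper: given next[1..j] for pattern t, returns next[j + 1].
 * m starts at next[j], the end of the prefix carried over from the previous
 * step, and is compared with pj, the new end of the suffix. */
int next_step(const char *t, const int *next, int j)
{
    int m = next[j];

    /* mismatch between pm and pj: fall back to a shorter prefix */
    while (m > 0 && t[m - 1] != t[j - 1])
        m = next[m];

    /* either pm == pj or nothing is left to try (m == 0);
     * in both cases the new value is m + 1 */
    return m + 1;
}
```

Calling it for j = 1, 2, ..., 8 on a pattern that begins with 'ababaaab', starting from next[1] = 0, reproduces the values worked out in the steps: 1, 1, 2, 3, 4, 2, 2, 3.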
4. Code implementation
// Computes the next array of the pattern string T
void Get_next(String T, int *next)
{
    int m, j;
    j = 1;          // j is the end of the suffix pj-m...pj-1 being compared; after ++j it is the subscript of the next[] entry we want
    m = 0;          // m is the subscript of the end of the prefix p1p2...pm
    next[1] = 0;
    while (j < T[0])    // T[0] holds the length of the string T
    {
        // In this if we only need to ask: the previous comparison succeeded,
        // and now T[j] == T[m] succeeds too, so what is next[++j]?
        if (m == 0 || T[m] == T[j])   // T[m] is the last character of the prefix, T[j] the last character of the suffix
        {
            ++m;
            ++j;
            next[j] = m;    // after ++j, j is the subscript of the next[] entry we want
        }
        else                // the comparison failed, so we have to backtrack
            m = next[m];    // the characters differ: m falls back
    }
}
5. Test results
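#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    int i;
    String s1;
    int next[MAXSIZE + 1] = { 0 };    // next[1..MAXSIZE] is used; next[0] is not
    // The buffer size is garbled in the source text; MAXSIZE + 1 is an assumption.
    char *str = (char *)malloc(sizeof(char) * (MAXSIZE + 1));
    if (str == NULL)
        return 1;
    memset(str, 0, MAXSIZE + 1);
    printf("Enter s1:");
    scanf("%s", str);                 // the input must fit in the buffer
    if (!StrAssign(s1, str))
        printf("1.string length is gt %d\n", MAXSIZE);
    else
        printf("1.string StrAssign success\n");
    Get_next(s1, next);
    for (i = 1; i <= StringLength(s1); i++)
        printf("%d", next[i]);
    free(str);
    system("pause");                  // Windows only: keeps the console window open
    return 0;
}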
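The test program relies on String, MAXSIZE, StrAssign and StringLength, which come from an earlier post and are not shown here. The following is only a guess at minimal definitions that fit the way they are used above (T[0] holds the length, StrAssign returns 0 when the input is too long); the value of MAXSIZE in particular is an assumption.

```c
#include <string.h>

#define MAXSIZE 40  /* assumed capacity; the original value is not shown */

/* Assumed layout: element 0 stores the length, characters start at 1. */
typedef unsigned char String[MAXSIZE + 1];

/* Copies the C string chars into T.
 * Returns 1 on success, 0 if chars is longer than MAXSIZE. */
int StrAssign(String T, const char *chars)
{
    size_t len = strlen(chars);
    if (len > MAXSIZE)
        return 0;
    T[0] = (unsigned char)len;
    for (size_t i = 1; i <= len; ++i)
        T[i] = (unsigned char)chars[i - 1];
    return 1;
}

/* Returns the number of characters stored in T. */
int StringLength(String T)
{
    return T[0];
}
```

With definitions like these placed before Get_next, entering ababaaaba should make the program print the next values 011234223.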
Data Structure (iii) the acquisition of KMP pattern matching algorithm in string---next array