Data Structures (III): Obtaining the next Array for the KMP String Pattern Matching Algorithm

Source: Internet
Author: User

(a) Getting the next array of the pattern string T
1. Review: the role of the next array in the KMP algorithm
next[j] tells us what to do when the character at index j of the pattern string T mismatches the character at index i of the target string S: move the pattern to index next[j] and compare it with that same character S[i] again.
The next function of the KMP algorithm is defined as:

    next[j] = 0                                                          if j = 1
    next[j] = max{ k | 1 < k < j and p1...p(k-1) = p(j-k+1)...p(j-1) }   if that set is not empty
    next[j] = 1                                                          otherwise

From this we can see that next[1] = 0 when j = 1, and that every other value comes from comparing a prefix of the pattern with a suffix (the third case is a similarity of 0, which gives a next value of 0 + 1 = 1).
The next array records how similar the prefix and the suffix are: the next value equals the similarity plus one.
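To make the role of next[j] concrete, here is a minimal sketch of a matching loop that consumes an already computed next array. The function name kmp_index and its signature are hypothetical, and it takes ordinary C strings while keeping the 1-based indices i and j used in the text.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the 1-based position of the first occurrence
 * of pattern t in target s, or 0 if there is none.
 * next[1..m] must already hold the next values for t (next[1] == 0). */
int kmp_index(const char *s, const char *t, const int *next)
{
    int n = (int)strlen(s);
    int m = (int)strlen(t);
    int i = 1;   /* 1-based position in the target string s */
    int j = 1;   /* 1-based position in the pattern string t */

    assert(m > 0);
    while (i <= n && j <= m) {
        if (j == 0 || s[i - 1] == t[j - 1]) {
            ++i;            /* characters agree, or j fell off the front */
            ++j;
        } else {
            j = next[j];    /* mismatch: i stays put, pattern resumes at next[j] */
        }
    }
    return (j > m) ? i - m : 0;
}
```

Note that i never moves backwards: on a mismatch only j changes, which is exactly what the next array is for.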
2. Thinking
If we know the similarity between the prefix and the suffix, we know where the matching prefix ends, and that gives us the next value. The next value of pj depends on the similarity between the prefix p1p2...p(k-1) and the suffix p(j-k+1)...p(j-1): it is that similarity plus one.
Let k - 1 = m, where m is the similarity and k = max{k} is the value stored in the next array. Then the next value of pj depends on the prefix p1p2...pm and the suffix p(j-m)...p(j-1), and equals m + 1.
So finding k - 1 means finding m: our task is to find the similarity.

For example, we can see at a glance that the similarity of 'abab' is 2, and we could write a function that computes it. When we ask for the following next value, the string becomes 'ababa'; we can see that its similarity is 3, and the same function would find that too. But such a function has to start scanning from the beginning or the end of the string every time. Handing each new substring to it and rescanning from scratch is wasteful, so is there a better way to improve performance?
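The straightforward function described above might look like the following sketch. The names similarity and naive_next are hypothetical; the code rescans every substring from scratch, which is the cost the rest of this post tries to avoid.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: length of the longest proper prefix of s[0..len-1]
 * that is also a suffix of it (the "similarity" in the text). */
int similarity(const char *s, int len)
{
    int m;
    assert(len >= 0);
    for (m = len - 1; m > 0; --m) {       /* try the longest candidate first */
        if (memcmp(s, s + len - m, (size_t)m) == 0)
            return m;
    }
    return 0;
}

/* Fills next[1..n] for the pattern p1..pn stored in t[0..n-1]:
 * next[1] = 0, and next[j] = similarity(p1..p(j-1)) + 1 for j > 1. */
void naive_next(const char *t, int *next)
{
    int n = (int)strlen(t);
    int j;
    if (n == 0)
        return;
    next[1] = 0;
    for (j = 2; j <= n; ++j)
        next[j] = similarity(t, j - 1) + 1;
}
```

Each call to similarity can take time quadratic in the substring length, and naive_next makes one call per index, so nothing learned for index j - 1 is reused at index j.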

3. Below we compute all the next values of the pattern string T and look for the relationship between them

Step one: As the previous post showed, the first two next values are fixed: next[1] = 0 and next[2] = 1.

Step two: Get the next value when j = 3. The substring is now 'ab', so the only prefix we can choose is 'a' and the only suffix is 'b'. Comparing them, 'a' differs from 'b', so the similarity is 0 and next[3] = 0 + 1 = 1.

Note: after each comparison completes, the end of the suffix advances by one.
Step three: Get the next value when j = 4. The substring is now 'aba', with prefix p1...pm and suffix p(j-m)...p(j-1). If m is 1, the prefix is p1 and the suffix is p3; if m is 2, the prefix is p1p2 and the suffix is p2p3. So how do we choose m?

Emphasis: m depends on the next[] value at the last mismatch. The comparison for j = 3 was a mismatch, so m = next[3] = 1, and we choose the prefix p1 = 'a' and the suffix p(j-1) = p3 = 'a'. They match, so next[4] = 1 + 1 = 2.

Step four: Get the next value when j = 5. The substring is now 'abab', with prefix p1...pm and suffix p(j-m)...p(j-1). If m is 1, the prefix is p1 and the suffix is p4; if m is 2, the prefix is p1p2 and the suffix is p3p4; if m is 3, the prefix is p1p2p3 and the suffix is p2p3p4. So how do we choose m?

Focus: if the last comparison succeeded rather than mismatched, m is the previous value plus 1. So this time m = 2, and we choose the prefix p1p2 and the suffix p3p4. Both are 'ab', so next[5] = 2 + 1 = 3.

Step five: Get the next value when j = 6. The substring is now 'ababa', with prefix p1...pm and suffix p(j-m)...p(j-1). Because the previous comparison succeeded, m++ gives m = 3, so the prefix is p1p2p3 and the suffix is p3p4p5. Both are 'aba', so next[6] = 3 + 1 = 4.

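Step six: Get the next value when j = 7. The substring is now 'ababaa', with prefix p1...pm and suffix p(j-m)...p(j-1). Because the previous comparison succeeded, m++ gives m = 4, so the prefix is p1p2p3p4 = 'abab' and the suffix is p3p4p5p6 = 'abaa'. Their last characters differ (p4 = 'b', p6 = 'a'), so this is a mismatch and m has to fall back. With m = next[4] = 2 we compare p2 = 'b' with p6 = 'a' and fail again; with m = next[2] = 1 we compare p1 = 'a' with p6 = 'a' and succeed, so next[7] = 1 + 1 = 2.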

Step seven: Get the next value when j = 8. The substring is now 'ababaaa'. Because the previous step was a mismatch, m = next[7] = 2, so we compare the prefix p1p2 = 'ab' with the suffix p6p7 = 'aa'. Their last characters differ, so m falls back to next[2] = 1; p1 = 'a' matches p7 = 'a', so next[8] = 1 + 1 = 2.

Step eight: Get the next value when j = 9. The substring is now 'ababaaab'. Because the previous step was a mismatch, m = next[8] = 2, so we compare the prefix p1p2 = 'ab' with the suffix p7p8 = 'ab'. They match, so next[9] = 2 + 1 = 3.

Note: the prefix being compared may shrink to a single character of the pattern string. Even next[2] = 1, which we described as fixed, is something the code reaches by going through the comparison once more rather than by writing the result down directly.
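The rule used in the steps above can be sketched as a single function that derives next[j] from the values already known. The name next_from_previous is hypothetical. Comparing only the last characters is enough because next[j-1] = m already guarantees that p1...p(m-1) equals p(j-m)...p(j-2).

```c
#include <assert.h>

/* Hypothetical helper: given the pattern p1..pn in t[0..n-1] and the
 * already known values next[1..j-1], returns next[j] for 2 <= j <= n.
 * The candidate m starts at next[j-1]; p_m is compared with p_(j-1). */
int next_from_previous(const char *t, const int *next, int j)
{
    int m;
    assert(j >= 2);
    m = next[j - 1];
    /* Fall back while the prefix end p_m disagrees with the suffix end p_(j-1). */
    while (m != 0 && t[m - 1] != t[j - 2])
        m = next[m];
    return m + 1;   /* similarity plus one */
}
```

For j = 7 on the substring 'ababaa', the loop runs twice (m goes 4, 2, 1) before the comparison succeeds, which is the fall-back case where the candidate has to shrink more than once.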
4. Code implementation
// Compute the next array of the pattern string T
void Get_next(String T, int *next)
{
    int m, j;
    j = 1;          // T[j] is the last character of the suffix being compared; after ++j, j is the index whose next value we store
    m = 0;          // m is the index of the end of the prefix p1p2...pm
    next[1] = 0;
    while (j < T[0])    // T[0] holds the length of the string T
    {
        // If the previous comparison succeeded and T[j] == T[m] succeeds as well,
        // the value for the following index is next[++j] = ++m.
        if (m == 0 || T[m] == T[j])    // T[m] is the last character of the prefix, T[j] is the last character of the suffix
        {
            ++m;
            ++j;
            next[j] = m;    // after ++j, j is the index of the next value we want
        }
        else    // the comparison failed, so we have to fall back
            m = next[m];    // the characters differ, so m falls back
    }
}
5. Test results
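#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    int i;
    String S1;
    int next[MAXSIZE + 1] = { 0 };    // indices 1..MAXSIZE are used, so one extra slot is needed
    // The buffer size was lost from the original listing; MAXSIZE + 1 is an assumption.
    char *str = (char*)malloc(sizeof(char) * (MAXSIZE + 1));
    if (str == NULL)
        return 1;
    memset(str, 0, MAXSIZE + 1);
    printf("Enter S1: ");
    scanf("%s", str);
    if (!StrAssign(S1, str))
    {
        printf("1.string length is greater than %d\n", MAXSIZE);
        free(str);
        return 1;    // S1 was not assigned, so there is nothing to compute
    }
    printf("1.string StrAssign success\n");
    Get_next(S1, next);
    for (i = 1; i <= StringLength(S1); i++)
        printf("%d", next[i]);
    free(str);
    system("pause");    // Windows only: keeps the console window open
    return 0;
}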
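The listings above rely on a String type and on StrAssign and StringLength from an earlier post that is not reproduced here. One possible set of definitions, assuming the common layout in which T[0] stores the length, might look like the sketch below. The MAXSIZE value, the signatures, and the return convention are all assumptions made for illustration, not the author's code.

```c
#include <assert.h>
#include <string.h>

#define MAXSIZE 100    /* assumed capacity; must fit in a char (at most 127) */

/* Assumed layout: T[0] holds the length, T[1..T[0]] hold the characters. */
typedef char String[MAXSIZE + 1];

/* Copies the C string chars into T; returns 1 on success, 0 if it is too long. */
int StrAssign(String T, const char *chars)
{
    size_t len;
    size_t i;
    assert(chars != NULL);
    len = strlen(chars);
    if (len > MAXSIZE)
        return 0;
    T[0] = (char)len;              /* the length lives in slot 0 */
    for (i = 1; i <= len; ++i)
        T[i] = chars[i - 1];       /* characters are stored 1-based */
    return 1;
}

/* Returns the number of characters stored in S. */
int StringLength(const String S)
{
    return S[0];
}
```

Storing the length in slot 0 is what lets Get_next index the pattern from 1, so its subscripts line up with p1, p2, ... in the text.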
