The KMP Algorithm in Detail

Source: Internet
Author: User

1. Brute-force matching (BF algorithm)

Suppose we have a text string s and a pattern string p. How do we find the position of p in s?

With brute-force matching, suppose the text string s has been matched up to position i and the pattern string p up to position j. Then:

    • If s[i] == p[j], then i++, j++, and matching continues;
    • If s[i] != p[j], then i = i - (j - 1) and j = 0. That is, on a mismatch i backtracks and j is reset to 0.

The code is as follows:

    int ViolentMatch(char* s, char* p)
    {
        int sLen = strlen(s);
        int pLen = strlen(p);
        int i = 0;
        int j = 0;
        while (i < sLen && j < pLen)
        {
            if (s[i] == p[j])
            {
                // the current characters match (s[i] == p[j]): i++, j++
                i++;
                j++;
            }
            else
            {
                // mismatch (s[i] != p[j]): i = i - (j - 1), j = 0
                i = i - j + 1;
                j = 0;
            }
        }
        // on success return the position of p in s, otherwise return -1
        if (j == pLen)
            return i - j;
        else
            return -1;
    }

2. KMP algorithm

2.1 KMP algorithm idea and code

The KMP algorithm solves the same problem as follows. Suppose the text string s has been matched up to position i and the pattern string p up to position j:

    • If j == -1, or s[i] == p[j], then i++, j++, and matching continues;
    • If j != -1 and s[i] != p[j], then i stays unchanged and j = next[j]. That is, on a mismatch the pattern string moves right by j - next[j] positions: the position of the mismatched character minus the next value of that character.

The code is as follows:

    // next is the table built by GetNext (section 2.4); it is passed in
    // explicitly here, as in the complete code at the end.
    int KmpMatch(char* s, char* p, int* next)
    {
        int i = 0;
        int j = 0;
        int sLen = strlen(s);
        int pLen = strlen(p);
        while (i < sLen && j < pLen)
        {
            // if j == -1, or the current characters match (s[i] == p[j]): i++, j++
            if (j == -1 || s[i] == p[j])
            {
                i++;
                j++;
            }
            else
            {
                // if j != -1 and the current characters mismatch (s[i] != p[j]):
                // i is unchanged and the pattern jumps to position next[j]
                j = next[j];
            }
        }
        if (j == pLen)
            return i - j;
        else
            return -1;
    }

2.2 Deriving the next array

Meaning of each value in the next array: it is the length of the longest identical prefix and suffix of the string that precedes the current character. That is, next[j] = k means that the string before p[j] has an identical prefix and suffix of maximum length k.

    • Longest common prefix/suffix length
      • The length of the longest prefix that is equal to a suffix
      • For example, for the pattern string abcabd, the table of longest common prefix/suffix lengths is as follows:

    Pattern string:   a   b   c   a   b   d
    Maximum length:   0   0   0   1   2   0

For example, the string abcab has the identical prefix and suffix ab, with a maximum length of 2.
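The maximum length values can be reproduced straight from the definition. The following is a minimal sketch, not the method used later in this article: the names longest_prefix_suffix and build_max_len are hypothetical, and the check is done by direct comparison, which is quadratic per value but easy to verify by hand.

```c
#include <assert.h>
#include <string.h>

/* Length of the longest proper prefix of p[0..len-1] that is also a
 * suffix of it. Checked directly from the definition, so it is slow;
 * it is meant to illustrate the table, not to be efficient. */
int longest_prefix_suffix(const char *p, int len)
{
    assert(p != NULL && len >= 0);
    for (int k = len - 1; k > 0; --k) {
        /* compare the first k characters with the last k characters */
        if (memcmp(p, p + len - k, (size_t)k) == 0)
            return k;
    }
    return 0;
}

/* Fill max_len[i] with the value for the prefix p[0..i].
 * max_len must have room for strlen(p) entries. */
void build_max_len(const char *p, int *max_len)
{
    int n = (int)strlen(p);
    for (int i = 0; i < n; ++i)
        max_len[i] = longest_prefix_suffix(p, i + 1);
}
```

Calling build_max_len("abcabd", t) on an array of six ints fills it with the values for a, ab, abc, abca, abcab and abcabd in that order; the entry for abcab is 2, matching the ab prefix and suffix described above.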

    • The next array is derived from the longest common prefix/suffix lengths.
      • That is, next[j] = the longest common prefix/suffix length of the string before position j
      • For example, for the pattern string abcabd, the next array is as follows:

    Pattern string:   a   b   c   a   b   d
    next:            -1   0   0   0   1   2

For the character d, the preceding string is abcab, whose identical prefix and suffix ab has a maximum length of 2, so the next value of d is 2.

Comparing the two tables above, the next array is the table of longest common prefix/suffix lengths shifted one position to the right, with the first value set to -1.
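The shift-right relationship can be written down in a few lines. This is only a sketch under the assumption that the maximum length table is already available; the function name next_from_max_len is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Derive next[] from max_len[] for a pattern of length n (n >= 1):
 * shift every value one place to the right and put -1 in front.
 * The last max_len value is not used. */
void next_from_max_len(const int *max_len, int n, int *next)
{
    assert(max_len != NULL && next != NULL && n >= 1);
    next[0] = -1;
    for (int j = 1; j < n; ++j)
        next[j] = max_len[j - 1];
}
```

For abcabd, the maximum length values 0 0 0 1 2 0 turn into the next values -1 0 0 0 1 2, so the d at position 5 gets the value 2 that belongs to abcab.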

2.3 Matching with the next array

Purpose of the next array: when a character of the pattern string mismatches a character of the text string, it tells the pattern string where to jump next. For example, if the character at position j of the pattern string mismatches the character at position i of the text string, the character at position next[j] of the pattern string is compared with the character at position i of the text string next.

    • Matching based on the longest common prefix/suffix length
      • On a mismatch, the pattern string moves right by: number of matched characters - maximum length value of the character before the mismatched character
    • Matching based on the next value
      • On a mismatch, the pattern string moves right by: position of the mismatched character - next value of the mismatched character

In short, when counting from 0, the position of the mismatched character equals the number of characters already matched, and the next value of the mismatched character equals the maximum length value of the character before it, so the two rules give the same shift.
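To see that the two rules agree, both can be written as one-line functions. This is a sketch with hypothetical names (shift_by_max_len, shift_by_next); j is the position of the mismatched character in the pattern, counted from 0.

```c
#include <assert.h>

/* Shift from the maximum length table: number of matched characters (j)
 * minus the maximum length value of the character before the mismatch.
 * Only defined for j >= 1, because position 0 has no previous character. */
int shift_by_max_len(const int *max_len, int j)
{
    assert(j >= 1);
    return j - max_len[j - 1];
}

/* Shift from the next table: mismatch position minus next[j].
 * For j == 0 this gives 0 - (-1) = 1, a move of one position. */
int shift_by_next(const int *next, int j)
{
    assert(j >= 0);
    return j - next[j];
}
```

With the pattern abcabd and a mismatch at d (j = 5), both functions give 5 - 2 = 3: the pattern moves three positions to the right so that its prefix ab lines up with the ab that was just matched.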

2.4 Deriving the code for the next array

The next array can be derived by mathematical induction:

    1. Initialize next[0] = -1;
    2. For a value k, p[0] p[1] ... p[k-1] = p[j-k] p[j-k+1] ... p[j-1] is equivalent to next[j] = k, that is, the substring of the pattern before p[j] has an identical prefix and suffix of length k.
    3. To find next[j + 1], there are two cases:
      • If p[k] == p[j], then next[j + 1] = next[j] + 1 = k + 1;
      • If p[k] != p[j], the substring before p[j + 1] has no identical prefix and suffix of length k + 1, so a shorter one must be sought by setting k = next[k] and trying again:
        • If p[k] == p[j], then next[j + 1] = k + 1 (note that k has been updated here, that is, k = next[k]);
        • If p[k] != p[j], repeat the step with k = next[k] until a match is found or k reaches -1, in which case next[j + 1] = 0.

The code is as follows:

    // Assumes p is a non-empty string; the caller frees next.
    void GetNext(char* p, int* &next)
    {
        int pLen = strlen(p);
        next = (int*)malloc(pLen * sizeof(int));
        if (!next)
            exit(-1);
        int j = 0;
        int k = -1;
        next[0] = -1;
        while (j < pLen - 1)
        {
            // p[k] represents the prefix, p[j] represents the suffix
            if (k == -1 || p[j] == p[k])
            {
                next[++j] = ++k;
            }
            else
            {
                k = next[k];
            }
        }
    }

2.5 Optimizing the next array

Looking at the code derived above for the next array, we find that it does not handle the case p[next[j]] == p[j].

    • Let next[j] = k;
    • If p[next[j]] == p[j] (that is, p[k] == p[j]), the original next function only sets next[j + 1] = k + 1 and leaves next[j] = k;
    • If p[next[j]] == p[j] (that is, p[k] == p[j]), the optimized version instead sets next[j] = next[next[j]] (that is, next[j] = next[k]).

For example, computing the next array of the pattern string "abab" with the earlier method gives -1 0 0 1 (0 0 1 2 shifted right by one position, with the first value set to -1). When the pattern is matched against the text string and b mismatches c at position 3, the pattern string slides to position next[3] = 1, that is, it moves right by j - next[j] = 3 - 1 = 2 positions.

After moving 2 positions to the right, b again mismatches c. In fact, the previous step already showed that p[3] = b mismatches s[3] = c, so moving right by two positions to compare p[next[3]] = p[1] = b with s[3] is bound to fail.

So p[next[j]] == p[j] should never occur. If it does, recurse once more, that is, set next[j] = next[next[j]].

With this optimization, the next array of "abab" becomes -1 0 -1 0.

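In the example above, when s[3] and p[3] fail to match, s[3] stays where it is and the next position of p to compare is p[next[3]]. Since next[3] = 0, p[0] is compared with s[3] next, which skips the comparison with p[1] that was bound to fail. Matching continues from there until it succeeds.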
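The rule that p[next[j]] == p[j] should never occur can be checked mechanically for any given table. The following is a small sketch for that purpose only; the function name next_is_optimized is hypothetical and it is not part of the algorithm itself.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if no position j has next[j] >= 0 and p[next[j]] == p[j],
 * that is, no jump leads straight into a comparison that is already
 * known to fail. Returns 0 otherwise.
 * next must hold strlen(p) entries. */
int next_is_optimized(const char *p, const int *next)
{
    assert(p != NULL && next != NULL);
    int n = (int)strlen(p);
    for (int j = 0; j < n; ++j) {
        if (next[j] >= 0 && p[next[j]] == p[j])
            return 0;
    }
    return 1;
}
```

For "abab", the unoptimized table -1 0 0 1 fails the check (for instance p[next[3]] = p[1] = b equals p[3] = b), while the optimized table -1 0 -1 0 passes it.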

The optimized code is as follows:

    // Assumes p is a non-empty string; the caller frees next.
    void GetNext(char* p, int* &next)
    {
        int pLen = strlen(p);
        next = (int*)malloc(pLen * sizeof(int));
        if (!next)
            exit(-1);
        int j = 0;
        int k = -1;
        next[0] = -1;
        while (j < pLen - 1)
        {
            if (k == -1 || p[j] == p[k])
            {
                if (p[++j] == p[++k])
                {
                    // p[j] == p[next[j]] must not occur; if it would, recurse once more
                    next[j] = next[k];
                }
                else
                {
                    next[j] = k;
                }
            }
            else
            {
                k = next[k];
            }
        }
    }

The complete code is as follows:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int KmpMatch(char* s, char* p, int* next);
    void GetNext(char* p, int* &next);

    int main()
    {
        int* next;
        // The buffer sizes are garbled in the source; 256 and 64 are assumed values.
        char strtxt[256];
        char strkey[64];

        printf("Please enter the text: \n");
        // fgets replaces the unsafe gets of the original
        if (!fgets(strtxt, sizeof(strtxt), stdin))
            return 1;
        strtxt[strcspn(strtxt, "\n")] = '\0';

        printf("Please enter the substring you are looking for: \n");
        if (!fgets(strkey, sizeof(strkey), stdin))
            return 1;
        strkey[strcspn(strkey, "\n")] = '\0';
        if (strkey[0] == '\0')
            return 1;  // GetNext needs a non-empty pattern

        GetNext(strkey, next);

        printf("next values of the substring: \n");
        for (unsigned int i = 0; i < strlen(strkey); ++i)
        {
            printf("%5c", strkey[i]);
        }
        printf("\n");
        for (unsigned int i = 0; i < strlen(strkey); ++i)
        {
            printf("%5d", next[i]);
        }
        printf("\n");

        int pos = KmpMatch(strtxt, strkey, next);
        if (pos != -1)
            printf("Match found at position %d\n", pos);
        else
            printf("No match!\n");

        free(next);
        system("pause");  // Windows only
        return 0;
    }

    int KmpMatch(char* s, char* p, int* next)
    {
        int i = 0;
        int j = 0;
        int sLen = strlen(s);
        int pLen = strlen(p);
        while (i < sLen && j < pLen)
        {
            if (j == -1 || s[i] == p[j])
            {
                i++;
                j++;
            }
            else
            {
                j = next[j];
            }
        }
        if (j == pLen)
            return i - pLen;
        else
            return -1;
    }

    void GetNext(char* p, int* &next)
    {
        int pLen = strlen(p);
        next = (int*)malloc(pLen * sizeof(int));
        if (!next)
            exit(-1);
        int j = 0;
        int k = -1;
        next[0] = -1;
        while (j < pLen - 1)
        {
            if (k == -1 || p[j] == p[k])
            {
                if (p[++j] == p[++k])
                {
                    next[j] = next[k];
                }
                else
                {
                    next[j] = k;
                }
            }
            else
            {
                k = next[k];
            }
        }
    }
