KMP Algorithm Explained in Detail

Last Update: 2018-04-08 Source: Internet

Author: User


1. Brute-force matching (BF algorithm)

Suppose we have a text string s and a pattern string p. How do we find the position of p in s?

With brute-force matching, suppose the text string s has been matched up to position i and the pattern string p up to position j. Then:

If s[i] == p[j], do i++ and j++ and continue matching;
If s[i] != p[j], set i = i - (j - 1) and j = 0. That is, on a mismatch i backtracks and j is reset to 0.

The code is as follows:

    // Requires <string.h> for strlen.
    int ViolentMatch(const char* s, const char* p)
    {
        int sLen = (int)strlen(s);
        int pLen = (int)strlen(p);
        int i = 0;
        int j = 0;
        while (i < sLen && j < pLen)
        {
            if (s[i] == p[j])
            {
                // The current characters match (s[i] == p[j]): i++, j++
                i++;
                j++;
            }
            else
            {
                // Mismatch (s[i] != p[j]): set i = i - (j - 1), j = 0
                i = i - j + 1;
                j = 0;
            }
        }
        // On success, return the position of the pattern string p in the
        // text string s; otherwise return -1
        if (j == pLen)
            return i - j;
        else
            return -1;
    }

2. KMP algorithm

2.1 KMP algorithm idea and code

The KMP algorithm solves the same problem. Suppose the text string s has been matched up to position i and the pattern string p up to position j. Then:

If j == -1, or s[i] == p[j], do i++ and j++ and continue matching;
If j != -1 and s[i] != p[j], leave i unchanged and set j = next[j]. In other words, on a mismatch the pattern string moves right by (position of the mismatched character) - (next value of the mismatched character), i.e. by j - next[j] positions.

The code is as follows:

    // Requires <string.h> for strlen.
    // next is the next array of the pattern string p, passed in by the caller.
    int KmpMatch(const char* s, const char* p, const int* next)
    {
        int i = 0;
        int j = 0;
        int sLen = (int)strlen(s);
        int pLen = (int)strlen(p);
        while (i < sLen && j < pLen)
        {
            // If j == -1, or the current characters match (s[i] == p[j]): i++, j++
            if (j == -1 || s[i] == p[j])
            {
                i++;
                j++;
            }
            else
            {
                // If j != -1 and the current characters mismatch (s[i] != p[j]),
                // i stays unchanged and j = next[j], i.e. the pattern string
                // jumps to position next[j]
                j = next[j];
            }
        }
        if (j == pLen)
            return i - j;
        else
            return -1;
    }
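To make the difference between the two loops concrete, the sketch below counts the character comparisons each one performs. It is only an illustration: bf_comparisons and kmp_comparisons are hypothetical helper names that do not appear in the original article, and the next array is passed in by the caller rather than computed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of character comparisons made by the
// brute-force loop (text index backtracks on every mismatch).
std::size_t bf_comparisons(const std::string& s, const std::string& p) {
    std::size_t i = 0, j = 0, comparisons = 0;
    while (i < s.size() && j < p.size()) {
        ++comparisons;
        if (s[i] == p[j]) {
            ++i;
            ++j;
        } else {
            i = i - j + 1;  // backtrack the text index
            j = 0;          // restart the pattern
        }
    }
    return comparisons;
}

// Hypothetical helper: number of character comparisons made by the KMP
// loop for a caller-supplied next array (text index never backtracks).
std::size_t kmp_comparisons(const std::string& s, const std::string& p,
                            const std::vector<int>& next) {
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(p.size());
    int i = 0, j = 0;
    std::size_t comparisons = 0;
    while (i < n && j < m) {
        if (j == -1) {  // no character is compared in this case
            ++i;
            ++j;
            continue;
        }
        ++comparisons;
        if (s[i] == p[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];  // only the pattern index moves
        }
    }
    return comparisons;
}
```

With the assumed inputs s = "abcabcabd", p = "abcabd" and the next array -1 0 0 0 1 2, the brute-force loop makes 14 comparisons and the KMP loop makes 10.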

2.2 Deriving the next array

Meaning of each value in the next array: it is the length of the longest identical prefix and suffix of the string that precedes the current character (the whole string itself does not count). That is, next[j] = k means that the substring before p[j] has an identical prefix and suffix of maximum length k.

Longest common element length of prefixes and suffixes
- This is the length of the longest prefix that is equal to a suffix
- For example, for the pattern string abcabd, the table of longest common prefix/suffix lengths is as follows:

Pattern string:          a   b   c   a   b   d
Longest common length:   0   0   0   1   2   0

For the string abcab, the longest identical prefix and suffix is ab, with a length of 2.
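As a rough illustration of this definition, the following sketch computes the longest common prefix/suffix length of every prefix of a pattern by checking the definition directly. The function name max_common_lengths is a hypothetical helper, not part of the original article, and the direct check is deliberately simple rather than efficient.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: for each prefix p[0..i], returns the length of the
// longest proper prefix that is also a suffix, checked by definition.
std::vector<int> max_common_lengths(const std::string& p) {
    std::vector<int> table(p.size(), 0);
    for (std::size_t i = 0; i < p.size(); ++i) {
        // Try candidate lengths from the longest (i) down to 1; a proper
        // prefix/suffix must be shorter than the substring itself.
        for (std::size_t len = i; len >= 1; --len) {
            // Compare p[0 .. len) with p[i + 1 - len .. i + 1).
            if (p.compare(0, len, p, i + 1 - len, len) == 0) {
                table[i] = static_cast<int>(len);
                break;
            }
        }
    }
    return table;
}
```

For the pattern string abcabd this returns 0 0 0 1 2 0.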

The next array is derived from the longest common prefix/suffix lengths.
- The longest common prefix/suffix length of the string before the position that j points to equals next[j]
- For example, for the pattern string abcabd, the next array values are as follows:

Pattern string:   a   b   c   a   b   d
next:            -1   0   0   0   1   2

For the character d, the preceding string is abcab, whose longest identical prefix and suffix is ab with a length of 2, so the next value of d is 2.

Comparing the two tables above

The next array is the table of longest common prefix/suffix lengths shifted one position to the right, with the first value set to -1.
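The shift described here can be sketched in a few lines. The helper name next_from_lengths is hypothetical and not from the original article; it assumes the table of longest common lengths has already been computed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: shifts the length table one position to the right
// and puts -1 in front, which yields the next array.
std::vector<int> next_from_lengths(const std::vector<int>& lengths) {
    std::vector<int> next(lengths.size());
    for (std::size_t j = 0; j < lengths.size(); ++j) {
        next[j] = (j == 0) ? -1 : lengths[j - 1];
    }
    return next;
}
```

For the lengths 0 0 0 1 2 0 of abcabd, this gives -1 0 0 0 1 2; the last length value is dropped by the shift.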

2.3 Matching with the next array

Purpose of the next array: when a character in the pattern string mismatches a character in the text string, it tells the pattern string where to jump to next. For example, if the character at position j of the pattern string mismatches the character at position i of the text string, the character at position next[j] of the pattern string is compared with the character at position i of the text string next.

Matching based on the longest common prefix/suffix length table
- On a mismatch, the pattern string moves right by: (number of characters already matched) - (longest common length for the character before the mismatched character)
Matching based on the next values
- On a mismatch, the pattern string moves right by: (position of the mismatched character) - (next value of the mismatched character)

In short, when counting from 0, the position of the mismatched character equals the number of characters already matched, and the next value of the mismatched character equals the longest common length for the character before it, so the two rules give the same shift.
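The shift rule can be tabulated for every possible mismatch position. The sketch below is only illustrative; shift_amounts is a hypothetical helper name that does not appear in the original article.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: for a mismatch at pattern index j, the pattern
// slides right by j - next[j] positions.
std::vector<int> shift_amounts(const std::vector<int>& next) {
    std::vector<int> shifts(next.size());
    for (std::size_t j = 0; j < next.size(); ++j) {
        shifts[j] = static_cast<int>(j) - next[j];
    }
    return shifts;
}
```

For the next array -1 0 0 0 1 2 of abcabd, the shifts are 1 1 2 3 3 3. For instance, a mismatch at d (j = 5) moves the pattern right by 5 - 2 = 3 positions.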

2.4 Deriving the code for the next array

The next array can be derived by mathematical induction:

Initialize next[0] = -1;
Suppose that for some value k we have p[0] p[1] ... p[k-1] = p[j-k] p[j-k+1] ... p[j-1], which is equivalent to next[j] = k: the substring of the pattern before p[j] has an identical prefix and suffix of length k.
What is next[j + 1]? There are two cases:

- If p[k] == p[j], then next[j + 1] = next[j] + 1 = k + 1;
- If p[k] != p[j], the substring before p[j + 1] has no identical prefix and suffix of length k + 1, so we can only look for a shorter one by setting k = next[k] recursively;
  - If p[k] == p[j], then next[j + 1] = k + 1 (note that k has already been updated here, i.e. k = next[k]);
  - If p[k] != p[j], recurse again, until k == -1, in which case next[j + 1] = 0.

The code is as follows:

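    // Requires <stdlib.h> and <string.h>. The int*& parameter is a C++
    // reference, so this compiles as C++. Assumes p is non-empty.
    void GetNext(const char* p, int*& next)
    {
        int pLen = (int)strlen(p);
        next = (int*)malloc(pLen * sizeof(int));
        if (!next) exit(-1);
        int j = 0;
        int k = -1;
        next[0] = -1;
        while (j < pLen - 1)
        {
            // p[k] represents the prefix, p[j] represents the suffix
            if (k == -1 || p[j] == p[k])
            {
                next[++j] = ++k;
            }
            else
            {
                k = next[k];
            }
        }
    }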

2.5 Optimizing the next array

Looking at the code derived for the next array, we find that it does not handle the case p[next[j]] == p[j].

Let next[j] = k.
p[next[j]] == p[j] (i.e. p[k] == p[j]) ------------> next[j + 1] = k + 1; (how the original next function handles it)
p[next[j]] == p[j] (i.e. p[k] == p[j]) ------------> next[j] = next[next[j]] (i.e. k = next[k]); (how the optimized version handles it)

For example, computing the next array of the pattern string "abab" with the previous method gives -1 0 0 1 (the lengths 0 0 1 2 shifted right by one position, with the first value set to -1). When this pattern is matched against the text string and the character b at p[3] mismatches the character c at s[3], the pattern string slides to position next[3] = 1, i.e. it moves right by j - next[j] = 3 - 1 = 2 positions.

After moving right by 2 positions, b mismatches c again. In fact, the previous step already showed that p[3] = b mismatches s[3] = c, and after the shift p[next[3]] = p[1] = b is compared with s[3] again, so it inevitably mismatches.

So p[next[j]] == p[j] should not occur. If it does, recurse once more, i.e. set next[j] = next[next[j]].

After applying this optimization, the next array of "abab" becomes -1 0 -1 0.

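In the example above, when s[3] and p[3] mismatch, s[3] stays where it is and the next position of p to compare is p[next[3]]. Since next[3] = 0, p[next[3]] = p[0] is compared with s[3], skipping the comparison with p[1] that is known to fail, and matching continues from there.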

The optimization code is as follows:

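    // Requires <stdlib.h> and <string.h>. The int*& parameter is a C++
    // reference, so this compiles as C++. Assumes p is non-empty.
    void GetNext(const char* p, int*& next)
    {
        int pLen = (int)strlen(p);
        next = (int*)malloc(pLen * sizeof(int));
        if (!next) exit(-1);
        int j = 0;
        int k = -1;
        next[0] = -1;
        while (j < pLen - 1)
        {
            if (k == -1 || p[j] == p[k])
            {
                // p[j] == p[next[j]] must not occur; if it does,
                // recurse once more
                if (p[++j] == p[++k])
                {
                    next[j] = next[k];
                }
                else
                {
                    next[j] = k;
                }
            }
            else
            {
                k = next[k];
            }
        }
    }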
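To compare the two variants on the "abab" example, here is a minimal sketch that builds either next array and runs the KMP loop with it. The names build_next and kmp_find are hypothetical and do not appear in the original article, and the text string "abacababc" used in the usage note is an assumption: the article only states that s[3] is c.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds the next array of p. When `optimized` is
// true, it applies the rule that p[next[j]] must differ from p[j].
std::vector<int> build_next(const std::string& p, bool optimized) {
    std::vector<int> next(p.size());
    if (p.empty()) return next;
    const int m = static_cast<int>(p.size());
    int j = 0, k = -1;
    next[0] = -1;
    while (j < m - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++j;
            ++k;
            // Optimized variant: skip a jump target that holds the same
            // character as the one that just mismatched.
            next[j] = (optimized && p[j] == p[k]) ? next[k] : k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// Hypothetical helper: KMP search returning the index of the first
// occurrence of p in s, or -1 if there is none.
int kmp_find(const std::string& s, const std::string& p,
             const std::vector<int>& next) {
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(p.size());
    int i = 0, j = 0;
    while (i < n && j < m) {
        if (j == -1 || s[i] == p[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];
        }
    }
    return (j == m) ? i - j : -1;
}
```

For "abab", build_next returns -1 0 0 1 without the optimization and -1 0 -1 0 with it. Both arrays lead kmp_find to the same result on the assumed text "abacababc", namely position 4; the optimized array only avoids the redundant comparison of p[1] with s[3].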

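The complete code is as follows:

    // Compile as C++: GetNext takes its output parameter as an int*& reference.
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int KmpMatch(const char* s, const char* p, const int* next);
    void GetNext(const char* p, int*& next);

    int main()
    {
        int* next;
        // The buffer sizes below are assumptions; the original values were
        // lost from the source text.
        char strTxt[256] = "";
        char strKey[64] = "";

        // fgets replaces the unsafe gets; strcspn strips the trailing newline.
        printf("Please enter the text: \n");
        fgets(strTxt, sizeof(strTxt), stdin);
        strTxt[strcspn(strTxt, "\n")] = '\0';
        printf("Please enter the substring you are looking for: \n");
        fgets(strKey, sizeof(strKey), stdin);
        strKey[strcspn(strKey, "\n")] = '\0';
        if (strKey[0] == '\0')
        {
            printf("The substring must not be empty.\n");
            return 1;
        }

        GetNext(strKey, next);
        printf("next values of the substring: \n");
        for (unsigned int i = 0; i < strlen(strKey); ++i)
        {
            printf("%5c", strKey[i]);
        }
        printf("\n");
        for (unsigned int i = 0; i < strlen(strKey); ++i)
        {
            printf("%5d", next[i]);
        }
        printf("\n");

        int pos = KmpMatch(strTxt, strKey, next);
        if (pos != -1)
            printf("Match found at position %d\n", pos);
        else
            printf("No match found!\n");
        free(next);
        system("pause");  // Windows-only: keeps the console window open
        return 0;
    }

    int KmpMatch(const char* s, const char* p, const int* next)
    {
        int i = 0;
        int j = 0;
        int sLen = (int)strlen(s);
        int pLen = (int)strlen(p);
        while (i < sLen && j < pLen)
        {
            if (j == -1 || s[i] == p[j])
            {
                i++;
                j++;
            }
            else
            {
                j = next[j];
            }
        }
        if (j == pLen)
            return i - pLen;
        else
            return -1;
    }

    // Optimized next array; assumes p is non-empty (checked in main).
    void GetNext(const char* p, int*& next)
    {
        int pLen = (int)strlen(p);
        next = (int*)malloc(pLen * sizeof(int));
        if (!next) exit(-1);
        int j = 0;
        int k = -1;
        next[0] = -1;
        while (j < pLen - 1)
        {
            if (k == -1 || p[j] == p[k])
            {
                if (p[++j] == p[++k])
                {
                    next[j] = next[k];
                }
                else
                {
                    next[j] = k;
                }
            }
            else
            {
                k = next[k];
            }
        }
    }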

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.