I recently reviewed the KMP string matching algorithm. Compared with naive matching, its core improvement is that the pointer i into the text being searched never backtracks; on a mismatch only the pattern pointer j moves, jumping to next[j] (that is, j = next[j]). This reduces the time complexity from the O(m * n) of naive matching to O(m + n), where m is the length of the pattern and n is the length of the text being searched.

The hard part is understanding how the next array is computed. Meaning of the next array: next[j] is the length of the longest prefix that is also a suffix of the substring before the current character p[j]. It can also be seen as the states of a finite state automaton, and some of its properties are easier to derive from the automaton point of view.

- "Prefix" refers to any leading part of a string that excludes the last character;
- "Suffix" refers to any trailing part of a string that excludes the first character.

The next array is computed by recurrence. The idea behind the construction is as follows:

The initial value next[0] = -1 is known. Assuming next[j] = k, we compute next[j + 1], somewhat like mathematical induction.

From next[j] = k we know that p[0..k-1] equals p[j-k..j-1]: the prefix occupies positions 0 to k-1, the suffix occupies positions j-k to j-1, and both have length k. Computing next[j + 1] then splits into two cases:

If p[k] == p[j], then clearly next[j + 1] = next[j] + 1 = k + 1;

If p[k] != p[j], the matching prefix and suffix must be shorter. We need a position k' in the prefix with p[k'] == p[j] such that p[0..k'] equals p[j-k'..j]; then k' + 1 is the required next[j + 1]. How do we find a qualifying k'? By decreasing k one step at a time? No: we hop backwards with k = next[k].

From next[j] = k we know that p[0..k-1] equals p[j-k..j-1]. Within p[0..k-1], the longest matching prefix and suffix is given by next[k], that is, p[0..next[k]-1] equals p[k-next[k]..k-1], a match of next[k] characters.

Because p[0..k-1] equals p[j-k..j-1], the same longest prefix/suffix match also exists inside p[j-k..j-1]. Take the prefix p[0..next[k]-1] of p[0..k-1] and the suffix p[j-next[k]..j-1] of p[j-k..j-1]: they must be equal. We then check whether p[j] equals p[next[k]], and repeat the hop until either p[k'] == p[j] or k = next[0] = -1.

This idea translates directly into code:

    void getnext(char *p, int next[]) {
        next[0] = -1;
        int pLen = strlen(p), j, k;
        for (j = 0; j < pLen - 1; j++) {
            k = next[j];
            while (k != -1 && p[k] != p[j]) {
                // look for p[k'] == p[j] inside the prefix: prefix = suffix,
                // so a prefix of the prefix = a suffix of the suffix; hop back
                k = next[k];
            }
            // if we fell back to next[0] = -1, nothing matches and the result is 0;
            // otherwise p[k'] == p[j] and next[j + 1] = k' + 1
            next[j + 1] = k + 1;
        }
    }

The logic stays the same; the two loops can be merged into a more compact form:

    void GetNext2(char* p, int next[]) {
        next[0] = -1;
        int j = 0, k = -1, pLen = strlen(p);
        while (j < pLen - 1) {
            if (k != -1 && p[k] != p[j])
                k = next[k];
            else
                next[++j] = ++k;  // <=> { ++j; ++k; next[j] = k; }
        }
    }

Optimize the next Array

The next array above still causes redundant jumps. One extra check removes them:

    void getnextval(char *p, int next[]) {
        next[0] = -1;
        int j = 0, k = -1, pLen = strlen(p);
        while (j < pLen - 1) {
            if (k != -1 && p[k] != p[j]) {
                k = next[k];
            } else {
                if (p[++k] != p[++j])
                    next[j] = k;
                else
                    // the only change from the previous version: next[j] = k
                    // becomes next[j] = next[k], which collapses repeated jumps
                    next[j] = next[k];
            }
        }
        // In KMP, when s[i] != p[j], j jumps to next[j]; if p[j] == p[next[j]],
        // that comparison must fail too, so the jump is taken one step further
    }

Main KMP matching algorithm

    int kmpsearch(char *s, char *p, int *next) {
        int sLen = strlen(s);
        int pLen = strlen(p);
        int i = 0, j = 0;
        while (i < sLen && j < pLen) {
            // compared with naive matching there is one extra check, j == -1,
            // because next[0] = -1
            if (j == -1 || s[i] == p[j]) {
                i++;
                j++;
            } else {
                j = next[j];  // i does not backtrack; j jumps to next[j]
            }
        }
        if (j == pLen)
            return i - j;
        else
            return -1;
    }

A small experiment (C implementation)

    #include <stdio.h>
    #include <string.h>

    void getnext(char *p, int next[]) {
        next[0] = -1;
        int j = 0, k = -1, pLen = strlen(p);
        while (j < pLen - 1) {
            if (k != -1 && p[k] != p[j])
                k = next[k];
            else
                next[++j] = ++k;
        }
    }

    void getnext2(char *p, int next[]) {
        next[0] = -1;
        int pLen = strlen(p), j, k;
        for (j = 0; j < pLen - 1; j++) {
            k = next[j];
            while (k != -1 && p[k] != p[j]) {
                k = next[k];
            }
            next[j + 1] = k + 1;
        }
    }

    void getnextval(char *p, int next[]) {
        next[0] = -1;
        int j = 0, k = -1, pLen = strlen(p);
        while (j < pLen - 1) {
            if (k != -1 && p[k] != p[j]) {
                k = next[k];
            } else {
                if (p[++k] != p[++j])
                    next[j] = k;
                else
                    next[j] = next[k];
            }
        }
    }

    int kmpsearch(char *s, char *p, int *next) {
        int sLen = strlen(s);
        int pLen = strlen(p);
        int i = 0, j = 0;
        while (i < sLen && j < pLen) {
            // compared with naive matching, the extra case is j == -1,
            // because next[0] = -1
            if (j == -1 || s[i] == p[j]) {
                i++;
                j++;
            } else {
                // compared with naive matching, i does not backtrack;
                // j jumps to next[j]
                j = next[j];
            }
        }
        if (j == pLen)
            return i - j;
        else
            return -1;
    }

    int main() {
        char *s = "BBC abcdab abcdabcdabde";
        char *p = "abcdabd";
        int n = strlen(p);
        int next[n], next2[n], nextval[n];
        int index, indexval, i;

        getnext(p, next);
        index = kmpsearch(s, p, next);
        getnext2(p, next2);
        getnextval(p, nextval);
        indexval = kmpsearch(s, p, nextval);

        for (i = 0; i < n; i++)
            printf("%d\t", next[i]);
        printf("\n");
        for (i = 0; i < n; i++)
            printf("%d\t", next2[i]);
        printf("\n");
        for (i = 0; i < n; i++)
            printf("%d\t", nextval[i]);
        printf("\n");
        printf("%d\t%d", index, indexval);
        return 0;
    }

    /* output:
    -1  0   0   0   0   1   2
    -1  0   0   0   0   1   2
    -1  0   0   0   -1  0   2
    15  15
    */

[Original address]: http://blog.csdn.net/thisinnocence