Algorithm Learning Notes: A Detailed Explanation of the next Array in the KMP Algorithm

Last Update:2014-08-14 Source: Internet

Author: User


Recently I reviewed the KMP string-matching algorithm. Compared with the simple (naive) matching algorithm, the core improvement of KMP is that the pointer i into the string being searched never backtracks; on a mismatch only the pattern pointer j moves, jumping to next[j], that is, j = next[j]. This reduces the time complexity from the O(m * n) of simple matching to O(m + n), where m is the length of the pattern string and n is the length of the string being searched.
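
For contrast, simple matching can be sketched as follows. This is only an illustration and not code from the original notes; the name naive_search is a hypothetical one chosen here. The line that resets i is the backtracking step that KMP removes.

```c
#include <assert.h>
#include <string.h>

/* Simple (naive) matching, illustrative sketch; naive_search is a
 * hypothetical name. Returns the index of the first occurrence of p
 * in s, or -1 if there is none. */
int naive_search(const char *s, const char *p) {
    assert(s != NULL && p != NULL);
    int sLen = (int)strlen(s), pLen = (int)strlen(p);
    int i = 0, j = 0;
    while (i < sLen && j < pLen) {
        if (s[i] == p[j]) {
            i++;
            j++;
        } else {
            i = i - j + 1;  /* the text pointer backtracks: KMP avoids this */
            j = 0;
        }
    }
    return (j == pLen) ? i - j : -1;
}
```

In the worst case every start position re-reads up to m characters, which is where the O(m * n) bound comes from.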

The hard part is understanding how the next array is computed. Meaning of the next array: next[j] is the length of the longest prefix that is also a suffix of the part of the pattern before the current character p[j]. It can also be seen as the state of a finite-state automaton, and some of its properties are easier to derive from the automaton point of view.

"Prefix" refers to every leading substring of a string that excludes the last character;
"Suffix" refers to every trailing substring of a string that excludes the first character.

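The definition can be checked directly with a brute-force sketch, assuming nothing beyond the meaning given above. The names longest_border and next_by_definition are hypothetical, chosen for this illustration. It is far slower than the recurrence, but it is useful for testing a faster version.

```c
#include <assert.h>
#include <string.h>

/* Length of the longest prefix of p[0..len-1] (excluding the whole string)
 * that is also its suffix, found by trying every candidate length from the
 * longest down. Hypothetical helper for illustration. */
int longest_border(const char *p, int len) {
    assert(p != NULL && len >= 0);
    for (int k = len - 1; k > 0; k--) {
        /* prefix p[0..k-1] versus suffix p[len-k..len-1] */
        if (memcmp(p, p + len - k, (size_t)k) == 0)
            return k;
    }
    return 0;
}

/* next[j] straight from the definition: the value for p[0..j-1],
 * with the convention next[0] = -1. next[] must hold strlen(p) ints. */
void next_by_definition(const char *p, int next[]) {
    int pLen = (int)strlen(p);
    if (pLen == 0)
        return;
    next[0] = -1;
    for (int j = 1; j < pLen; j++)
        next[j] = longest_border(p, j);
}
```

For the pattern "abcdabd" this gives -1 0 0 0 0 1 2: before the final 'd' the text "abcdab" starts and ends with "ab", so next[6] = 2.
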
The next array is derived by recurrence. The idea behind the construction is as follows:
Given the initial value next[0] = -1 and next[j] = k, compute next[j + 1]; this is a bit like mathematical induction.
From next[j] = k we know that p[0..k-1] = p[j-k..j-1]: the prefix occupies [0] ~ [k-1], the suffix occupies [j-k] ~ [j-1], and both have length k. We now work out next[j + 1] in two cases:

If p[k] == p[j], then clearly next[j + 1] = next[j] + 1 = k + 1;

If p[k] != p[j], the matching prefix and suffix must be shorter. We look for a shorter prefix whose last character satisfies p[k'] == p[j], so that [0] ~ [k'] equals [j-k'] ~ [j]; then k' + 1 is the required next[j + 1]. How do we find a k' that qualifies? By decreasing k one step at a time? No: we jump back with k = next[k].

From next[j] = k we know that [0] ~ [k-1] equals [j-k] ~ [j-1]. Next, within [0] ~ [k-1], find the longest matching prefix and suffix; this is given by next[k], that is, [0] ~ [next[k]-1] equals [k-next[k]] ~ [k-1], a match of next[k] characters.

Because [0] ~ [k-1] equals [j-k] ~ [j-1], the segment [j-k] ~ [j-1] contains the same longest prefix-suffix match. Take the prefix [0] ~ [next[k]-1] of [0] ~ [k-1] and the suffix [j-next[k]] ~ [j-1] of [j-k] ~ [j-1]: they must be equal. Then check whether p[j] equals p[next[k]], and repeat until either p[k'] == p[j] is found or k = next[0] = -1 is reached.
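
To see that the jump k = next[k] visits exactly the shorter candidates, the sketch below lists every length k for which [0] ~ [k-1] equals [j-k] ~ [j-1], by walking next[j], next[next[j]], and so on. It is an illustration only; build_next and border_chain are hypothetical helper names, not from the original notes.

```c
#include <assert.h>
#include <string.h>

/* Fill next[] with the recurrence: next[0] = -1, fall back with k = next[k].
 * next[] must hold strlen(p) ints. Hypothetical helper for illustration. */
static void build_next(const char *p, int next[]) {
    int pLen = (int)strlen(p);
    assert(pLen > 0);
    next[0] = -1;
    for (int j = 0; j < pLen - 1; j++) {
        int k = next[j];
        while (k != -1 && p[k] != p[j])
            k = next[k];
        next[j + 1] = k + 1;
    }
}

/* Write every k (0 < k < j) with p[0..k-1] == p[j-k..j-1] into out[],
 * longest first, by following the chain next[j], next[next[j]], ...
 * Returns how many lengths were written. */
int border_chain(const int next[], int j, int out[]) {
    int count = 0;
    for (int k = next[j]; k > 0; k = next[k])
        out[count++] = k;
    return count;
}
```

For the pattern "abababc" and j = 6 the chain is 4, 2: the part before p[6] is "ababab", whose matching prefix/suffix pairs are "abab" and "ab". No length between them needs to be tried, which is why stepping k down one at a time is unnecessary.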

This idea is directly converted into code:

void getnext(char *p, int next[]) {
    next[0] = -1;
    int pLen = strlen(p), j, k;
    for (j = 0; j < pLen - 1; j++) {
        k = next[j];
        while (k != -1 && p[k] != p[j]) {
            // look for p[k'] == p[j] among shorter prefixes: prefix = suffix,
            // so prefix of the prefix = suffix of the suffix; repeat
            k = next[k];
        }
        // if we fell back to next[0] = -1, nothing matches and the value is 0;
        // otherwise p[k'] == p[j] and next[j + 1] = k' + 1
        next[j + 1] = k + 1;
    }
}

The logic stays the same; the code can be tidied up as follows:

void GetNext2(char* p, int next[]) {
    next[0] = -1;
    int j = 0, k = -1, pLen = strlen(p);
    while (j < pLen - 1) {
        if (k != -1 && p[k] != p[j])
            k = next[k];
        else
            next[++j] = ++k;     // <=> {++j; ++k; next[j] = k;}
    }
}

Optimize the next Array

The next array above still allows redundant jumps. One extra check removes them:

void getnextval(char *p, int next[]) {
    next[0] = -1;
    int j = 0, k = -1, pLen = strlen(p);
    while (j < pLen - 1) {
        if (k != -1 && p[k] != p[j]) {
            k = next[k];
        } else {
            if (p[++k] != p[++j])
                next[j] = k;
            else
                next[j] = next[k];  // the only change from the previous version: next[j] = k becomes next[j] = next[k]
        }
        // In KMP, when s[i] != p[j], j jumps to next[j]. If p[j] == p[next[j]], that
        // comparison fails again and another jump follows, so the extra jump is folded in here.
    }
}
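
The effect of the optimization can be observed by counting character comparisons. The sketch below is an illustration under stated assumptions: build_next, build_nextval and kmp_count are hypothetical names, and the counter is an addition for this experiment, not part of the original code.

```c
#include <assert.h>
#include <string.h>

/* Plain next array (next[0] = -1). next[] must hold strlen(p) ints. */
static void build_next(const char *p, int next[]) {
    int j = 0, k = -1, pLen = (int)strlen(p);
    assert(pLen > 0);
    next[0] = -1;
    while (j < pLen - 1) {
        if (k != -1 && p[k] != p[j])
            k = next[k];
        else
            next[++j] = ++k;
    }
}

/* Optimized array: skip jump targets that hold the same character as p[j]. */
static void build_nextval(const char *p, int next[]) {
    int j = 0, k = -1, pLen = (int)strlen(p);
    assert(pLen > 0);
    next[0] = -1;
    while (j < pLen - 1) {
        if (k != -1 && p[k] != p[j]) {
            k = next[k];
        } else {
            ++k;
            ++j;
            next[j] = (p[k] != p[j]) ? k : next[k];
        }
    }
}

/* KMP search that also counts the comparisons s[i] == p[j] it performs. */
int kmp_count(const char *s, const char *p, const int next[], int *compares) {
    int sLen = (int)strlen(s), pLen = (int)strlen(p);
    int i = 0, j = 0;
    *compares = 0;
    while (i < sLen && j < pLen) {
        if (j == -1) {          /* fell off the pattern: advance the text */
            i++;
            j = 0;
        } else {
            (*compares)++;
            if (s[i] == p[j]) {
                i++;
                j++;
            } else {
                j = next[j];
            }
        }
    }
    return (j == pLen) ? i - j : -1;
}
```

Searching for "aaaab" in "aaabaaaab" finds the match at index 4 either way, but the plain array (-1 0 1 2 3) needs 12 comparisons while the optimized array (-1 -1 -1 -1 3) needs 9: after the mismatch against 'b' at j = 3, the plain array retries p[2], p[1] and p[0], all of which are the same 'a' that just failed.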

The Main KMP Matching Algorithm

int kmpsearch(char *s, char *p, int *next) {
    int sLen = strlen(s);
    int pLen = strlen(p);
    int i = 0, j = 0;
    while (i < sLen && j < pLen) {
        if (j == -1 || s[i] == p[j]) {  // compared with simple matching, there is the extra test j == -1, because next[0] = -1
            i++;
            j++;
        } else {
            j = next[j];  // i does not backtrack; j jumps to next[j]
        }
    }
    if (j == pLen)
        return i - j;
    else
        return -1;
}
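
The matching loop can also be read as the automaton mentioned earlier: the state is the number of pattern characters matched so far, and each text character triggers one transition. The sketch below is one possible way to write that view; kmp_step, kmp_run and build_next are hypothetical names, not from the original notes.

```c
#include <assert.h>
#include <string.h>

/* next[] via the recurrence (next[0] = -1); must hold strlen(p) ints. */
static void build_next(const char *p, int next[]) {
    int pLen = (int)strlen(p);
    assert(pLen > 0);
    next[0] = -1;
    for (int j = 0; j < pLen - 1; j++) {
        int k = next[j];
        while (k != -1 && p[k] != p[j])
            k = next[k];
        next[j + 1] = k + 1;
    }
}

/* One transition. state = number of pattern characters matched so far,
 * with 0 <= state < strlen(p); c = the next text character.
 * Returns the new state. */
int kmp_step(const char *p, const int next[], int state, char c) {
    assert(state >= 0);
    while (state != -1 && p[state] != c)
        state = next[state];
    return state + 1;
}

/* Feed text through the automaton. Stops as soon as the whole pattern has
 * matched and returns strlen(p); otherwise returns the final state. */
int kmp_run(const char *p, const int next[], const char *text) {
    int pLen = (int)strlen(p), state = 0;
    for (const char *t = text; *t != '\0' && state < pLen; t++)
        state = kmp_step(p, next, state, *t);
    return state;
}
```

Feeding "abcdabc" to the automaton for "abcdabd" ends in state 3: at state 6 the character 'c' does not match p[6], the state falls back to next[6] = 2, and p[2] is 'c', so three characters are now matched. The text is read once, left to right, which is the "i does not backtrack" property.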

Small Lab (C implementation)

#include <stdio.h>
#include <string.h>

void getnext(char *p, int next[]) {
    next[0] = -1;
    int j = 0, k = -1, pLen = strlen(p);
    while (j < pLen - 1) {
        if (k != -1 && p[k] != p[j])
            k = next[k];
        else
            next[++j] = ++k;
    }
}

void getnext2(char *p, int next[]) {
    next[0] = -1;
    int pLen = strlen(p), j, k;
    for (j = 0; j < pLen - 1; j++) {
        k = next[j];
        while (k != -1 && p[k] != p[j]) {
            k = next[k];
        }
        next[j + 1] = k + 1;
    }
}

void getnextval(char *p, int next[]) {
    next[0] = -1;
    int j = 0, k = -1, pLen = strlen(p);
    while (j < pLen - 1) {
        if (k != -1 && p[k] != p[j]) {
            k = next[k];
        } else {
            if (p[++k] != p[++j])
                next[j] = k;
            else
                next[j] = next[k];
        }
    }
}

int kmpsearch(char *s, char *p, int *next) {
    int sLen = strlen(s);
    int pLen = strlen(p);
    int i = 0, j = 0;
    while (i < sLen && j < pLen) {
        if (j == -1 || s[i] == p[j]) {  // compared with simple matching, the extra case j == -1, because next[0] = -1
            i++;
            j++;
        } else {
            j = next[j];  // compared with simple matching, i does not backtrack; j jumps to next[j]
        }
    }
    if (j == pLen)
        return i - j;
    else
        return -1;
}

int main() {
    char *s = "BBC abcdab abcdabcdabde";
    char *p = "abcdabd";
    int n = strlen(p);
    int next[n], next2[n], nextval[n];
    int index, indexval, i;
    getnext(p, next);
    index = kmpsearch(s, p, next);
    getnext2(p, next2);
    getnextval(p, nextval);
    indexval = kmpsearch(s, p, nextval);
    for (i = 0; i < n; i++)
        printf("%d\t", next[i]);
    printf("\n");
    for (i = 0; i < n; i++)
        printf("%d\t", next2[i]);
    printf("\n");
    for (i = 0; i < n; i++)
        printf("%d\t", nextval[i]);
    printf("\n");
    printf("%d\t%d", index, indexval);
    return 0;
}
/* output:
-1  0  0  0  0  1  2
-1  0  0  0  0  1  2
-1  0  0  0  -1  0  2
15  15
*/

Reference

The KMP String-Matching Algorithm - Ruan Yifeng

Thoroughly Understanding KMP from Start to End - July

[Original address]: http://blog.csdn.net/thisinnocence
