KMP Algorithm: Understanding, Pseudocode, and C Implementation

Source: Internet
Author: User

1. Formal definition of the string-matching problem: Assume the text is an array T[1..n] of length n and the pattern is an array P[1..m] of length m, where m<=n. If T[s+1..s+m]==P[1..m] for some s with 0<=s<=n-m, we say that pattern P occurs in T with shift s, and s is called a valid offset; otherwise s is an invalid offset.

2. Method: First preprocess the pattern, then find all valid offsets (matching).

Preprocessing time and matching time of several methods

Algorithm            Preprocessing time    Matching time
Naive algorithm      0                     O((n-m+1)*m)
Finite automaton     O(m*|Σ|)              O(n)
KMP                  O(m)                  O(n)
Rabin-Karp           O(m)                  O((n-m+1)*m) (worst case)

Here Σ is the input alphabet.

3. Naive string-matching algorithm: Loop over every candidate offset s and test whether it is valid. There are n-m+1 candidate offsets, and testing each one takes up to m character comparisons, so the worst case needs (n-m+1)*m comparisons.

Pseudocode:

Naive-String-Matcher(T, P)

1. n = T.length

2. m = P.length

3. for s = 0 to n-m

4.     if P[1..m] == T[s+1..s+m]

5.         print "Pattern occurs with shift" s

Disadvantage: The information gained about the text while testing an invalid offset s is thrown away, so the same text characters are compared again for the next offset.
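The naive matcher above can be sketched in C roughly as follows. This is only an illustration: the function name naive_match and its output-buffer parameters are assumptions made for this example, and offsets are reported 0-based, which corresponds to the shift s in the pseudocode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores up to max_out valid offsets of p in t into
   out[] (out may be NULL) and returns the total number of valid offsets. */
size_t naive_match(const char *t, const char *p, size_t *out, size_t max_out) {
    assert(t != NULL && p != NULL);
    size_t n = strlen(t), m = strlen(p), count = 0;
    if (m == 0 || m > n) {
        return 0;
    }
    for (size_t s = 0; s + m <= n; s++) {       /* s = 0 .. n-m */
        size_t k = 0;
        while (k < m && t[s + k] == p[k]) {     /* compare P[1..m] with T[s+1..s+m] */
            k++;
        }
        if (k == m) {                           /* all m characters matched */
            if (out != NULL && count < max_out) {
                out[count] = s;
            }
            count++;
        }
    }
    return count;
}
```

For instance, searching for "ab" in "abababc" reports the three offsets 0, 2 and 4. Note that after every failed offset the inner loop starts again from the first pattern character, which is exactly the wasted work that KMP removes.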

4. Rabin-Karp algorithm: Based on concepts from elementary number theory. It is not covered here for now.

5. String matching with a finite automaton: First build a finite automaton from the pattern, then scan the text with that automaton.

Finite automaton: Consists of five elements: a finite set of states, a start state, a set of accepting states, a finite input alphabet, and a transition function.
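To make the five elements concrete, here is a minimal C sketch of an automaton-based matcher. It assumes the states are 0..m (state q means "the last q text characters equal the first q pattern characters"), the start state is 0, the only accepting state is m, and the alphabet is all 256 byte values. The names fa_build, fa_match, FA_MAX_M and FA_SIGMA are made up for this example, and the transition table is built in the simplest possible way rather than the fastest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FA_MAX_M 32      /* assumed upper bound on the pattern length */
#define FA_SIGMA 256     /* assumed alphabet: all byte values */

/* delta[q][a] = length of the longest prefix of p that is a suffix of
   p[0..q-1] followed by the character a. */
static void fa_build(const char *p, size_t m, int delta[][FA_SIGMA]) {
    for (size_t q = 0; q <= m; q++) {
        for (int a = 0; a < FA_SIGMA; a++) {
            size_t k = (q + 1 < m) ? q + 1 : m;   /* start from min(m, q+1) */
            while (k > 0) {
                /* Does p[0..k-1] end with a and match the tail of p[0..q-1]? */
                if ((unsigned char)p[k - 1] == a &&
                    memcmp(p, p + q - (k - 1), k - 1) == 0) {
                    break;
                }
                k--;
            }
            delta[q][a] = (int)k;
        }
    }
}

/* Returns the 0-based offset of the first occurrence of p in t, or -1. */
long fa_match(const char *t, const char *p) {
    assert(t != NULL && p != NULL);
    static int delta[FA_MAX_M + 1][FA_SIGMA];
    size_t n = strlen(t), m = strlen(p);
    if (m == 0 || m > FA_MAX_M || m > n) {
        return -1;
    }
    fa_build(p, m, delta);
    size_t q = 0;                                 /* start state */
    for (size_t i = 0; i < n; i++) {
        q = (size_t)delta[q][(unsigned char)t[i]];
        if (q == m) {                             /* accepting state reached */
            return (long)(i + 1 - m);
        }
    }
    return -1;
}
```

Once the table exists, matching reads each text character exactly once, which is where the O(n) matching time comes from. The cost has moved into building the table, and KMP avoids that cost.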

6. KMP algorithm: Uses a prefix function to skip offsets that cannot be valid. It also avoids computing the entire transition function required for automaton matching. This works because a pattern can partially match itself: a part at its head may appear again later in the pattern.

Essence: KMP exploits the structure of the pattern string itself. It checks whether the pattern has repeated parts, finds its matching prefixes and suffixes, derives the corresponding next array from them, and finally performs the match guided by the next array.

Next array: Records where the pattern P repeats itself, that is, how far the pattern can be shifted so that the first copy of a repeated part lands where the second copy was.

The essence of a "partial match" is that the head and the tail of a string are sometimes the same. For example, "ABCDAB" contains "AB" twice, at its head and at its tail, so its "partial match value" is 2 (the length of "AB"). When the pattern is shifted, the first "AB" moves forward by 4 positions (the length of the string minus the partial match value, 6-2) and arrives at the position of the second "AB".
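The partial match value described above can be checked with a few lines of C. This is a brute-force sketch for illustration only; the function name partial_match_value is an assumption of this example, and it simply tries every proper prefix length from the longest down.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the longest proper prefix of s that is
   also a suffix of s (0 if there is none). */
size_t partial_match_value(const char *s) {
    assert(s != NULL);
    size_t len = strlen(s);
    if (len < 2) {
        return 0;
    }
    for (size_t k = len - 1; k > 0; k--) {
        /* Compare the first k characters with the last k characters. */
        if (memcmp(s, s + len - k, k) == 0) {
            return k;
        }
    }
    return 0;
}
```

For "ABCDAB" the function returns 2, so the shift is strlen("ABCDAB") - 2 = 4, matching the example in the text.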

General idea:

Matching procedure for KMP (indices are 1-based; i indexes T and j indexes P):

1. First initialize the next array, with next[1]=0 and next[2]=1.

2. Loop to check whether pattern P occurs in T:

1) Compare T[i] with P[j]. If they are equal, advance both i and j and compare the next pair; otherwise perform 2).

2) Set j=next[j] and continue comparing (i does not move, so the text is never scanned backwards).

3) If j==0, no prefix of P can match at this text position, so do i++ and j++ (j becomes 1).

3. Stop when the position of P in T has been found or the end of T has been reached.

On a mismatch, the new value of j, next[j], depends on the longest prefix of P[1..j-1] that is also a proper suffix of P[1..j-1]: next[j] is exactly that maximum length plus 1.

Initialization of the next array

1. Define the next array and set next[1]=0, next[2]=1.

2. Loop over j, starting from j=3, to compute the remaining entries.

3. For each j, compute next[j] as follows.

4. Start with i=next[j-1] and compare P[j-1] with P[i]; if they are equal, set next[j]=i+1.

5. Otherwise, set i=next[i] and compare again, repeating until an equal pair is found.

6. If i reaches 0, no prefix of P matches a suffix ending at P[j-1], so set next[j]=1.
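The initialization steps can be sketched in C as below. The sketch assumes the 1-based convention next[1]=0, next[2]=1 used by the pseudocode in this article, so the 1-based character P[j] is stored at p[j-1] and next[0] is unused; the function name build_next is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills next[1..m] for pattern p, where m = strlen(p).
   The caller must provide room for at least m+1 ints; next[0] is unused. */
void build_next(const char *p, int *next) {
    assert(p != NULL && next != NULL);
    int m = (int)strlen(p);
    if (m >= 1) {
        next[1] = 0;
    }
    if (m >= 2) {
        next[2] = 1;
    }
    for (int j = 3; j <= m; j++) {
        int i = next[j - 1];
        /* P[j-1] is p[j-2] and P[i] is p[i-1] in 0-based C indexing. */
        while (i != 0 && p[j - 2] != p[i - 1]) {
            i = next[i];              /* fall back to a shorter prefix */
        }
        next[j] = i + 1;              /* i == 0 gives next[j] = 1 */
    }
}
```

For the pattern "ABCDABD" this yields next[1..7] = 0, 1, 1, 1, 1, 2, 3. The last two entries reflect the repeated "AB": after a mismatch at the final 'D', matching resumes at pattern position 3.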

7. Pseudocode:

Initialize the next array

Next-Func(P, next)

1. let next[1..m] = 1

2. next[1] = 0, next[2] = 1

3. for i = 3 to m

4.     j = next[i-1]

5.     while j != 0

6.         if P[i-1] == P[j]

7.             next[i] = j+1

8.             break

9.         else

10.            j = next[j]

11.    if j == 0 then next[i] = 1

KMP-Cmp(T, P)

1. n = T.length, m = P.length, Next-Func(P, next), i = 1, j = 1

2. while i <= n and j <= m

3.     if j == 0 or T[i] == P[j]

4.         i++, j++

5.     else j = next[j]

6. if j > m then return true (P occurs in T at position i-m)

7. return false

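#include <stdio.h>
#include <stdlib.h>

/* Fill next[1..m] for the pattern p. The pattern is treated as P[1..m],
   so the 1-based character P[j] is p[j-1] in C. next[0] is unused. */
int nextarr(char* p, int* next, int m) {
    int i, j;
    next[1] = 0;
    if (m >= 2) {
        next[2] = 1;
    }
    for (i = 3; i <= m; i++) {
        j = next[i - 1];
        /* Fall back until P[i-1] == P[j] or no prefix is left (j == 0). */
        while (j != 0 && p[i - 2] != p[j - 1]) {
            j = next[j];
        }
        next[i] = j + 1;
    }
    return 0;
}

/* Search for p (length m) in t (length n). On success, print the 1-based
   position of the first occurrence and return 1; otherwise return 0. */
int kmpcmp(char* t, char* p, int n, int m) {
    int i = 1, j = 1, found = 0;
    int* next;
    if (m < 1 || m > n) {
        return 0;
    }
    next = malloc((m + 1) * sizeof(int));
    if (next == NULL) {
        return 0;
    }
    nextarr(p, next, m);
    while (i <= n && j <= m) {
        if (j == 0 || t[i - 1] == p[j - 1]) {
            i++;
            j++;
        } else {
            j = next[j];    /* i stays put: the text is never re-scanned */
        }
    }
    if (j > m) {
        printf("%d\n", i - m);
        found = 1;
    }
    free(next);
    return found;
}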

