KMP Algorithm Concise Tutorial

Source: Internet
Author: User

The task is to find the position of a pattern string p in a main string s. This is the string matching problem.

To illustrate:

    i = 0  1  2  3  4  5  6  7  8  9  10 11 12 13
    s = a  b  a  a  c  a  b  a  a  a  b  a  a  b
    p = a  b  a  a  b
    j = 0  1  2  3  4

Before the KMP algorithm was invented, people used the following naive algorithm:

    def string_index_of(pstr, pattern, pos=0):
        """Naive substring search, O(m*n)."""
        str_index = pos
        pattern_index = 0
        pattern_len = len(pattern)
        while str_index < len(pstr) and pattern_index < pattern_len:
            if pstr[str_index] == pattern[pattern_index]:
                # Characters match: advance both indices.
                str_index += 1
                pattern_index += 1
            else:
                # Mismatch: backtrack to one past the previous start.
                str_index = str_index - pattern_index + 1
                pattern_index = 0
        if pattern_index == pattern_len:
            return str_index - pattern_index
        return -1

    pstr = 'I am Caochao, I love coding!'
    pattern = 'ao'
    print(string_index_of(pstr, pattern, 7))
    print(pstr.find(pattern))

When s[4] fails to match p[4], the main string s backtracks to i = 1 and the pattern string p backtracks to j = 0, and matching resumes from there. This is clearly inefficient: if s has length n and p has length m, the worst-case time complexity is O(m*n).
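
To make the O(m*n) cost concrete, here is a minimal sketch that counts character comparisons for the backtracking approach described above. The function name `naive_comparisons` and the test strings are assumptions made for illustration; they are not part of the original code.

```python
def naive_comparisons(text, pattern):
    """Backtracking search that also counts character comparisons.

    Returns (index of first match or -1, number of comparisons).
    """
    i = j = comparisons = 0
    while i < len(text) and j < len(pattern):
        comparisons += 1
        if text[i] == pattern[j]:
            i += 1
            j += 1
        else:
            i = i - j + 1  # backtrack i to one past the previous start
            j = 0
    index = i - j if j == len(pattern) else -1
    return index, comparisons


# Hypothetical near-worst-case input: a long run of 'a' with a late mismatch.
print(naive_comparisons("a" * 8, "aaab"))
# The example strings used in this tutorial.
print(naive_comparisons("abaacabaaabaab", "abaab"))
```

On inputs like the first one, almost every alignment re-reads characters that were already compared, which is where the m*n factor comes from.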

Now consider this: s and p have matched up to positions i and j, and s[i] is not equal to p[j]. Suppose we leave i unchanged and continue matching from some earlier position k (k < j) of p. Then i never needs to backtrack. This is the KMP algorithm. So how is k obtained? For every position j in p where a mismatch can occur, the position k = next[j] to jump to can be computed in advance; this table is the next array of the KMP algorithm.

The KMP algorithm steps are as follows:

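1. Initialize i and j to 0.

2. Compare s[i] with p[j]. If they are equal, increment both i and j by 1; otherwise leave i unchanged and set j = next[j]. If j becomes -1, increment both i and j by 1 and continue matching.

3. Repeat step 2 until j reaches the length of p (a match was found) or i reaches the length of s (no match).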

As the analysis above shows, the key to the KMP algorithm is how to find k. Here p[a, b] denotes the substring of p from index a through index b inclusive. Suppose that after s[i] fails to match p[j], matching continues by comparing s[i] with p[k] (0 < k < j). Then the first k characters of p must satisfy Equation 1, and there must be no k' > k that also satisfies it:

Equation 1: p[0, k-1] = s[i-k, i-1]

Since the characters before positions i and j have already been matched, Equation 2 also holds:

Equation 2: p[j-k, j-1] = s[i-k, i-1]

Combining the two gives Equation 3:

Equation 3: p[0, k-1] = p[j-k, j-1]

At this point the value of k is clear: take the largest k (0 < k < j) such that the first k characters of p equal the k characters immediately before p[j]. Then, when s[i] is not equal to p[j], matching can resume at p[k], as far from p[0] as possible, which improves matching efficiency.
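
As a small sanity check of Equations 1 to 3, the sketch below takes the example strings from the start of this tutorial at the mismatch position i = 9, j = 4 and searches for the largest k for which the equations hold. The helper name `largest_k` is an assumption made for illustration.

```python
def largest_k(s, p, i, j):
    """Largest k (0 < k < j) with p[0:k] == s[i-k:i], or 0 if none.

    Assumes p[0:j] == s[i-j:i], i.e. the first j characters already matched.
    """
    for k in range(j - 1, 0, -1):
        eq1 = p[:k] == s[i - k:i]        # Equation 1
        eq2 = p[j - k:j] == s[i - k:i]   # Equation 2 (follows from the match so far)
        eq3 = p[:k] == p[j - k:j]        # Equation 3
        assert eq2 and eq1 == eq3        # given Equation 2, Equations 1 and 3 agree
        if eq1:
            return k
    return 0


s = "abaacabaaabaab"
p = "abaab"
print(largest_k(s, p, 9, 4))  # mismatch between s[9] and p[4]
```

Because Equation 3 involves only p, the same k can be found without looking at s at all, which is what makes precomputation possible.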

From the above analysis, k = next[j] can be defined as follows:

1. If j = 0, next[j] = -1.

2. next[j] = max{k | 0 < k < j and p[0, k-1] = p[j-k, j-1]}, if such a k exists.

3. Otherwise, next[j] = 0.
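
The definition can be turned into code almost word for word. The following is a deliberately slow brute-force sketch, useful only for checking faster implementations; the function name `next_by_definition` is an assumption made for illustration.

```python
def next_by_definition(p):
    """Compute next[j] for every j directly from the three-case definition."""
    nxt = []
    for j in range(len(p)):
        if j == 0:
            nxt.append(-1)               # case 1
            continue
        best = 0                         # case 3: no suitable k exists
        for k in range(j - 1, 0, -1):    # try the largest k first
            if p[:k] == p[j - k:j]:      # case 2: p[0, k-1] == p[j-k, j-1]
                best = k
                break
        nxt.append(best)
    return nxt


print(next_by_definition("abaab"))  # [-1, 0, 0, 1, 1]
```

Each entry costs up to O(j^2) character comparisons here, which is why a recursive derivation is preferable in practice.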

Deriving the next array recursively:

Starting from the definition, next[j] can be computed recursively.

First, next[0] = -1.

Now suppose next[j] = k; that is, k is the largest value (there is no k' > k) satisfying:

p[0, k-1] = p[j-k, j-1]

Then next[j+1] falls into one of the following three cases:

1. If p[k] = p[j], then p[0, k] = p[j-k, j] holds and no larger value satisfies it, so next[j+1] = k + 1, that is:

next[j+1] = next[j] + 1

2. If p[k] is not equal to p[j], computing the next function can itself be viewed as a pattern matching process in which p is both the main string and the pattern string. By the matching rule, p[j] should now be compared with p[next[k]].

For ease of understanding, let k' = next[k].

If p[j] = p[k'], then next[j+1] = k' + 1, or next[j+1] = next[k] + 1, which is the same as:

next[j+1] = next[next[j]] + 1

3. Similarly, if p[j] is not equal to p[k'], continue by comparing p[j] with p[next[k']], and so on. If this continues until k' = -1, then:

next[j+1] = 0

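The code is as follows:

    def kmp_get_next(pattern):
        """Compute the KMP next[j] array for pattern."""
        if not pattern:
            return []  # guard: an empty pattern has no next[0]
        i = 0
        j = -1
        _next = [0] * len(pattern)
        _next[0] = -1
        while i < len(pattern) - 1:
            if j == -1 or pattern[i] == pattern[j]:
                # p[k] == p[j]: next[j+1] = next[j] + 1
                i += 1
                j += 1
                _next[i] = j
            else:
                # p[k] != p[j]: fall back to k' = next[k] and retry
                j = _next[j]
        return _next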

Optimizing the next array:

Consider the following pattern string:

    j       =  0   1   2   3   4
    p       =  a   a   a   a   b
    next[j] = -1   0   1   2   3

Suppose at some point s[i] is not equal to p[3]. According to next[3], s[i] should next be compared with p[2]; but since p[3] equals p[2], that comparison is bound to fail and is clearly redundant. In general, while computing the next array, if next[i] = j and p[i] = p[j], then set next[i] = next[j]. The code is as follows:

    def kmp_get_next(pattern):
        """Compute the optimized KMP next[j] array for pattern."""
        if not pattern:
            return []  # guard: an empty pattern has no next[0]
        i = 0
        j = -1
        _next = [0] * len(pattern)
        _next[0] = -1
        while i < len(pattern) - 1:
            if j == -1 or pattern[i] == pattern[j]:
                i += 1
                j += 1
                if pattern[i] == pattern[j]:
                    # Jumping to j would repeat the same failed comparison.
                    _next[i] = _next[j]
                else:
                    _next[i] = j
            else:
                j = _next[j]
        return _next

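To see what the optimization buys, the sketch below builds both versions of the next array and counts character comparisons during matching. The names `build_next` and `count_comparisons`, the pattern "aaaab", and the sample text are assumptions chosen for illustration.

```python
def build_next(pattern, optimized=False):
    """Build the next array; with optimized=True, skip jumps that must fail."""
    nxt = [-1] * len(pattern)
    i, j = 0, -1
    while i < len(pattern) - 1:
        if j == -1 or pattern[i] == pattern[j]:
            i += 1
            j += 1
            if optimized and pattern[i] == pattern[j]:
                nxt[i] = nxt[j]
            else:
                nxt[i] = j
        else:
            j = nxt[j]
    return nxt


def count_comparisons(text, pattern, nxt):
    """Run the KMP loop; return (match index or -1, character comparisons)."""
    i = j = comparisons = 0
    while i < len(text) and j < len(pattern):
        if j != -1:
            comparisons += 1  # only real character comparisons are counted
        if j == -1 or text[i] == pattern[j]:
            i += 1
            j += 1
        else:
            j = nxt[j]
    return (i - j if j == len(pattern) else -1), comparisons


pattern = "aaaab"    # assumed pattern with many repeated characters
text = "aaabaaaab"   # assumed text for illustration
for flag in (False, True):
    nxt = build_next(pattern, optimized=flag)
    print(flag, nxt, count_comparisons(text, pattern, nxt))
```

Both tables find the same match; the optimized one simply avoids re-comparing s[i] against pattern characters that are known to equal the one that just failed.
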
Once the next[j] array has been computed, the KMP algorithm is implemented as follows:

    def kmp_index_of(pstr, pattern, pos=0):
        """Find the position of pattern in pstr using KMP."""
        _next = kmp_get_next(pattern)
        str_index = pos
        pattern_index = 0
        pattern_len = len(pattern)
        while str_index < len(pstr) and pattern_index < pattern_len:
            if pattern_index == -1 or pstr[str_index] == pattern[pattern_index]:
                str_index += 1
                pattern_index += 1
            else:
                # Mismatch: keep str_index and jump the pattern back via next.
                pattern_index = _next[pattern_index]
        if pattern_index == pattern_len:
            return str_index - pattern_index
        return -1

    pstr = 'I am Caochao, I love coding!'
    pattern = 'ao'
    print(kmp_index_of(pstr, pattern, 7))
    print(pstr.find(pattern))

Finally, let us compare how the naive algorithm and the KMP algorithm each work through the example from the beginning of this tutorial.

Naive algorithm:

    i = 0  1  2  3  4  5  6  7  8  9  10 11 12 13
    s = a  b  a  a  c  a  b  a  a  a  b  a  a  b
    p = a  b  a  a  b
    j = 0  1  2  3  4

s[4] is not equal to p[4], so i = 1, j = 0.

    i = 0  1  2  3  4  5  6  7  8  9  10 11 12 13
    s = a  b  a  a  c  a  b  a  a  a  b  a  a  b
    p =    a  b  a  a  b
    j =    0  1  2  3  4

s[1] is not equal to p[0], so i = 2, j = 0.

    i = 0  1  2  3  4  5  6  7  8  9  10 11 12 13
    s = a  b  a  a  c  a  b  a  a  a  b  a  a  b
    p =       a  b  a  a  b
    j =       0  1  2  3  4

s[3] is not equal to p[1], so i = 3, j = 0.

    i = 0  1  2  3  4  5  6  7  8  9  10 11 12 13
    s = a  b  a  a  c  a  b  a  a  a  b  a  a  b
    p =          a  b  a  a  b
    j =          0  1  2  3  4

To save space, the intermediate steps are skipped; jump to i = 9, j = 0.

    i = 0  1  2  3  4  5  6  7  8  9  10 11 12 13
    s = a  b  a  a  c  a  b  a  a  a  b  a  a  b
    p =                            a  b  a  a  b
    j =                            0  1  2  3  4

From here i and j advance together until i = 14 and j = 5; the loop exits with a full match and returns 9.

KMP algorithm:

The next array of p is:

    j       =  0   1   2   3   4
    p       =  a   b   a   a   b
    next[j] = -1   0   0   1   1

After optimization, the next array becomes:

    j       =  0   1   2   3   4
    p       =  a   b   a   a   b
    next[j] = -1   0  -1   1   0

Matching then proceeds as follows:

    i = 0  1  2  3  4  5  6  7  8  9  10 11 12 13
    s = a  b  a  a  c  a  b  a  a  a  b  a  a  b
    p = a  b  a  a  b
    j = 0  1  2  3  4

s[4] is not equal to p[4], so j = next[4] = 0 (using the optimized array).

    i = 0  1  2  3  4  5  6  7  8  9  10 11 12 13
    s = a  b  a  a  c  a  b  a  a  a  b  a  a  b
    p =             a  b  a  a  b
    j =             0  1  2  3  4

s[4] is not equal to p[0], and next[0] = -1, so j becomes -1 and then i and j are each incremented by 1: i = 5, j = 0.

    i = 0  1  2  3  4  5  6  7  8  9  10 11 12 13
    s = a  b  a  a  c  a  b  a  a  a  b  a  a  b
    p =                a  b  a  a  b
    j =                0  1  2  3  4

i and j advance together until s[9] is not equal to p[4], so j = next[4] = 0.

    i = 0  1  2  3  4  5  6  7  8  9  10 11 12 13
    s = a  b  a  a  c  a  b  a  a  a  b  a  a  b
    p =                            a  b  a  a  b
    j =                            0  1  2  3  4

From here i and j advance together until i = 14 and j = 5; the loop exits with a full match and returns 9.
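
The KMP walk-through can be reproduced with a short tracing sketch that records every mismatch as (i, j, new j). The function name `kmp_trace` is an assumption made for illustration, and the next table is hard-coded as the optimized array for "abaab", [-1, 0, -1, 1, 0].

```python
def kmp_trace(text, pattern, nxt):
    """Run the KMP loop and record each mismatch as (i, j, next[j]).

    Returns (index of first match or -1, list of mismatch events).
    """
    i = j = 0
    events = []
    while i < len(text) and j < len(pattern):
        if j == -1 or text[i] == pattern[j]:
            i += 1
            j += 1
        else:
            events.append((i, j, nxt[j]))
            j = nxt[j]  # i stays where it is; only the pattern index jumps
    return (i - j if j == len(pattern) else -1), events


index, events = kmp_trace("abaacabaaabaab", "abaab", [-1, 0, -1, 1, 0])
print(index)
for i, j, new_j in events:
    print(f"s[{i}] != p[{j}], so j = {new_j}")
```

The recorded values of i never decrease, which is the property that distinguishes this loop from the naive one.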

The comparison shows that the KMP algorithm does far less work than the naive algorithm: once the next array of the pattern string has been computed, i never backtracks during matching, and the time complexity improves from O(m*n) for the naive algorithm to O(m+n).
