String pattern matching: the KMP algorithm

Source: Internet
Author: User

String pattern matching means finding where a given pattern string occurs in a longer string.

    • A simple pattern matching algorithm

The most intuitive way to find where a pattern string appears in a long string is code like the following.

    /*
     * Naive pattern matching algorithm.
     * Parameters:
     *   s:   target string
     *   p:   pattern string
     *   pos: position in s at which matching starts
     * Return value: the position of the pattern string in the target string
     * if the match succeeds, or -1 if it does not.
     */
    int match(const char *s, const char *p, int pos)
    {
        int i = pos;
        int j = 0;

        while (s[i] != '\0' && p[j] != '\0') {
            if (s[i] == p[j]) {
                i++;
                j++;
            } else {
                i = i - j + 1;
                j = 0;
            }
        }
        if (p[j] == '\0')
            return i - j;
        else
            return -1;
    }

In the code above, s is the target string, p is the pattern string, and pos is the position in s at which matching starts. The idea behind the implementation is simple:

When s[i] == p[j], the target-string pointer and the pattern-string pointer both move forward one position and matching continues. When s[i] != p[j], the match has failed and both pointers backtrack: j is reset to 0, and i goes back to the position right after the one where this round started (i = i - j + 1).

The time complexity of the naïve pattern matching algorithm is O((n-m+1) * m), where n is the length of the target string and m is the length of the pattern string.
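To make the bound concrete, here is a minimal sketch that counts the character comparisons the naïve scan performs. The function name naive_comparisons is a hypothetical helper introduced only for this illustration; the loop itself mirrors the naïve algorithm, always starting at position 0.

```c
/*
 * Hypothetical helper (not part of the original article): runs the naive
 * scan from position 0 and returns the number of character comparisons.
 */
long naive_comparisons(const char *s, const char *p)
{
    long count = 0;
    int i = 0;
    int j = 0;

    while (s[i] != '\0' && p[j] != '\0') {
        count++;                 /* one comparison of s[i] with p[j] */
        if (s[i] == p[j]) {
            i++;
            j++;
        } else {
            i = i - j + 1;       /* backtrack the target-string pointer */
            j = 0;               /* restart the pattern */
        }
    }
    return count;
}
```

For s = "aaaaaaab" (n = 8) and p = "aaab" (m = 4), each of the 5 starting positions costs 4 comparisons, so the function returns 20, which equals (n-m+1) * m.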

The implementation makes the source of the inefficiency easy to see: every failed match makes both the main-string pointer and the pattern-string pointer backtrack.

Is there an algorithm that, when a match between the pattern string and the main string fails, can go straight to the next round of matching without backtracking the main-string pointer?

    • Understanding the KMP algorithm

In the naïve string pattern matching algorithm, whenever a character of the main string fails to match a character of the pattern string, the pointers backtrack, no matter how many characters have already been matched, and the next round of matching starts over.

This is very inefficient. The KMP algorithm improves on the naïve algorithm so that the main-string pointer never backtracks after a failed match, which reduces the time complexity of pattern matching to O(n + m).

To understand the KMP algorithm I looked up a lot of material online and also read the description in Introduction to Algorithms, but I only ever half understood it. In my spare time I pictured the pattern string and the main string as straight lines and worked through the deduction, and it suddenly became clear.

The core idea of the KMP algorithm is that when s[i] and p[j] do not match, the main-string pointer does not backtrack; instead, a position k is found in the pattern string, and the next round of matching compares s[i] with p[k].

Picture the main string s and the pattern string p as straight lines. When s[i] and p[j] fail to match, the situation is as follows (indices in this section start at 1):

Figure 1: s[i] and p[j] fail to match

That is: p[1, ..., j-1] == s[i-j+1, ..., i-1].

p[j] and s[i] do not match. Now choose a position k (1 <= k < j) in the pattern string, so that p[k] and s[i] are compared in the next round of matching. Then k must satisfy the following condition:

p[1, ..., k-1] == s[i-k+1, ..., i-1].

Picturing the pattern string and the main string as straight lines again:

Figure 2: using p[k] and s[i] for the next round of matching

Since 1 <= k < j, what do we get by merging the two figures?

As can be seen, if p[k] and s[i] can be used for the next round of matching after s[i] and p[j] fail to match, then:

s[i-k+1, ..., i-1] == p[j-k+1, ..., j-1] == p[1, ..., k-1].

That is, for the main-string pointer to stay where it is after s[i] and p[j] fail to match, and for matching to continue with p[k] and s[i], k must satisfy the following condition:

p[1, ..., k-1] == p[j-k+1, ..., j-1].
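The condition on k depends only on the pattern string, so it can be checked by brute force. The sketch below is a hypothetical helper named largest_k, introduced only for illustration. It takes j as a 1-based position, as in the text above, and assumes 1 <= j <= strlen(p). It returns the largest valid k, because choosing the largest k is what guarantees that no possible match is skipped.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of the original article).
 * p is the pattern; j is the 1-based position of the mismatched character.
 * Returns the largest k with 1 <= k < j such that
 *     p[1..k-1] == p[j-k+1..j-1]   (1-based, as in the text),
 * or 0 when j == 1, where no such k exists.
 */
int largest_k(const char *p, int j)
{
    int k;

    for (k = j - 1; k >= 1; k--) {
        /* 1-based p[1..k-1] is p[0..k-2] in C, and 1-based
           p[j-k+1..j-1] starts at p[j-k] in C. */
        if (memcmp(p, p + (j - k), (size_t)(k - 1)) == 0)
            return k;
    }
    return 0;
}
```

For p = "abcabd" and j = 6 (the character 'd'), the matched part is "abcab"; its prefix "ab" equals its suffix "ab", so the function returns k = 3 and the next comparison is between s[i] and the third character, 'c'.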

    • Implementation of the KMP algorithm

The KMP algorithm is an improvement on the naïve pattern matching algorithm: when s[i] and p[j] fail to match, the main-string pointer does not backtrack; instead, a position k is found within p[1, ..., j-1], and s[i] and p[k] are compared in the next round of matching. The main difficulty in the implementation is how to find k from p[1, ..., j-1].

The KMP implementation uses an auxiliary array next[] that stores, for each p[j], the value of k to use in the next round when the match at p[j] fails. That is, when s[i] and p[j] fail to match, p[next[j]] is compared with s[i] in the next round: k = next[j].

Many methods for computing the array next[] can be found by searching online; the simplest one, based on a recurrence, is used here. From this point on, indices start at 0, as in the code.

First let next[0] = -1. Then, when next[j] = k, we have: p[0, ..., k-1] == p[j-k, ..., j-1].

At this point, if p[k] == p[j], then p[0, ..., k] == p[j-k, ..., j], and therefore next[j+1] = next[j] + 1 = k + 1.

If p[k] != p[j], the problem can be treated as the pattern string matching against itself: the match has failed, so k falls back to k = next[k] and the comparison is repeated.

The implementation of the array next[] is as follows:

    #include <string.h>

    /*
     * Auxiliary function for KMP pattern matching.
     * When the match between the pattern string and the main string fails
     * at p[j], next[j] is the position in the pattern string to compare
     * with the main string next.
     * next must have room for strlen(p) entries (at least one).
     */
    void continue_prefix_function(const char *p, int *next)
    {
        int len = (int)strlen(p);
        int j = 0;
        int k = -1;

        next[0] = -1;
        while (j < len - 1) {
            if (k == -1 || p[k] == p[j]) {
                j++;
                k++;
                next[j] = k;
            } else {
                k = next[k];
            }
        }
    }
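As a worked example of the recurrence, the sketch below builds next[] for a sample pattern and lists the resulting values in a comment. The name build_next_table is hypothetical and used only for this illustration. The caller is assumed to provide an array with strlen(p) entries, and at least one entry.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of the original article): builds next[]
 * for pattern p with the recurrence next[j+1] = k + 1 when p[k] == p[j],
 * falling back to k = next[k] otherwise.
 */
void build_next_table(const char *p, int *next)
{
    int len = (int)strlen(p);
    int j = 0;
    int k = -1;

    next[0] = -1;
    while (j < len - 1) {
        if (k == -1 || p[k] == p[j]) {
            /* p[0..k] == p[j-k..j], so next[j+1] = k + 1 */
            j++;
            k++;
            next[j] = k;
        } else {
            /* fall back to the next shorter candidate prefix */
            k = next[k];
        }
    }
}

/*
 * Worked values for p = "abcabd":
 *   j       :  0  1  2  3  4  5
 *   p[j]    :  a  b  c  a  b  d
 *   next[j] : -1  0  0  0  1  2
 */
```

Here next[5] = 2 means that after a mismatch at 'd' the scan resumes by comparing s[i] with p[2] = 'c', because the prefix "ab" has already been matched.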

  

Once we know which position in the pattern string to compare with the main string after a failed match, the KMP code is easy to write on top of the naïve pattern matching code:

    #include <stdlib.h>
    #include <string.h>

    /*
     * String pattern matching with the KMP algorithm.
     * When the main string and the pattern string fail to match, the
     * main-string pointer does not backtrack; next[j] gives the position in
     * the pattern string to compare with the main string next.
     * Returns the match position in s, or -1 if there is no match
     * (or if memory for next[] cannot be allocated).
     */
    int match_kmp(const char *s, const char *p, int pos)
    {
        size_t len = strlen(p);
        int *next = malloc((len > 0 ? len : 1) * sizeof *next);
        int i = pos;
        int j = 0;
        int result;

        if (next == NULL)
            return -1;
        continue_prefix_function(p, next);
        while (s[i] != '\0' && p[j] != '\0') {
            if (s[i] == p[j]) {
                i++;
                j++;
            } else {
                if (next[j] == -1) {
                    i++;
                    j = 0;
                } else {
                    j = next[j];
                }
            }
        }
        result = (p[j] == '\0') ? i - j : -1;
        free(next);
        return result;
    }
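The function above stops at the first match. As a hedged variant rather than the article's own code, the sketch below reports every match position. The names build_next and kmp_find_all are hypothetical. The table here has one extra entry, next[m], so that the scan can continue after a full match without moving the main-string pointer backwards.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fills next[0..m] for pattern p of length m.
   next[0] = -1 is a sentinel, as in the article. */
static void build_next(const char *p, int m, int *next)
{
    int j = 0;
    int k = -1;

    next[0] = -1;
    while (j < m) {
        if (k == -1 || p[k] == p[j]) {
            j++;
            k++;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
}

/*
 * Hypothetical helper: stores up to max_out match positions of p in s
 * into out[] and returns the total number of matches found.
 * Returns 0 for an empty pattern or if allocation fails.
 */
int kmp_find_all(const char *s, const char *p, int *out, int max_out)
{
    int n = (int)strlen(s);
    int m = (int)strlen(p);
    int count = 0;
    int i = 0;
    int j = 0;
    int *next;

    if (m == 0)
        return 0;
    next = malloc(((size_t)m + 1) * sizeof *next);
    if (next == NULL)
        return 0;
    build_next(p, m, next);

    while (i < n) {
        if (j == -1 || s[i] == p[j]) {
            i++;
            j++;
            if (j == m) {            /* full match ending at i - 1 */
                if (count < max_out)
                    out[count] = i - m;
                count++;
                j = next[m];         /* keep scanning; i never moves back */
            }
        } else {
            j = next[j];             /* shift the pattern only */
        }
    }
    free(next);
    return count;
}
```

For s = "abababa" and p = "aba", the function finds three overlapping matches at positions 0, 2 and 4.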

    • Summary
