String pattern matching: the BF algorithm and the KMP algorithm (C language)

Source: Internet
Author: User

One. BF algorithm
The BF (brute force) algorithm is a simple, general pattern matching algorithm. Its idea is to compare the first character of the target string s with the first character of the pattern string p. If they are equal, it goes on to compare the second character of s with the second character of p; if they are not equal, it starts over, comparing the second character of s with the first character of p. The comparison continues in this way until p is matched completely or s is exhausted.

An example is provided:

  S: ababcababa   P: ababa

  The matching steps of the BF algorithm are as follows (each trip compares S[i] with P[j]):

  Trip   i   j   S[i]  P[j]  Result
   1     0   0   a     a     equal
   2     1   1   b     b     equal
   3     2   2   a     a     equal
   4     3   3   b     b     equal
   5     4   4   c     a     mismatch (i and j backtrack)
   6     1   0   b     a     mismatch
   7     2   0   a     a     equal
   8     3   1   b     b     equal
   9     4   2   c     a     mismatch (i and j backtrack)
  10     3   0   b     a     mismatch
  11     4   0   c     a     mismatch
  12     5   0   a     a     equal
  13     6   1   b     b     equal
  14     7   2   a     a     equal
  15     8   3   b     b     equal
  16     9   4   a     a     equal (matching success)

Code implementation:

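#include <string.h>

// Returns the index of the first occurrence of p in s, or -1 if there is none.
int Bfmatch (char *s, char *p)
{
  int i, j;
  int slen = (int) strlen(s);
  int plen = (int) strlen(p);
  i = 0;
  while (i < slen)
  {
    j = 0;
    while (j < plen && s[i] == p[j])
    {
      i++;
      j++;
    }
    if (j == plen)
      return i - plen;
    i = i - j + 1;        // pointer i backtracks
  }
  return -1;
}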
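
To make the trip-by-trip behaviour of the brute-force search easier to follow, here is a minimal sketch that prints every character comparison and counts them. The name bf_trace and its trips output parameter are illustrative assumptions, not part of the original code. For S = ababcababa and P = ababa it reports 16 trips and a match at index 5.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: brute-force search that prints every character
   comparison ("trip") and stores the number of trips in *trips.
   Returns the index of the first match, or -1 if there is none. */
int bf_trace(const char *s, const char *p, int *trips)
{
    assert(s != NULL && p != NULL && trips != NULL);
    int slen = (int)strlen(s);
    int plen = (int)strlen(p);
    int i = 0;
    *trips = 0;

    while (i < slen) {
        int j = 0;
        while (j < plen && i < slen) {
            int equal = (s[i] == p[j]);
            (*trips)++;
            printf("trip %2d: i=%d j=%d  %c %s %c\n",
                   *trips, i, j, s[i], equal ? "==" : "!=", p[j]);
            if (!equal)
                break;
            i++;
            j++;
        }
        if (j == plen)
            return i - plen;
        i = i - j + 1;   /* both pointers backtrack */
    }
    return -1;
}
```

Calling bf_trace("ababcababa", "ababa", &trips) from a main function prints one line per trip, which makes the backtracking of i after a mismatch visible.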

In fact, many of the comparisons in the matching process above are superfluous. When the fifth trip fails, the sixth trip could keep i unchanged and set j to 2. From the earlier comparisons we already know that s0s1s2s3 = p0p1p2p3, and since p0 != p1, the sixth trip (s1 against p0) is bound to fail, so it is redundant. Likewise, since p0 == p2 and p1 == p3, the seventh and eighth trips are bound to succeed, so they are redundant too. The KMP algorithm omits these redundant comparisons.

Two. KMP algorithm

The KMP algorithm gets its name from the three people who proposed it (Knuth, Morris and Pratt): the name is formed from the first letters of their surnames. The difference between the KMP algorithm and the BF algorithm is that KMP cleverly eliminates the backtracking of pointer i. It only has to determine the position of j for the next comparison, which brings the complexity down from O(mn) to O(m+n) for a target string of length n and a pattern string of length m.
In the KMP algorithm, the next[] array is introduced to determine the position of j for the next comparison after a mismatch. The value next[j] is the length of the longest suffix of p[0...j-1] that is equal to a prefix of p[0...j-1], not counting p[0...j-1] itself.
The next[] array is defined as follows:
1) next[j] = -1, if j = 0
2) next[j] = max{k : 0<k<j and p[0...k-1] = p[j-k...j-1]}, if such a k exists
3) next[j] = 0, otherwise
For example:
p       a    b    a    b    a
j       0    1    2    3    4
next   -1    0    0    1    2
That is, next[j] = k > 0 means that p[0...k-1] = p[j-k...j-1].
So the idea of the KMP algorithm is as follows: when a mismatch occurs during matching, if next[j] >= 0, pointer i of the target string stays where it is and pointer j of the pattern string moves to position next[j] to continue matching; if next[j] = -1, i moves one position to the right, j is set to 0, and the comparison continues.
The code is implemented as follows:

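#include <string.h>

void GetNext (char *p, int *next);

// Returns the index of the first occurrence of p in s, or -1 if there is none.
int Kmpmatch (char *s, char *p)
{
  int next[100];          // assumes strlen(p) <= 100
  int i, j;
  int slen = (int) strlen(s);
  int plen = (int) strlen(p);
  i = 0;
  j = 0;
  GetNext(p, next);
  while (i < slen)
  {
    if (j == -1 || s[i] == p[j])
    {
      i++;
      j++;
    }
    else
    {
      j = next[j];    // eliminates the backtracking of pointer i
    }
    if (j == plen)
      return i - plen;
  }
  return -1;
}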
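
As a rough illustration of the saving, the following sketch counts the character comparisons KMP makes. The names build_next and kmp_count, the heap-allocated next table, and the treatment of an empty pattern are assumptions made for this example only. On S = ababcababa and P = ababa it makes 12 comparisons, where the brute-force trace needs 16 trips.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fills next[0..m-1] (m >= 1) so that next[0] = -1 and
   next[j] is the length of the longest proper suffix of p[0...j-1] that is
   also a prefix of it. */
static void build_next(const char *p, int m, int *next)
{
    int j = 0, k = -1;
    next[0] = -1;
    while (j < m - 1) {
        if (k == -1 || p[j] == p[k]) {
            j++;
            k++;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
}

/* Hypothetical helper: KMP search that also counts character comparisons.
   Returns the index of the first match, or -1 (also on allocation failure). */
int kmp_count(const char *s, const char *p, int *comparisons)
{
    assert(s != NULL && p != NULL && comparisons != NULL);
    int n = (int)strlen(s);
    int m = (int)strlen(p);
    *comparisons = 0;
    if (m == 0)
        return 0;                 /* assumption: empty pattern matches at 0 */

    int *next = malloc((size_t)m * sizeof *next);
    if (next == NULL)
        return -1;
    build_next(p, m, next);

    int i = 0, j = 0, result = -1;
    while (i < n) {
        if (j == -1) {            /* no prefix fits: advance i, restart p */
            i++;
            j = 0;
        } else {
            (*comparisons)++;
            if (s[i] == p[j]) {
                i++;
                j++;
            } else {
                j = next[j];      /* i stays where it is */
            }
        }
        if (j == m) {
            result = i - m;
            break;
        }
    }
    free(next);
    return result;
}
```

Allocating the table from the pattern length avoids the fixed-size array, at the cost of one malloc per search.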

So the key to the KMP algorithm is computing the next[] array, that is, for each position of the pattern string, the length of the longest suffix that equals a prefix. There are two ways to compute the values of next[]: one uses a recurrence, the other solves for them directly.
1. Recurrence method:
By definition next[0] = -1. Suppose next[j] = k, i.e. p[0...k-1] == p[j-k...j-1].
1) If p[j] == p[k], then p[0...k] == p[j-k...j], so clearly next[j+1] = next[j]+1 = k+1.
2) If p[j] != p[k], this can itself be viewed as a pattern matching problem, with the pattern matched against itself: when the match fails, k falls back in the same way j does, so k = next[k], and the comparison is repeated.
So it can be implemented like this:

#include <string.h>

void GetNext (char *p, int *next)
{
  int j, k;
  int plen = (int) strlen(p);
  next[0] = -1;
  j = 0;
  k = -1;
  while (j < plen - 1)
  {
    if (k == -1 || p[j] == p[k])  // matching case, p[j]==p[k]
    {
      j++;
      k++;
      next[j] = k;
    }
    else          // p[j]!=p[k]
      k = next[k];
  }
}
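
The two cases of the recurrence can be watched step by step with a small tracing variant. This is only a sketch: the name next_trace and its return value (the number of times k falls back) are assumptions for illustration. For the pattern ababa it fills next with -1 0 0 1 2 and falls back once, at p[1] != p[0].

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills next[0..m-1] for pattern p with the recurrence,
   printing each step. Returns how many times k fell back via k = next[k].
   The caller must provide room for strlen(p) ints (at least one). */
int next_trace(const char *p, int *next)
{
    assert(p != NULL && next != NULL);
    int m = (int)strlen(p);
    int j = 0, k = -1, fallbacks = 0;

    next[0] = -1;
    while (j < m - 1) {
        if (k == -1 || p[j] == p[k]) {
            /* case 1: the matched prefix grows by one character */
            j++;
            k++;
            next[j] = k;
            printf("next[%d] = %d\n", j, k);
        } else {
            /* case 2: retry with the next shorter candidate prefix */
            printf("p[%d] != p[%d], k falls back from %d to %d\n",
                   j, k, k, next[k]);
            k = next[k];
            fallbacks++;
        }
    }
    return fallbacks;
}
```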


2. Direct Solution method

#include <stdbool.h>
#include <string.h>

bool equals (char *p, int i, int j);

void GetNext (char *p, int *next)
{
  int i, j, temp;
  int plen = (int) strlen(p);
  for (i = 0; i < plen; i++)
  {
    if (i == 0)
    {
      next[i] = -1;   // next[0] = -1
    }
    else if (i == 1)
    {
      next[i] = 0;    // next[1] = 0
    }
    else
    {
      temp = i - 1;
      for (j = temp; j > 0; j--)
      {
        if (equals(p, i, j))
        {
          next[i] = j;  // found the maximum k value
          break;
        }
      }
      if (j == 0)       // no k > 0 satisfies the definition
        next[i] = 0;
    }
  }
}

// Checks whether p[0...j-1] and p[i-j...i-1] are equal.
bool equals (char *p, int i, int j)
{
  int k = 0;
  int s = i - j;
  for (; k <= j - 1 && s <= i - 1; k++, s++)
  {
    if (p[k] != p[s])
      return false;
  }
  return true;
}
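
Since the two methods are meant to produce the same table, a quick cross-check can be sketched as follows. The names next_by_recurrence, next_by_definition, same_next and the MAX_PATTERN limit are hypothetical and exist only for this example; memcmp stands in for the character-by-character comparison.

```c
#include <assert.h>
#include <string.h>

#define MAX_PATTERN 64   /* hypothetical limit for this sketch */

/* Recurrence: reuse next[k] whenever p[j] != p[k]. */
static void next_by_recurrence(const char *p, int m, int *next)
{
    int j = 0, k = -1;
    next[0] = -1;
    while (j < m - 1) {
        if (k == -1 || p[j] == p[k]) {
            j++;
            k++;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
}

/* Definition: for each i, try k = i-1, i-2, ..., 1 and keep the first k
   with p[0...k-1] == p[i-k...i-1]; if there is none, next[i] = 0. */
static void next_by_definition(const char *p, int m, int *next)
{
    for (int i = 0; i < m; i++) {
        next[i] = (i == 0) ? -1 : 0;
        for (int k = i - 1; k > 0; k--) {
            if (memcmp(p, p + i - k, (size_t)k) == 0) {
                next[i] = k;
                break;
            }
        }
    }
}

/* Returns 1 if both methods produce the same table for p, 0 otherwise. */
int same_next(const char *p)
{
    assert(p != NULL);
    int m = (int)strlen(p);
    assert(m < MAX_PATTERN);
    int a[MAX_PATTERN], b[MAX_PATTERN];
    next_by_recurrence(p, m, a);
    next_by_definition(p, m, b);
    return memcmp(a, b, (size_t)m * sizeof a[0]) == 0;
}
```

The direct method compares substrings again for every position, so it does noticeably more work on long patterns than the recurrence, which reuses the values already computed.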
