KMP Pattern Matching

Source: Internet
Author: User

About the KMP pattern matching algorithm

When processing strings, we often need to determine whether a main string S contains a substring T. How can we do this efficiently?

① Naive pattern matching. "Naive" means no tricks at all, just brute-force enumeration. Take an example: suppose the main string is

S = "abcdefgggq"

and we want to know whether it contains the substring T = "gggq". With naive matching, we enumerate every starting position in S:

for (i = 0; i < strlen(s); i++)

and, for each starting position, enumerate every character of the substring T:

for (j = 0; j < strlen(t); j++)

comparing s[i + j] with t[j] each time. It is easy to see that the time complexity is O(n*m), where n and m are the lengths of S and T, which is expensive for long strings.

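#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[1001];   /* buffer sizes are assumed here */
    char t[1001];
    if (scanf("%1000s%1000s", s, t) != 2) {
        return 1;
    }
    int len_s = (int)strlen(s);
    int len_t = (int)strlen(t);
    int i, j;

    /* Try every starting position i in s where t still fits. */
    for (i = 0; i + len_t <= len_s; i++) {
        /* Compare t against s starting at position i. */
        for (j = 0; j < len_t; j++) {
            if (s[i + j] != t[j]) {
                break;
            }
        }
        if (j == len_t) {   /* every character of t matched */
            printf("%d--%d\n", i + 1, i + len_t);   /* 1-based start and end */
            break;
        }
    }
    if (i + len_t > len_s) {   /* ran out of starting positions */
        printf("No match\n");
    }
    return 0;
}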

But it is easy to see that some of these comparisons can be skipped. Consider S = "0000000001" and T = "0001". Every failed attempt fails only at the last character of T: at each starting position the first three characters (all 0) match, and only the fourth does not. Since we already know those three characters match, can we avoid comparing them again? Of course we can, and that is exactly what KMP pattern matching does.
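To make the wasted work concrete, the naive search can be instrumented to count character comparisons. The following is only a minimal sketch and not part of the original article: the function name naive_find and its comparisons out-parameter are hypothetical, and it returns a 0-based index instead of printing 1-based positions.

```c
#include <string.h>

/* Hypothetical helper: naive search that also counts comparisons.
   Returns the 0-based index of the first match, or -1 if none. */
long naive_find(const char *s, const char *t, long *comparisons)
{
    size_t len_s = strlen(s), len_t = strlen(t);
    *comparisons = 0;
    /* Try every starting position where t still fits inside s. */
    for (size_t i = 0; i + len_t <= len_s; i++) {
        size_t j = 0;
        while (j < len_t) {
            (*comparisons)++;          /* one character comparison */
            if (s[i + j] != t[j]) {
                break;
            }
            j++;
        }
        if (j == len_t) {
            return (long)i;            /* all of t matched at i */
        }
    }
    return -1;
}
```

With s = "0000000001" and t = "0001", this returns 6 after 28 comparisons: each of the seven starting positions costs four comparisons, and only the last attempt succeeds.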

② KMP pattern matching. Another example: S = "abcdefgab" and T = "abcdel", where every character of T is different from the others. The match fails when T reaches 'l'. Since the first 5 characters of T, "abcde", equal the first 5 characters of S, is there any point in moving i back to the second character of S and starting over? That would compare the second character of S, 'b', with the first character of T, 'a', and we know they differ without comparing them. Why? The characters of T are all distinct, and S and T agree on their first 5 characters, so the first character of T cannot equal any of characters 2 to 5 of S. We can save some time here.

The idea is that i never backtracks (it is not reset to 2 for another round of comparisons; it stays where it is), and only the position in T changes. In the example above, "abcde" has matched and the 'f' in S differs from the 'l' in T, so we go straight to comparing the 'f' in S with the 'a' at the start of T; the comparisons in between are superfluous. That leaves one question: how should j change? The value of i is easy:

for (i = 0; i < strlen(s); i++)  // i never backtracks, so it never decreases

  

So what about j? If we preprocess an array next[] so that after every failed comparison we can set j = next[j], we get the new value of j directly. But how do we build that array?

Derivation of the next array:

First, the purpose of the next array: it helps us avoid repeated comparisons and useless comparisons. What is a repeated comparison? Take S = "000001" and T = "0001". The first three 0s match and the fourth comparison fails. Which character of T should then be compared with the fourth 0 of S? The answer is the third 0 of T. Why? Because the 0s before it have already been compared and found equal, so there is no need to compare them again. That is avoiding a repeated comparison. In the earlier example, S = "abcdefgab" and T = "abcdel", not backtracking i is avoiding useless comparisons.

With that rough idea in mind, let's look at the mathematical definition of next[]:

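$$
next[j] =
\begin{cases}
0, & j = 1 \\
\max\{\, k \mid 1 < k < j,\ a_1 a_2 \cdots a_{k-1} = a_{j-k+1} \cdots a_{j-1} \,\}, & \text{if such a } k \text{ exists} \\
1, & \text{otherwise}
\end{cases}
$$

Here a_i denotes the i-th character of T, counting from 1.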

Note that the two strings being compared have k-1 characters each. Take T = "aaaal" as an example. When j = 1, the string before position j is empty, so next[1] = 0. When j = 2, only the single character "a" precedes position j and no k with 1 < k < 2 exists, so this is the last case of the definition and next[2] = 1. When j = 3, the string before position j is "aa"; its prefix "a" equals its suffix "a", so by the formula k = 2 and next[3] = 2. When j = 4, the string is "aaa"; the prefix "aa" and the suffix "aa" match (note that they share the middle character), so k = 3 and next[4] = 3. In the same way, next[5] = 4. Applying the rule once more at j = 6, one position past the end of T, the string is "aaaal", which has no matching prefix and suffix, so it is again the last case and next[6] = 1.

With subscripts starting at 1, the next array at this point is:

j        1  2  3  4  5  6
T[j]     a  a  a  a  l
next[j]  0  1  2  3  4  1

A rule of thumb: if the longest equal prefix and suffix has one character, k = 2; if it has two characters, k = 3; in general, n matching characters give k = n + 1.
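The definition can also be checked by brute force. The sketch below is an illustration only, not the method used by the article's program: next_by_definition is a hypothetical helper that tries every candidate k from largest to smallest and compares the prefix and suffix directly. It assumes 1 <= j <= strlen(t) + 1 and takes an ordinary 0-based C string.

```c
#include <string.h>

/* Hypothetical helper: next[j] (1-based j) computed straight from the
   definition, without any of KMP's cleverness. */
int next_by_definition(const char *t, int j)
{
    if (j == 1) {
        return 0;
    }
    /* Try the largest k first, so the first hit is the maximum. */
    for (int k = j - 1; k > 1; k--) {
        /* Prefix a_1..a_{k-1} is t[0..k-2];
           suffix a_{j-k+1}..a_{j-1} is t[j-k..j-2]. */
        if (strncmp(t, t + (j - k), (size_t)(k - 1)) == 0) {
            return k;
        }
    }
    return 1;   /* no equal prefix and suffix */
}
```

For t = "aaaal", calling it with j = 1 to 6 yields 0, 1, 2, 3, 4, 1. This check is much slower than the linear-time construction, so it is useful only for verifying hand-derived values.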

What does the next[] array mean? From the above: when the comparison at position j of T fails, j falls back to next[j] and the comparison resumes from there, with i unchanged. The point is to avoid repeated and useless comparisons. The complete code is as follows:

#include <stdio.h>
#include <string.h>

/* Build next[] for the pattern t; both t and next use 1-based indices. */
void get_next(char *t, int *next)
{
    int i = 1, j = 0;
    next[1] = 0;   /* fixed by definition */
    int len_t = (int)strlen(t + 1);
    while (i < len_t) {
        if (j == 0 || t[i] == t[j]) {
            next[++i] = ++j;
        } else {
            j = next[j];
        }
    }
}

void work(void)
{
    char s[1002] = {0};   /* buffer sizes are assumed here */
    char t[1002] = {0};
    if (scanf("%1000s%1000s", s + 1, t + 1) != 2) {
        return;
    }
    int next[1002] = {0};
    get_next(t, next);   /* get the next array */
    int i;
    int len_s = (int)strlen(s + 1), len_t = (int)strlen(t + 1);
    for (i = 1; i <= len_t; i++) {
        printf("%d ", next[i]);   /* first look at the values of the next array */
    }
    printf("\n");
    int j;
    i = 1;
    j = 1;
    while (i <= len_s && j <= len_t) {
        if (j == 0 || s[i] == t[j]) {
            i++;
            j++;   /* compare the next pair */
        } else {   /* otherwise, j falls back to the best position */
            j = next[j];
        }
    }
    if (j == len_t + 1) {
        printf("%d---%d\n", i - len_t, i - 1);   /* 1-based start and end */
    } else {
        printf("no\n");
    }
}

int main(void)
{
    work();
    return 0;
}
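The program above reads its input from standard input. The same logic could also be packaged as a reusable function; the following is a rough sketch under a few assumptions that are not in the original: the names build_next and kmp_find are hypothetical, the function takes ordinary 0-based C strings (so character j of the article is t[j - 1] here), it returns a 0-based index, and it counts comparisons through an out-parameter.

```c
#include <stdlib.h>
#include <string.h>

/* Fill next[1..len_t] using the article's 1-based convention. */
static void build_next(const char *t, size_t len_t, size_t *next)
{
    size_t i = 1, j = 0;
    next[1] = 0;
    while (i < len_t) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            next[++i] = ++j;
        } else {
            j = next[j];
        }
    }
}

/* Hypothetical helper: returns the 0-based index of the first match,
   or -1 if there is none (also -1 if memory allocation fails). */
long kmp_find(const char *s, const char *t, long *comparisons)
{
    size_t len_s = strlen(s), len_t = strlen(t);
    *comparisons = 0;
    if (len_t == 0) {
        return 0;                      /* empty pattern matches at 0 */
    }
    size_t *next = malloc((len_t + 1) * sizeof *next);
    if (next == NULL) {
        return -1;
    }
    build_next(t, len_t, next);

    size_t i = 1, j = 1;               /* 1-based positions in s and t */
    while (i <= len_s && j <= len_t) {
        if (j == 0) {                  /* nothing to fall back to */
            i++;
            j++;
        } else {
            (*comparisons)++;          /* one character comparison */
            if (s[i - 1] == t[j - 1]) {
                i++;
                j++;
            } else {
                j = next[j];           /* i stays put, only j moves */
            }
        }
    }
    free(next);
    return (j == len_t + 1) ? (long)(i - len_t - 1) : -1;
}
```

For s = "0000000001" and t = "0001", kmp_find returns 6 after 16 character comparisons, whereas a naive scan needs 7 × 4 = 28 for the same input, because i never moves backwards.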

This article goes into a lot of detail. Readers are encouraged to work out the values of the next array by hand and to understand its purpose, which is to avoid repeated and useless comparisons. Once the values of the next array have been derived, using the array follows naturally.

If you find an error, please point it out; I would be grateful.
