String pattern matching (naive, KMP)

Source: Internet
Author: User

Pattern matching is a kind of lookup, and it comes in two forms: single-pattern matching and multi-pattern matching. A lookup finds one or more elements in a collection; finding one element is single-pattern matching, and finding several is multi-pattern matching. Only single-pattern matching is discussed here. Pattern matching looks different from looking up a number, but it is essentially the same thing. Finding "AABAAC" in "AABAABAABAAC", for example, is, from the computer's point of view, still finding "AABAAC" in the set {AABAAB, ABAABA, BAABAA, AABAAB, ABAABA, BAABAA, AABAAC}, because the computer processes the text as a stream, one window at a time. The so-called simple (naive) pattern matching algorithm is just a sequential lookup over this set. The code is as follows:

    #include <stdio.h>
    #include <string.h>

    // Positions are 1-based; returns 0 when T does not occur in S.
    int SearchStr(char S[], char T[])
    {
        int n = (int)strlen(S), m = (int)strlen(T);
        int i = 1, j = 1;
        while (i <= n && j <= m)                // stop when i or j runs out of bounds
        {
            if (S[i-1] == T[j-1]) { ++i; ++j; } // same character: move both forward
            else { i = i - j + 2; j = 1; }      // different: end this comparison and prepare the next one
        }
        if (j > m) return i - j + 1;            // j ran out of bounds: full match starting at i-j+1
        else return 0;                          // i ran out of bounds: no match
    }

    int main(void)
    {
        char s[] = "AABAABAABAAC";
        char t[] = "AABAAC";
        int temp = SearchStr(s, t);
        if (temp) printf("The position is:%d\n", temp);
        else printf("error!\n");
        return 0;
    }
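To make the cost of sequential lookup concrete, here is a minimal sketch that counts character comparisons. It is not the function above: `naive_count` is a hypothetical name, and it uses 0-based indices and returns -1 when nothing is found.

```c
#include <string.h>

/* Hypothetical helper (not from the original code): naive search with
   0-based indices. Returns the index of the first match or -1, and
   stores the number of character comparisons in *comparisons. */
int naive_count(const char *s, const char *t, int *comparisons)
{
    int n = (int)strlen(s), m = (int)strlen(t);
    *comparisons = 0;
    for (int start = 0; start + m <= n; start++) {
        int j = 0;
        while (j < m) {
            (*comparisons)++;                 /* one character comparison */
            if (s[start + j] != t[j]) break;  /* this window fails */
            j++;
        }
        if (j == m) return start;             /* whole template matched */
    }
    return -1;
}
```

For the example strings, `naive_count("AABAABAABAAC", "AABAAC", &c)` returns 6 (position 7 when counted from 1) and leaves 24 in `c`.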

The KMP algorithm does not search blindly; it uses some logical reasoning to skip comparisons that cannot succeed. The following shows how KMP reasons, using the search for "AABAAC" in "AABAABAABAAC" as the example. Suppose the comparison fails at the sixth character; then the first five characters have matched.

So we end this comparison and move on to the next one.

After "AABAA?" fails to match, we skip the two comparisons against "ABAA??" and "BAA???" and go straight to "AA????". Why? A little reasoning shows that, since the template is "AABAAC", "ABAA??" and "BAA???" cannot possibly match; only "AA????" is "likely" to match the template. The strings that may still match share a certain feature. When the match fails we have the matched part AABAA, which has a number of prefixes and a number of suffixes (a prefix cannot be the string itself, so AABAA is not a prefix of AABAA). The prefixes that are also suffixes are A and AA; we call such a string a common prefix-suffix. We only care about the longest one, AA, and its length is what matters to us. Now go back to the comparison against "AA????":

"AA????" Compared with the template "Aabaac", the longest prefix AA does not have to be compared directly from the red part, because I is meant to be here, so I do not seem to move, only with the reposition J on the line, the new position of J is the book said next array, next[j-1] Indicates the position of J Jump after the J-bit comparison fails. In this case, the 6th-bit comparison fails, and J jumps to 3, which is the 3 stored in next[5]. How did 3 come to be inferred? is actually "the length of the longest 1+". Now suppose to be "AaB?" "Fourth bit match failed, then the same prefix is Aab,aab not the longest suffix, so its maximum prefix length is 0, according to the rule J reposition to the 1+0=1 bit, that is the first bit of the new comparison, this is why add 1 reason." When the first match fails, the next array has no effect, but we still symbolically set the next[0] to 0, which indicates that the first comparison fails, and when J is detected 0, we need to move both I and J back one, so that J is just 1, so any template next[0] is 0. The longest prefix of any character x is 0, so any template next[1] is 1.

Why is KMP so difficult to understand? Because textbooks do not spell out its two steps of reasoning. The first step skips a number of comparisons and arrives at the start of a new comparison; the second step, inside that new comparison, skips the leading characters that are already known to match and lands on the correct position. Throughout this process i does not change, and j is repositioned by the rule "j = 1 + the length of the longest common prefix-suffix". The next array is derived from the template alone, so it has to be computed beforehand. Assume for now that next has already been computed; the KMP code is then as follows (it is very similar to the simple matching code):

    int KMP(char S[], char T[], int next[], int pos)
    {
        int n = (int)strlen(S), m = (int)strlen(T);
        int i = pos, j = 1;                     // added feature: start from position pos in S
        while (i <= n && j <= m)                // stop when i or j runs out of bounds
        {
            // j == 0 (the first comparison failed) and a successful comparison
            // are handled the same way: move both i and j forward
            if (j == 0 || S[i-1] == T[j-1]) { ++i; ++j; }
            else { j = next[j-1]; }             // reposition j according to the rule
        }
        if (j > m) return i - j + 1;            // j ran out of bounds: full match starting at i-j+1
        else return 0;                          // i ran out of bounds: no match
    }
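As a rough check on how many comparisons the jump saves, the sketch below is a variant of the KMP function with a counter added. `kmp_count` is a hypothetical name, the next table is passed in by the caller, and `pos` is assumed to be at least 1.

```c
#include <string.h>

/* Hypothetical variant of the KMP function that also counts character
   comparisons. Positions are 1-based; returns 0 when t is not found. */
int kmp_count(const char *s, const char *t, const int next[], int pos,
              int *comparisons)
{
    int n = (int)strlen(s), m = (int)strlen(t);
    int i = pos, j = 1;
    *comparisons = 0;
    while (i <= n && j <= m) {
        if (j == 0) {                 /* first character failed: slide by one */
            ++i; ++j;
            continue;
        }
        (*comparisons)++;             /* one character comparison */
        if (s[i - 1] == t[j - 1]) { ++i; ++j; }
        else j = next[j - 1];         /* i stays, only j is repositioned */
    }
    return (j > m) ? i - j + 1 : 0;
}
```

Called with the hand-derived table {0, 1, 2, 1, 2, 3} on the example strings, it returns 7 after 14 character comparisons; the simple matching code needs 24 on the same input.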

Once the KMP function is written, do not think the job is done; a working KMP implementation is still some way off. The essence of the whole KMP algorithm is deriving the next array. How is it derived? This time we split the problem into cases.

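Computing the whole next array looks complicated, but handled one position at a time it is simple. For the example template each entry takes one of only three forms: 1 more than the previous entry, 2, or 1. These three forms are not enough for every template, though. When the new character does not extend the previous common prefix-suffix, the candidate position has to keep falling back through next until a character matches or the candidate reaches 0; jumping straight to the first character can produce a value that is too small. For "AABAAAC", for example, the three-case rule gives next[6] = 2 where the correct value is 3, and a search can then miss a match. GetNext is therefore written with the fallback loop, which still produces the three forms on the example:

    void GetNext(char T[], int next[])
    {
        int m = (int)strlen(T);
        next[0] = 0;                            // no need to compute, always 0
        for (int i = 2; i <= m; i++)            // i = 2 always yields next[1] = 1
        {
            int k = next[i-2];                  // where the previous position jumps to
            while (k > 0 && T[i-2] != T[k-1])
                k = next[k-1];                  // fall back to a shorter common prefix-suffix
            next[i-1] = k + 1;                  // 1 more than the previous one, or 2, or 1
        }
    }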
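Whether the three forms are enough can be tested by building the table two ways. In this sketch (both function names are hypothetical), `next_three_case` applies only the three forms, while `next_fallback` keeps falling back through next until a character matches or the candidate reaches 0.

```c
#include <string.h>

/* Hypothetical: the three-form rule only (previous + 1, else 2, else 1). */
void next_three_case(const char *t, int next[])
{
    int m = (int)strlen(t);
    next[0] = 0;
    if (m > 1) next[1] = 1;
    for (int i = 3; i <= m; i++) {
        if (t[i - 2] == t[next[i - 2] - 1]) next[i - 1] = next[i - 2] + 1;
        else if (t[i - 2] == t[0])          next[i - 1] = 2;
        else                                next[i - 1] = 1;
    }
}

/* Hypothetical: the same table built with repeated fallback through next. */
void next_fallback(const char *t, int next[])
{
    int m = (int)strlen(t);
    next[0] = 0;
    for (int i = 2; i <= m; i++) {
        int k = next[i - 2];                   /* candidate jump target */
        while (k > 0 && t[i - 2] != t[k - 1])
            k = next[k - 1];                   /* try a shorter overlap */
        next[i - 1] = k + 1;
    }
}
```

Both produce 0 1 2 1 2 3 for "AABAAC". For "AABAAAC" they differ in the last entry: the three-case version stores 2, the fallback version stores 3, which is 1 + the length of the common prefix-suffix AA of "AABAAA".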

The full code for KMP is as follows:

    #include <stdio.h>
    #include <string.h>
    #define MAXSIZE 100

    int KMP(char S[], char T[], int next[], int pos)
    {
        int n = (int)strlen(S), m = (int)strlen(T);
        int i = pos, j = 1;                     // added feature: start from position pos in S
        while (i <= n && j <= m)                // stop when i or j runs out of bounds
        {
            // j == 0 (the first comparison failed) and a successful comparison
            // are handled the same way: move both i and j forward
            if (j == 0 || S[i-1] == T[j-1]) { ++i; ++j; }
            else { j = next[j-1]; }             // reposition j according to the rule
        }
        if (j > m) return i - j + 1;            // j ran out of bounds: full match starting at i-j+1
        else return 0;                          // i ran out of bounds: no match
    }

    void GetNext(char T[], int next[])
    {
        int m = (int)strlen(T);
        next[0] = 0;                            // no need to compute, always 0
        for (int i = 2; i <= m; i++)            // i = 2 always yields next[1] = 1
        {
            int k = next[i-2];                  // where the previous position jumps to
            while (k > 0 && T[i-2] != T[k-1])
                k = next[k-1];                  // keep falling back until a character matches or k is 0
            next[i-1] = k + 1;
        }
    }

    int main(void)
    {
        char s[] = "AABAABAABAAC";
        char t[] = "AABAAC";
        // Calculate the next array
        int next[MAXSIZE];
        GetNext(t, next);
        // Show the next array
        printf("The next array is:");
        for (int i = 0; i < (int)strlen(t); i++) printf(" %d", next[i]);
        printf("\n");
        // Show the search result
        int temp = KMP(s, t, next, 1);
        if (temp) printf("The position is:%d\n", temp);
        else printf("error!\n");
        return 0;
    }
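The pos parameter makes it possible to continue a search after a match. A minimal sketch, assuming a hypothetical helper `find_all` and lower-case copies `kmp` and `get_next` of the two functions (with the next table built by falling back through next), that collects every 1-based match position:

```c
#include <string.h>

#define MAX_PATTERN 100   /* assumption: same limit as MAXSIZE */

/* Builds the next table (next[j-1] = jump target after the j-th comparison fails). */
void get_next(const char *t, int next[])
{
    int m = (int)strlen(t);
    next[0] = 0;
    for (int i = 2; i <= m; i++) {
        int k = next[i - 2];
        while (k > 0 && t[i - 2] != t[k - 1])
            k = next[k - 1];
        next[i - 1] = k + 1;
    }
}

/* 1-based KMP search starting at position pos; returns 0 when not found. */
int kmp(const char *s, const char *t, const int next[], int pos)
{
    int n = (int)strlen(s), m = (int)strlen(t);
    int i = pos, j = 1;
    while (i <= n && j <= m) {
        if (j == 0 || s[i - 1] == t[j - 1]) { ++i; ++j; }
        else j = next[j - 1];
    }
    return (j > m) ? i - j + 1 : 0;
}

/* Hypothetical helper: stores up to max match positions in out[] and
   returns how many were found. Overlapping matches are reported. */
int find_all(const char *s, const char *t, int out[], int max)
{
    int next[MAX_PATTERN];
    int count = 0, pos = 1;
    if (t[0] == '\0' || strlen(t) > MAX_PATTERN) return 0;
    get_next(t, next);
    while (count < max) {
        int found = kmp(s, t, next, pos);
        if (found == 0) break;
        out[count++] = found;
        pos = found + 1;              /* resume just after the match start */
    }
    return count;
}
```

Searching "AABAABAABAAC" for "AAB" this way yields positions 1, 4 and 7.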
