[Learning Notes] KMP matching algorithm and next derivation process

Originally published at: http://pjf.name/post-122.html

This article is written or reposted under the "Attribution-NonCommercial-ShareAlike 4.0 International" license. If you reprint an original article, please credit the source as "the madman's private plot"; otherwise please do not reprint it. Thank you for your cooperation :)

First of all, a salute to the three pioneers, D. E. Knuth, J. H. Morris and V. R. Pratt, who invented this efficient algorithm.

Let's look at the algorithm. With the naive pattern matching algorithm we do a lot of wasted work. For example, suppose we look for string B "abd" in string A "abcdabdac". Following the obvious approach, we compare strings A and B character by character from the start.

When we find that the character 'd' in string B is not the same as the character 'c' in string A, we align the start of string B with the character 'b' of string A and compare all over again:

We keep shifting like this until the end, until we either find a match or learn that there is none:

This algorithm can be very time-consuming when the strings are long.
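The naive approach described above can be sketched in C as follows. This is only an illustration, not code from the original notes; the name naive_find is made up for this sketch. Every time a comparison fails, the pattern moves one position to the right and the comparison restarts from the pattern's first character.

```c
#include <assert.h>
#include <string.h>

/* Naive matching: try every alignment of pattern against text.
 * Returns the index of the first match, or -1 if there is none.
 * naive_find is a hypothetical name used only for this sketch. */
int naive_find(const char *text, const char *pattern)
{
    assert(text != NULL && pattern != NULL);
    int n = (int)strlen(text);
    int m = (int)strlen(pattern);

    for (int start = 0; start + m <= n; start++) {
        int j = 0;
        /* Compare from the beginning of the pattern at every alignment. */
        while (j < m && text[start + j] == pattern[j])
            j++;
        if (j == m)
            return start;   /* the whole pattern matched */
        /* Mismatch: shift the pattern by one and start over. */
    }
    return -1;
}
```

For the example in the text, naive_find("abcdabdac", "abd") returns 4. In the worst case the inner loop reruns almost the whole pattern at every alignment, which is where the wasted work comes from.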

Looking closely, we notice that after the first mismatch we could simply move string B straight to the right place, and the position in string A would never have to go back.

This is the idea of the KMP matching algorithm. But how does the computer know where to fall back to? We have to work out the fallback position in advance, so the KMP algorithm keeps a next array that records where to go back to. This is the essence of the KMP matching algorithm.

After a lot of Googling, I found that many blogs and books, including "Big Talk", compute next according to this rule:


next[j] = -1, when j = 0
next[j] = max{ k | 0 < k < j and P[0 ~ k-1] == P[j-k ~ j-1] }, when this set is not empty
next[j] = 0, otherwise
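To make the rule concrete, here is a minimal sketch that computes next[] directly from the definition: next[0] is -1, and otherwise next[j] is the largest k with 0 < k < j such that the first k characters of the pattern equal the k characters ending just before position j, or 0 if no such k exists. The name next_by_definition is an assumption of this sketch. It is deliberately slow and is only meant for checking one's understanding.

```c
#include <assert.h>
#include <string.h>

/* Compute next[] straight from the definition (slow, for checking only).
 * next must have room for strlen(p) ints.
 * next_by_definition is a hypothetical name used only for this sketch. */
void next_by_definition(const char *p, int *next)
{
    assert(p != NULL && next != NULL);
    int m = (int)strlen(p);

    for (int j = 0; j < m; j++) {
        if (j == 0) {
            next[0] = -1;
            continue;
        }
        next[j] = 0;                        /* default when no k qualifies */
        for (int k = j - 1; k > 0; k--) {   /* try the largest k first */
            /* Does P[0 ~ k-1] equal P[j-k ~ j-1]? */
            if (memcmp(p, p + j - k, (size_t)k) == 0) {
                next[j] = k;
                break;
            }
        }
    }
}
```

For the pattern "abab" this gives next = {-1, 0, 0, 1}.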

I don't know whether other people understand it at first sight, but the second rule left me completely baffled. The example in "Big Talk" is clear enough, though, and after working through it I understood what rule 2 is about. It feels a bit like the maximum subsequence sum problem, although of course there are differences. What it actually says is this: within the part of string B before position j there is a length k such that B[0] through B[k-1] is the same as B[j-k] through B[j-1], and the value of next[j] is then the largest such k. After a lot of optimization there is a classic piece of code, which I had to think about for a long time:

#include <string.h>

/* Fill next[] for the pattern stringB; next must hold strlen(stringB) ints. */
void GetNext(const char *stringB, int *next)
{
    int mainStringPosition = 0, nextBack = -1;
    int stringLength = (int)strlen(stringB);

    if (stringLength == 0)
        return;
    next[0] = -1;

    while (mainStringPosition < stringLength - 1)
    {
        if (nextBack == -1 || stringB[nextBack] == stringB[mainStringPosition])
        {
            nextBack++;
            mainStringPosition++;
            next[mainStringPosition] = nextBack;
        }
        else
            nextBack = next[nextBack];
    }
}

Of course, it is fine if you don't understand this code right away; it is the result of optimization by many predecessors. Later, while Googling, I saw some blogs say that it comes from a derivation, something like next[k+1] = next[k] + 1, but they all mention it only in passing, which left me even more confused. Then I read the derivation of next in this blog post and it suddenly clicked (link: http://www.cnblogs.com/yjiyjige/p/3263858.html#commentform). The principle I already knew: the value of next[j] is k, where k satisfies the condition that stringB[0] through stringB[k-1] is the same string as stringB[j-k] through stringB[j-1] (except for the case j = 0).

Here is the derivation of next from that blog post:

Always keep in mind that the value of next[j] (that is, k) is the position the j pointer moves to next when P[j] != T[i].

Let's look at the first case: what to do when j is 0 and the characters don't match.




In this case j is already at the far left and cannot move back any further, so it is the i pointer that should move forward. That is why the code has the initialization next[0] = -1.

What if it happens when j is 1?

Obviously, the j pointer must move back to position 0, because that is the only position in front of it.

The following is the most important case; see the figures:

Please compare these two figures carefully.

We found a pattern:

When P[k] == P[j],

we have next[j+1] == next[j] + 1.

In fact, this can be proved:

Before P[j] we already have P[0 ~ k-1] == P[j-k ~ j-1] (because next[j] == k).

Now that P[k] == P[j] as well, doesn't it follow that P[0 ~ k-1] + P[k] == P[j-k ~ j-1] + P[j]?

That is, P[0 ~ k] == P[j-k ~ j], so next[j+1] == k + 1 == next[j] + 1. (It cannot be larger, because a longer match for j+1 would imply a match longer than k for j.)

If the formula is hard to follow, the figures are easier to understand.


What if P[k] != P[j]? As shown in the following figure:

In this case, the corresponding line in the code is k = next[k]. Why is that? Look below and it should become clear.


Now you should know why k = next[k]. As in the example above, we can no longer find the longest suffix string [A, B, A, B], but we may still find prefix strings such as [A, B] and [B]. So doesn't this process look just like matching the string [A, B, A, C] itself? When C differs from the main string (that is, position k does not match), we of course move the pointer to next[k].

After reading the derivation above, everything should fall into place. It did for me too, so thanks to the blogger.

Once you know how next[] is derived, writing the KMP algorithm is not hard...

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* mainString is the main string, patternString is the pattern,
   startFindPosition is the position where matching starts, and
   *findPosition receives the position of the match found (-1 if none). */
bool KmpString(const char *mainString, const char *patternString, int startFindPosition, int *findPosition)
{
    int patternNext = 0;
    int position = startFindPosition;
    int mainLength = (int)strlen(mainString);
    int patternLength = (int)strlen(patternString);
    int *next;

    /* Return false when the pattern is empty or the start position
       is not within the range of the main string. */
    if (patternLength == 0 || position < 0 || position > mainLength)
        return false;
    if (!(next = (int *)malloc(sizeof(int) * patternLength)))
        return false;
    GetNext(patternString, next);

    while (position < mainLength && patternNext < patternLength)
    {
        if (patternNext == -1 || mainString[position] == patternString[patternNext])
        {   /* Current characters match: move both forward and keep matching. */
            patternNext++;
            position++;
        }
        else    /* Mismatch: the pattern falls back to the recorded position. */
            patternNext = next[patternNext];
    }

    if (patternNext >= patternLength)   /* The whole pattern was found. */
        *findPosition = position - patternLength;
    else                                /* -1 means no match was found. */
        *findPosition = -1;
    free(next);
    return true;
}

But KMP also has a weakness. For example, if the main string is "aaaabcdef" and the pattern is "aaaaax", matching with KMP goes as in the following figure:

As I wrote in my notes, steps 2, 3, 4 and 5 are superfluous, so we need to improve it, which gives the following code:

#include <string.h>

/* Improved version of GetNext; next must hold strlen(stringB) ints. */
void NewGetNext(const char *stringB, int *next)
{
    int mainStringPosition = 0, nextBack = -1;
    int stringLength = (int)strlen(stringB);

    if (stringLength == 0)
        return;
    next[0] = -1;

    while (mainStringPosition < stringLength - 1)
    {
        if (nextBack == -1 || stringB[nextBack] == stringB[mainStringPosition])
        {
            nextBack++;
            mainStringPosition++;
            /* This check is new: if the current character is the same as the
               one at the fallback position, falling back there would fail
               again, so reuse that position's own fallback value. In the
               example above this skips steps 2, 3, 4 and 5 directly. */
            if (stringB[nextBack] == stringB[mainStringPosition])
                next[mainStringPosition] = next[nextBack];
            else
                next[mainStringPosition] = nextBack;
        }
        else
            nextBack = next[nextBack];
    }
}

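To see the effect of the extra check, the following sketch counts character comparisons for the example above (main string "aaaabcdef", pattern "aaaaax") with both kinds of table. It is a self-contained illustration rather than the code from these notes: the names build_table and kmp_count_comparisons, the optimized flag, and the 64-character limit on the pattern are all assumptions made for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Build the fallback table for pattern p.
 * optimized != 0 applies the extra check from the improved version.
 * build_table is a hypothetical name used only for this sketch. */
static void build_table(const char *p, int *next, int optimized)
{
    int m = (int)strlen(p);
    int j = 0, k = -1;

    next[0] = -1;
    while (j < m - 1) {
        if (k == -1 || p[k] == p[j]) {
            k++;
            j++;
            next[j] = (optimized && p[k] == p[j]) ? next[k] : k;
        } else {
            k = next[k];
        }
    }
}

/* Run KMP over text and return the number of character comparisons.
 * *match_at receives the match index, or -1 if the pattern is absent.
 * Assumes a non-empty pattern of at most 64 characters (sketch only). */
int kmp_count_comparisons(const char *text, const char *p,
                          int optimized, int *match_at)
{
    assert(text != NULL && p != NULL && match_at != NULL);
    assert(strlen(p) > 0 && strlen(p) <= 64);
    int next[64];
    int n = (int)strlen(text), m = (int)strlen(p);
    int i = 0, j = 0, comparisons = 0;

    build_table(p, next, optimized);
    while (i < n && j < m) {
        if (j == -1) {          /* nothing left to fall back to */
            i++;
            j = 0;
            continue;
        }
        comparisons++;
        if (text[i] == p[j]) {  /* match: advance both pointers */
            i++;
            j++;
        } else {                /* mismatch: only the pattern falls back */
            j = next[j];
        }
    }
    *match_at = (j == m) ? i - m : -1;
    return comparisons;
}
```

With these inputs the plain table costs 13 comparisons and the improved table costs 9; the 4 comparisons saved are the repeated checks of 'b' against 'a' that correspond to the superfluous steps.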
OK, that wraps up the KMP matching algorithm. I actually finished it on Tuesday, but then I looked at the BM algorithm, which seems even better, so I will write about the BM algorithm later.
