KMP and extended KMP

Source: Internet
Author: User

KMP: Two strings are given: A (called the template string, i.e. the text being searched) and B (called the substring, i.e. the pattern), with lengths lenA and lenB respectively. In linear time, for each position i (0 <= i < lenA), find the length of the longest suffix of A[0..i] that is also a prefix of B, recorded as ex[i]; in other words, ex[i] is the maximum z such that A[i-z+1..i] = B[0..z-1]. The main purpose of KMP is to determine whether B is a substring of A and, if so, to find every position where B occurs in A (an occurrence of B ends at position i exactly when ex[i] = lenB).
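
To make the definition of ex concrete, the following is a deliberately slow reference sketch that computes ex[i] straight from the definition. The function name suffix_prefix_match_naive is hypothetical, and the code is only meant for checking a faster implementation on small inputs, not as the KMP algorithm itself.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Brute-force reference (NOT linear time): for each i, try every z from
// largest to smallest and keep the first z with A[i-z+1..i] == B[0..z-1].
std::vector<int> suffix_prefix_match_naive(const std::string& A, const std::string& B) {
    const int lenA = (int)A.size(), lenB = (int)B.size();
    std::vector<int> ex(lenA, 0);
    for (int i = 0; i < lenA; ++i) {
        const int maxZ = std::min(i + 1, lenB);  // z cannot exceed i+1 or lenB
        for (int z = maxZ; z >= 1; --z) {
            if (A.compare(i - z + 1, z, B, 0, z) == 0) {
                ex[i] = z;
                break;
            }
        }
    }
    return ex;
}
```

For A = "ababcab" and B = "abc" this yields ex = 1 2 1 2 3 1 2; the single entry equal to lenB = 3 (at i = 4) marks the end of the only occurrence of B.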

[Algorithm]
Let next[i] be the maximum z <= i such that B[i-z+1..i] = B[0..z-1] (that is, B matched against itself; z = i+1 is excluded because B[0..i] trivially equals itself). Suppose next[0..lenB-1] and ex[0..i-1] have already been computed; we use them to compute ex[i].
By the definition of ex, A[i-ex[i-1]..i-1] = B[0..ex[i-1]-1] (note that i-1-ex[i-1]+1 = i-ex[i-1]). If A[i] = B[ex[i-1]], then A[i-ex[i-1]..i] = B[0..ex[i-1]], so ex[i] = ex[i-1]+1 directly. (If ex[i-1] = lenB there is no character B[lenB] to compare with, which counts as a mismatch.) What if A[i] != B[ex[i-1]]?
Let j = next[ex[i-1]-1]. By the definition of next, B[ex[i-1]-j..ex[i-1]-1] = B[0..j-1]; and since A[i-ex[i-1]..i-1] = B[0..ex[i-1]-1], we get A[i-j..i-1] = B[ex[i-1]-j..ex[i-1]-1], hence A[i-j..i-1] = B[0..j-1]. So it is enough to compare A[i] with B[j]. If they are equal, ex[i] = j+1. If they are not, update j to next[j-1] and compare A[i] with B[j] again, and so on, until either A[i] = B[j], or j = 0 and A[i] != B[0], in which case ex[i] = 0. Boundary: when computing ex[0], the initial j (standing in for ex[i-1]) is 0.
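One question remains: how is next computed? next is simply the result of matching B against itself, with B acting as both the template string and the substring, so the same procedure applies. The only differences are on the boundary: next[0] = 0 is known directly (only prefixes shorter than B[0..i] count), and when computing next[1] the initial j (standing in for next[i-1]) is 0.

Core code:

    lenA = strlen(A); lenB = strlen(B);
    /* next[i]: longest proper prefix of B that is also a suffix of B[0..i] */
    next[0] = 0;
    int j = 0;
    for (int i = 1; i < lenB; i++) {
        while (j && B[i] != B[j]) j = next[j - 1];
        if (B[i] == B[j]) j++;
        next[i] = j;
    }
    /* ex[i]: longest prefix of B ending at A[i]; B occurs at i - lenB + 1 when ex[i] == lenB */
    j = 0;
    for (int i = 0; i < lenA; i++) {
        /* after a full match j == lenB, and B[j] is the terminating '\0', which never equals A[i] */
        while (j && A[i] != B[j]) j = next[j - 1];
        if (A[i] == B[j]) j++;
        ex[i] = j;
    }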
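
For readers who want something that compiles on its own, here is a minimal C++ sketch of the same two loops using std::string. The function names build_next and kmp_find_all are illustrative and do not come from the original code. The sketch assumes next[0] = 0 (proper prefixes only), and because std::string gives no terminator to rely on, it falls back explicitly after each full match.

```cpp
#include <string>
#include <vector>

// nxt[i] = length of the longest proper prefix of B that is also a suffix of B[0..i].
std::vector<int> build_next(const std::string& B) {
    std::vector<int> nxt(B.size(), 0);
    int j = 0;
    for (int i = 1; i < (int)B.size(); ++i) {
        while (j > 0 && B[i] != B[j]) j = nxt[j - 1];
        if (B[i] == B[j]) ++j;
        nxt[i] = j;
    }
    return nxt;
}

// Returns the start index of every (possibly overlapping) occurrence of B in A.
std::vector<int> kmp_find_all(const std::string& A, const std::string& B) {
    std::vector<int> hits;
    if (B.empty()) return hits;  // assumption: an empty pattern reports no matches
    const std::vector<int> nxt = build_next(B);
    int j = 0;  // number of characters of B currently matched
    for (int i = 0; i < (int)A.size(); ++i) {
        while (j > 0 && A[i] != B[j]) j = nxt[j - 1];
        if (A[i] == B[j]) ++j;
        if (j == (int)B.size()) {
            hits.push_back(i - j + 1);
            j = nxt[j - 1];  // explicit fallback so B[j] is never read out of range
        }
    }
    return hits;
}
```

As a usage example, kmp_find_all("abababa", "aba") returns the start positions 0, 2 and 4.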

Extended KMP: Given the template string A and the substring B, with lengths lenA and lenB respectively. In linear time, for each position i (0 <= i < lenA), find the length of the longest common prefix of A[i..lenA-1] and B, recorded as ex[i]; in other words, ex[i] is the maximum z such that A[i..i+z-1] = B[0..z-1]. Extended KMP can be used to solve many string problems, such as finding the longest palindromic substring or the longest repeated substring of a string.
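
The definition of ex for extended KMP can be written down directly as a quadratic-time check. The sketch below is only such a reference, useful for comparing against a linear implementation on small inputs; the name extend_naive is hypothetical.

```cpp
#include <string>
#include <vector>

// Brute-force reference (NOT linear time):
// ex[i] = length of the longest common prefix of A[i..] and B.
std::vector<int> extend_naive(const std::string& A, const std::string& B) {
    std::vector<int> ex(A.size(), 0);
    for (std::size_t i = 0; i < A.size(); ++i) {
        std::size_t z = 0;
        while (i + z < A.size() && z < B.size() && A[i + z] == B[z]) ++z;
        ex[i] = (int)z;
    }
    return ex;
}
```

For A = "aaabaab" and B = "aab" this gives ex = 2 3 1 0 3 1 0.
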
[Algorithm]
Let next[i] be the maximum z such that B[i..i+z-1] = B[0..z-1] (that is, B matched against itself). Suppose next[0..lenB-1] and ex[0..i-1] have already been computed; we use them to compute ex[i].
Let p be the farthest position in A that has been matched so far, and let k be the start position that reaches it: among all i0 with 0 <= i0 < i, k is the i0 that maximizes i0+ex[i0]-1, and p is that maximum, p = k+ex[k]-1. Nothing is known about the characters after p, that is, whether any of A[p+1..lenA-1] equals any character of B.
By the definition of ex, A[k..p] = B[0..p-k]; since i > k, it follows that A[i..p] = B[i-k..p-k]. Let L = next[i-k]; then B[0..L-1] = B[i-k..i-k+L-1]. Now compare i-k+L-1 with p-k:
(1) i-k+L-1 < p-k, that is, i+L <= p. In this case, from A[i..p] = B[i-k..p-k] we get A[i..i+L-1] = B[i-k..i-k+L-1], and since B[0..L-1] = B[i-k..i-k+L-1], we have A[i..i+L-1] = B[0..L-1], which means ex[i] >= L. Moreover, by the definition of next, A[i+L] cannot equal B[L]: otherwise A[i..i+L] = B[0..L], and because i+L <= p we also have A[i..i+L] = B[i-k..i-k+L], so B[0..L] = B[i-k..i-k+L] and next[i-k] would be at least L+1, a contradiction. Therefore ex[i] = L directly.
(2) i-k+L-1 >= p-k, that is, i+L > p. First, A[i..p] = B[0..p-i]: we have A[i..p] = B[i-k..p-k], and since i-k+L-1 >= p-k, B[0..L-1] = B[i-k..i-k+L-1] gives B[0..p-i] = B[i-k..p-k]. Next, we do not know whether A[p+1] equals B[p-i+1], because p is the farthest position matched so far in A and nothing beyond it is known. So matching continues from A[p+1] against B[p-i+1]: let j be the current index into B, initially j = p-i+1; compare A[i+j] with B[j] and increase j while they are equal, stopping at the first mismatch or when either string runs out. The final j is ex[i]. The new match ends at i+ex[i]-1 >= p, so p never moves backwards; update k to i and p to i+ex[i]-1.
Boundary: ex[0] must be computed in advance by direct comparison; the initial k is then 0 and p is ex[0]-1.
The next array is again "self-matching", just as in KMP. The only difference is also on the boundary: next[0] = lenB is known, next[1] is computed in advance by direct comparison, and the initial values are k = 1, p = k+next[k]-1 = next[1].

Note that in case (2) above, matching should start from A[p+1] and B[p-i+1]. However, if p+1 < i, that is, p-i+1 < 0 (this happens when ex[i-1] = 0 and no earlier match reaches position i or beyond), both starting indices must be increased by 1. In that situation p must equal i-2, so matching actually starts from A[i] and B[0]; if the indices into A and B are tracked by two variables x and y, add 1 to both x and y.

Core code:

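    lenA = strlen(A); lenB = strlen(B);
    /* next[i]: length of the longest common prefix of B[i..] and B */
    next[0] = lenB;
    if (lenB > 1) {
        next[1] = lenB - 1;
        for (int i = 0; i < lenB - 1; i++) if (B[i] != B[i + 1]) {next[1] = i; break;}
    }
    int j, k = 1, p, L;
    for (int i = 2; i < lenB; i++) {
        p = k + next[k] - 1; L = next[i - k];
        if (i + L <= p) next[i] = L; else {
            j = p - i + 1;
            if (j < 0) j = 0;   /* p == i - 2: start from B[i] and B[0] */
            while (i + j < lenB && B[i + j] == B[j]) j++;
            next[i] = j; k = i;
        }
    }
    /* ex[i]: length of the longest common prefix of A[i..] and B */
    int minlen = lenA <= lenB ? lenA : lenB; ex[0] = minlen;
    for (int i = 0; i < minlen; i++) if (A[i] != B[i]) {ex[0] = i; break;}
    k = 0;
    for (int i = 1; i < lenA; i++) {
        p = k + ex[k] - 1;
        /* read next[i - k] only when i <= p, which guarantees i - k < lenB */
        L = (i <= p) ? next[i - k] : 0;
        if (i <= p && i + L <= p) ex[i] = L; else {
            j = p - i + 1;
            if (j < 0) j = 0;   /* p == i - 2: start from A[i] and B[0] */
            while (i + j < lenA && j < lenB && A[i + j] == B[j]) j++;
            ex[i] = j; k = i;
        }
    }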
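
The following is a self-contained C++ sketch of the two cases described above, assuming std::string inputs. It differs from the core code in two illustrative ways: p is stored in its own variable instead of being recomputed from k, and the boundary values (ex[0], next[1]) are not computed up front but fall out of the general loop by starting with p = -1. The function names build_ext_next and extend_kmp are hypothetical.

```cpp
#include <string>
#include <vector>

// nxt[i] = length of the longest common prefix of B[i..] and B (nxt[0] = |B|).
std::vector<int> build_ext_next(const std::string& B) {
    const int m = (int)B.size();
    std::vector<int> nxt(m, 0);
    if (m == 0) return nxt;
    nxt[0] = m;
    int k = 0, p = -1;  // p = farthest matched position so far, reached from start k
    for (int i = 1; i < m; ++i) {
        int j = 0;  // number of characters already known to match at i
        if (i <= p) {
            const int L = nxt[i - k];
            if (i + L <= p) { nxt[i] = L; continue; }  // case (1): answer is L
            j = p - i + 1;                              // case (2): resume after p
        }
        while (i + j < m && B[i + j] == B[j]) ++j;
        nxt[i] = j; k = i; p = i + j - 1;
    }
    return nxt;
}

// ex[i] = length of the longest common prefix of A[i..] and B.
std::vector<int> extend_kmp(const std::string& A, const std::string& B) {
    const int n = (int)A.size(), m = (int)B.size();
    std::vector<int> ex(n, 0);
    if (m == 0) return ex;
    const std::vector<int> nxt = build_ext_next(B);
    int k = 0, p = -1;
    for (int i = 0; i < n; ++i) {
        int j = 0;
        if (i <= p) {  // i <= p implies i - k < m, so nxt[i - k] is in range
            const int L = nxt[i - k];
            if (i + L <= p) { ex[i] = L; continue; }
            j = p - i + 1;
        }
        while (i + j < n && j < m && A[i + j] == B[j]) ++j;
        ex[i] = j; k = i; p = i + j - 1;
    }
    return ex;
}
```

For example, extend_kmp("aaabaab", "aab") returns 2 3 1 0 3 1 0, and the positions where the value equals the length of B (here 1 and 4) are exactly the occurrences of B in A.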

[Time complexity analysis]
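In both KMP and extended KMP, the matched position in string A only moves forward. In KMP, j grows by at most 1 per character of A, so the total number of fallbacks j = next[j-1] cannot exceed the total number of increments; in extended KMP, p never decreases and every successful character comparison advances it. The same holds for the self-matching of B. Therefore the total time complexity is linear, O(lenA + lenB); extended KMP merely has a larger constant factor than KMP.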
[Application]
KMP and extended KMP are widely used in solving string problems; many seemingly complicated string problems can be reduced to these two algorithms. In addition, the "string" can be an array of any element type that supports equality comparison, not just a character array.
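
As an illustration of the last point, here is a hedged sketch of KMP written as a C++ template, so that the same loop searches a std::vector<int> as readily as a std::string. The name find_all_generic is hypothetical, and the only assumptions about the sequence type are that it provides size(), operator[] and elements comparable with ==.

```cpp
#include <vector>

// Start indices of every occurrence of pat in text, for any indexable sequence type.
template <typename Seq>
std::vector<int> find_all_generic(const Seq& text, const Seq& pat) {
    const int n = (int)text.size(), m = (int)pat.size();
    std::vector<int> hits;
    if (m == 0 || m > n) return hits;  // assumption: an empty pattern reports no matches

    // Self-matching of pat (proper prefixes only, so nxt[0] = 0).
    std::vector<int> nxt(m, 0);
    for (int i = 1, j = 0; i < m; ++i) {
        while (j > 0 && !(pat[i] == pat[j])) j = nxt[j - 1];
        if (pat[i] == pat[j]) ++j;
        nxt[i] = j;
    }

    // Matching of text against pat.
    for (int i = 0, j = 0; i < n; ++i) {
        while (j > 0 && !(text[i] == pat[j])) j = nxt[j - 1];
        if (text[i] == pat[j]) ++j;
        if (j == m) {
            hits.push_back(i - m + 1);
            j = nxt[j - 1];  // fall back so overlapping occurrences are found too
        }
    }
    return hits;
}
```

Searching the integer sequence 1 2 1 2 1 3 for the pattern 1 2 1 with this function reports the start positions 0 and 2.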

Reprinted from: http://www.cppblog.com/MatoNo1/archive/2011/04/17/144390.aspx
