"Classical Algorithm"--KMP, in-depth explanation of next array solution

Source: Internet
Author: User

There are many ways to find a substring in a parent string. KMP is one of the most common improved algorithms: when a mismatch occurs during matching, it can skip over several characters instead of starting again from the very next position, which speeds up the match.

Of course, the algorithm pays off for substrings with symmetric properties: if the substring has some symmetry, then after a mismatch we need to look back and see whether part of what has already been matched can be matched again.

The KMP algorithm uses an array called the prefix array, also known as the next array. Every substring has a fixed next array. It records how many characters can be skipped when a mismatch occurs during matching. It also describes the degree of symmetry of the substring: the higher the symmetry, the larger the values, and the greater the chance that part of the earlier text can be matched again.

This next array is the key to the KMP algorithm, but it is not easy to understand. I will explain it here in plain language. Everywhere else I looked there were only derivations in mathematical formulas, which are painful to read. This article is for classmates who do not like mathematical formulas but want to understand the KMP algorithm.

1. Let's explain with an example. Below are the next array values of a substring. You can see that this substring is very symmetrical, so its next values are relatively large.

Position i   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
next[i]      0  0  0  0  1  2  3  1  2  3  4  5  6  7  4  0
Substring    A  G  C  T  A  G  C  A  G  C  T  A  G  C  T  G

Note: the symmetry discussed below is not mirror symmetry around a center. It is the repetition of a block in the same order, that is, the beginning of the string equals its end. It is not the symmetry of ABCCBA but that of ABCABC.

(1) Find the symmetry one position at a time.

This is simple: we loop over the substring and look at the first 1 character, the first 2 characters, the first 3, and so on, until i reaches 15.

The first character, A, has no symmetry, so its symmetry degree is 0.

The first two characters, AG, have no symmetry, so the degree is also 0.

The first three and the first four characters, AGC and AGCT, are also 0.

The first five characters are AGCTA. The first and last characters are equal, so the symmetry degree is 1.

The first six characters are AGCTAG. The AG at the start pairs with the AG at the end, so the symmetry degree is 2.

Pay attention here and think about it: how would we implement this in a program?

Just follow the rules below:

A. When the previous character has a symmetry degree of 0, simply compare the current character with the first character of the substring. This is easy to understand: a degree of 0 means there is no symmetry so far, so after adding one character the only possible symmetry is between the current character and the first one. For example, in AGCTA the T has degree 0, so for the A after it we only need to check whether it equals the first character, A.

B. Following this reasoning we can summarize a general rule that is not limited to 0. If the next value of the previous character is 1, we compare the current character with the second character of the substring. The 1 says that the previous character equals the first character, so if the current one equals the second, two characters are symmetric and the degree is 2. For example, in AGCTAG the next value of the second-to-last character, A, is 1, meaning it is symmetric with the first A. We then compare the last G with the second character, G. They are equal, so the symmetry degree accumulates to 2.

C. By the same reasoning, as long as the characters keep matching, the degree keeps accumulating and we can keep pushing forward. Nothing up to here should be difficult; if it is, then my explanation has failed.

Of course, things will not always go so smoothly that the symmetry keeps growing. If the next comparison is unequal, the current character cannot inherit the preceding symmetry. That only shows there is not as much symmetry as before; it does not show that there is no symmetry at all. So this case has to be reconsidered, and this is the difficult part.
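Rules A to C can be sketched in a few lines of C. This is only an illustration under my own assumptions: the names try_inherit and print_inherit_trace are made up here and are not part of the article's code, and the caller is assumed to supply the already known next values. The function returns -1 for exactly the difficult case described above, where the comparison fails and a smaller symmetry still has to be searched for.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of the article's code.
 * Applies rules A-C to position i (i >= 1), given the already known
 * degrees next[0..i-1]. Returns the new degree when one comparison is
 * enough to decide it, or -1 when the comparison fails although the
 * previous degree is non-zero, so a smaller symmetry must be looked for. */
int try_inherit(const char *pattern, const int *next, int i)
{
    assert(i >= 1);
    int k = next[i - 1];            /* degree of the previous position */
    if (pattern[i] == pattern[k])   /* compare with the character at index k */
        return k + 1;               /* the symmetry grows by one */
    if (k == 0)
        return 0;                   /* nothing to inherit, nothing to search */
    return -1;                      /* cannot inherit: look back */
}

/* Prints what the single comparison decides for every position. */
void print_inherit_trace(const char *pattern, const int *next, int len)
{
    for (int i = 1; i < len; i++) {
        int d = try_inherit(pattern, next, i);
        if (d >= 0)
            printf("i=%2d '%c': degree %d\n", i, pattern[i], d);
        else
            printf("i=%2d '%c': cannot inherit, look back\n", i, pattern[i]);
    }
}
```

Called with the example substring AGCTAGCAGCTAGCTG and its next values 0 0 0 0 1 2 3 1 2 3 4 5 6 7 4 0, try_inherit settles every position by one comparison except positions 7, 14 and 15, where it returns -1.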

(2) Looking back for symmetry

Here the preceding symmetry cannot be inherited, but we still have to find the symmetry degree. The most naive way is to write a sub-function that finds the maximum symmetry of a string. It can be written in many ways, for example by taking each ending of the current string and checking whether it equals the beginning of the substring of the same length. Of course, that is the clumsiest approach. The KMP code we usually see is optimized, because the symmetric strings follow a pattern.

Here is an example taken from the table above:

Positions i = 0 to 14 are shown below; the parentheses are only there to illustrate the point:

(A G C T A G C) (A G C T A G C) T

The symmetry degrees of the seven characters before the final T are 1, 2, 3, 4, 5, 6, 7. The C just before the T is symmetric with the front over 7 characters, hence the value 7. But the final T cannot inherit the previous degree, so its symmetry degree has to be worked out again.

A few facts need to be stated first.

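1. If T has any symmetry, its degree must be smaller than that of the preceding C, so we are looking for a smaller symmetry. This needs little explanation: if it were larger, T would simply have inherited the preceding symmetry.

2. To find a smaller symmetry, there must be a sub-symmetry inside the symmetric block, and T has to come immediately after that sub-symmetry.

In the example, the 7-character block AGCTAGC is itself symmetric with degree 3 (AGC at both ends), which is the next value of its last character. So T is compared with the character right after the leading AGC, at position 3, which is also a T. They are equal, so the symmetry degree of the final T is 3 + 1 = 4.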
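The search for a smaller symmetry can be traced with a short sketch. The name look_back and its trace flag are my own assumptions for illustration; the function only expresses the two facts above as a loop: start from the degree of the previous position, and whenever the comparison fails, drop to the sub-symmetry inside the current symmetric block.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of the article's code.
 * Given the known degrees next[0..i-1], finds the degree of position i
 * (i >= 1) by walking through ever smaller sub-symmetries:
 * k = next[i-1], then next[k-1], and so on until a match or k == 0.
 * If trace is non-zero, every failed candidate is printed. */
int look_back(const char *pattern, const int *next, int i, int trace)
{
    assert(i >= 1);
    int k = next[i - 1];
    while (k > 0 && pattern[i] != pattern[k]) {
        if (trace)
            printf("degree %d fails ('%c' != '%c'), fall back to %d\n",
                   k, pattern[i], pattern[k], next[k - 1]);
        k = next[k - 1];   /* sub-symmetry inside the symmetric block */
    }
    /* Either a candidate matched, or k is 0 and only the first
     * character of the substring is left to compare with. */
    return (pattern[i] == pattern[k]) ? k + 1 : 0;
}
```

For the example substring AGCTAGCAGCTAGCTG and position 14 (the final T in the parentheses example), the candidate 7 fails because T differs from the A at position 7, the fallback is next[6] = 3, and T equals the character at position 3, so the result is 4.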

Here's how the partial match table is produced.

First, two concepts need to be understood: prefix and suffix. The "prefixes" of a string are all of its leading parts that leave off at least the last character; the "suffixes" are all of its trailing parts that leave off at least the first character.

The partial match value is the length of the longest element shared by the prefixes and the suffixes. Take "ABCDABD" as an example:

- "A": the prefixes and suffixes are both empty, so the length of the common element is 0;

- "AB": the prefixes are [A], the suffixes are [B], the length of the common element is 0;

- "ABC": the prefixes are [A, AB], the suffixes are [BC, C], the length of the common element is 0;

- "ABCD": the prefixes are [A, AB, ABC], the suffixes are [BCD, CD, D], the length of the common element is 0;

- "ABCDA": the prefixes are [A, AB, ABC, ABCD], the suffixes are [BCDA, CDA, DA, A], the common element is "A", the length is 1;

- "ABCDAB": the prefixes are [A, AB, ABC, ABCD, ABCDA], the suffixes are [BCDAB, CDAB, DAB, AB, B], the common element is "AB", the length is 2;

- "ABCDABD": the prefixes are [A, AB, ABC, ABCD, ABCDA, ABCDAB], the suffixes are [BCDABD, CDABD, DABD, ABD, BD, D], the length of the common element is 0.
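The definition can be checked directly with a brute-force sketch that compares every prefix with the suffix of the same length, longest first. The function names are assumptions of mine for illustration. This is deliberately the slow, obvious method rather than an optimized one; it is only useful for verifying a table by hand.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the article's code.
 * Partial match value of the first n characters of s: the largest k < n
 * such that the first k characters equal the last k characters. */
int partial_match_value(const char *s, int n)
{
    assert(n >= 0);
    for (int k = n - 1; k > 0; k--)            /* try the longest first */
        if (memcmp(s, s + n - k, (size_t)k) == 0)
            return k;
    return 0;                                  /* no common element */
}

/* Fills table[i] with the partial match value of s[0..i].
 * table must have room for strlen(s) entries. */
void partial_match_table(const char *s, int table[])
{
    int len = (int)strlen(s);
    for (int i = 0; i < len; i++)
        table[i] = partial_match_value(s, i + 1);
}
```

For "ABCDABD" this fills the table with 0 0 0 0 1 2 0, matching the list above.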

From the theory above we get the following algorithm for computing the prefix array, that is, the next array.

    #include <string.h>

    void Setprefix(const char *pattern, int prefix[])
    {
        int len = (int)strlen(pattern); // pattern string length
        prefix[0] = 0;                  // a single character has no symmetry
        for (int i = 1; i < len; i++)
        {
            int k = prefix[i - 1];      // symmetry degree of the previous character
            // Keep falling back while a sub-symmetry is left to try.
            // k == 0 means there is no sub-symmetry any more.
            // pattern[i] != pattern[k] means there is a symmetry, but the
            // character after it differs from the current character.
            while (pattern[i] != pattern[k] && k != 0)
                k = prefix[k - 1];      // fall back to the sub-symmetry
            // Either a sub-symmetry was found or the preceding symmetry is
            // inherited directly; both cases add 1 to the previous value.
            if (pattern[i] == pattern[k])
                prefix[i] = k + 1;
            else
                prefix[i] = 0;          // all sub-symmetries tried, no match: reset to 0
        }
    }
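Since the next array exists to speed up matching, here is a minimal sketch of how a search might use it. The name kmp_find and its return convention are assumptions made for this illustration and do not come from the article; the function builds the table itself, using the same fallback idea, so that the block can stand alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical search routine, not part of the article's code.
 * Returns the index of the first occurrence of pattern in text,
 * -1 if there is none, or -2 if memory could not be allocated. */
int kmp_find(const char *text, const char *pattern)
{
    int n = (int)strlen(text);
    int m = (int)strlen(pattern);
    if (m == 0)
        return 0;                        /* empty pattern matches at 0 */

    int *next = malloc((size_t)m * sizeof *next);
    if (next == NULL)
        return -2;

    /* Build the next array; k always holds next[i-1] on loop entry. */
    next[0] = 0;
    for (int i = 1, k = 0; i < m; i++) {
        while (k > 0 && pattern[i] != pattern[k])
            k = next[k - 1];             /* fall back to the sub-symmetry */
        if (pattern[i] == pattern[k])
            k++;
        next[i] = k;
    }

    /* Scan the text; k is the number of pattern characters matched. */
    int result = -1;
    for (int i = 0, k = 0; i < n; i++) {
        while (k > 0 && text[i] != pattern[k])
            k = next[k - 1];             /* reuse the matched part, never move i back */
        if (text[i] == pattern[k])
            k++;
        if (k == m) {                    /* whole pattern matched */
            result = i - m + 1;
            break;
        }
    }

    free(next);
    return result;
}
```

The point of the sketch is that the text index i never moves backwards: on a mismatch only k is reduced through the next array, which is the skipping described at the beginning of the article.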

With this explanation, the principle behind KMP's next array should be understood, and the rest is simple. I was a bit dizzy myself and really do not like those mathematical formulas, so I summed it up with visual, logical reasoning instead.

