An understanding of the KMP algorithm (iii)

Source: Internet
Author: User

Earlier in this series we looked at the brute-force method of string pattern matching and then improved on it: the match pointer i of the main string never backtracks, and only the match pointer j of the pattern string is adjusted, by exploiting properties of the pattern string itself. How j should be adjusted depends on those properties; the resulting values are recorded in the array next[j].

1. Find the maximum length of the common prefix and suffix:

For a pattern string P = p[0] p[1] ... p[j], look for the longest prefix and suffix of P that are equal (the whole string itself is not counted). If p[0] p[1] ... p[k] = p[j-k] p[j-k+1] ... p[j], then the pattern string up to and including p[j] has a common prefix and suffix of maximum length k + 1. For example, if the given pattern string is "abab", the maximum common prefix/suffix lengths of its substrings a, ab, aba, and abab are 0, 0, 1, and 2 respectively.

For example, the string aba has the common prefix and suffix a, of length 1, and the string abab has the common prefix and suffix ab, of length 2 (the length is k + 1, so here k + 1 = 2).
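The definition above can be checked with a small brute-force sketch. It is only an illustration written directly from the definition, not an efficient construction, and the function name max_common_lengths is a hypothetical name chosen here.

```python
def max_common_lengths(pattern: str) -> list:
    """For each prefix pattern[:j + 1], return the length of its longest
    proper prefix that is also a suffix (brute force, by definition)."""
    table = []
    for j in range(len(pattern)):
        sub = pattern[: j + 1]
        best = 0
        # Try every proper length; lengths grow, so the last hit is the longest.
        for length in range(1, len(sub)):
            if sub[:length] == sub[-length:]:
                best = length
        table.append(best)
    return table


print(max_common_lengths("abab"))  # [0, 0, 1, 2]
```

The values for aba and abab (1 and 2) are the third and fourth entries of the result.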

2. Compute the next array:

The next array records, for each character, the length of the longest common prefix and suffix of the part of the pattern string before that character, excluding the character itself. Why exclude the current character? Recall the pattern strings "ABCDEX" and "ABCABX" from the previous two examples: when the match pointer reaches a character and the next array has to be consulted, that character is always the one that just failed to match (this is also visible in the source code), so it has to be compared again in any case and cannot count as part of the matched prefix. Therefore, once step 1 has produced the maximum common prefix/suffix length for each prefix, the next array follows from a simple transformation: shift the values from step 1 one position to the right, dropping the last one, and set the first value to -1. The -1 is not a prefix/suffix length; it only marks the first character of the pattern string, which has nothing before it. For "abab", step 1 gives 0, 0, 1, 2, so the next array is -1, 0, 0, 1.

For example, for aba, the string ab before the 3rd character a has a longest common prefix and suffix of length 0, so the next value of the 3rd character a is 0. For abab, the string aba before the 4th character b has the common prefix and suffix a of length 1, so the next value of the 4th character b is 1 (the length of the common prefix and suffix is k, here k = 1).
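As a rough sketch of this step, the next value of each character can be computed from the string that precedes it. The code assumes 0-based positions and the common convention that the first character gets the sentinel -1; the name build_next is hypothetical, and the brute-force search is chosen for clarity rather than speed.

```python
def build_next(pattern: str) -> list:
    """next[j] = length of the longest common proper prefix/suffix of
    pattern[:j], i.e. of the part before character j. next[0] is -1."""
    nxt = []
    for j in range(len(pattern)):
        if j == 0:
            nxt.append(-1)  # sentinel: nothing precedes the first character
            continue
        before = pattern[:j]  # excludes the current character pattern[j]
        best = 0
        for length in range(1, len(before)):
            if before[:length] == before[-length:]:
                best = length
        nxt.append(best)
    return nxt


print(build_next("aba"))   # [-1, 0, 0]
print(build_next("abab"))  # [-1, 0, 0, 1]
```

The last entries, 0 for aba and 1 for abab, are the two next values worked out in the example above.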

3. Match using the next array:

With the two steps above covered, we get to the point: how to use the values of the next array to adjust j during matching while guaranteeing that i never backtracks.

On a mismatch, set j = next[j]; the pattern string then moves to the right, relative to the main string, by j - next[j] positions. In other words, suppose the part p[j-k] p[j-k+1] ... p[j-1] of the pattern string has matched s[i-k] s[i-k+1] ... s[i-1] of the text string, but p[j] fails to match s[i]. Because next[j] = k, the pattern string without p[j] has a common prefix and suffix of maximum length k, that is, p[0] p[1] ... p[k-1] = p[j-k] p[j-k+1] ... p[j-1]. Setting j = next[j] therefore shifts the pattern string right by j - next[j] positions, so that the pattern prefix p[0] p[1] ... p[k-1] lines up with s[i-k] s[i-k+1] ... s[i-1] of the text string, and matching continues with p[k] against s[i].

To sum up, the next array of KMP tells us where in the pattern string to continue after a character of the pattern string fails to match a character of the text string. If the character at position j of the pattern string mismatches the character at position i of the text string, matching continues by comparing the pattern character at position next[j] with the same text character at position i, which is equivalent to moving the pattern string right by j - next[j] positions.
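The matching rule can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the source code the article refers to: positions are 0-based, next[0] is taken to be -1, the next array is passed in ready-made, and the name kmp_search is hypothetical.

```python
def kmp_search(text: str, pattern: str, nxt: list) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.
    nxt is the next array of pattern, with nxt[0] == -1 assumed."""
    i = 0  # match pointer of the text string, never moves backwards
    j = 0  # match pointer of the pattern string
    while i < len(text) and j < len(pattern):
        if j == -1 or text[i] == pattern[j]:
            # Either a match, or the first pattern character failed:
            # in both cases advance both pointers.
            i += 1
            j += 1
        else:
            # Mismatch: keep i, continue at next[j], which shifts the
            # pattern right by j - next[j] positions.
            j = nxt[j]
    return i - j if j == len(pattern) else -1


print(kmp_search("ababcabab", "abab", [-1, 0, 0, 1]))  # 0
print(kmp_search("abacabab", "abab", [-1, 0, 0, 1]))   # 4
```

Only j is ever reset; i increases or stays put on every iteration, which is the "no backtracking" property described above.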

Next, let us look at an example that makes the above concrete:

1. Find the longest common prefix and suffix:

Suppose the given pattern string is "ABCDABD". Traversing the pattern string from left to right and comparing the prefixes and suffixes of each of its substrings A, AB, ABC, ABCD, ABCDA, ABCDAB, and ABCDABD, only ABCDA (common element A) and ABCDAB (common element AB) have a prefix that is also a suffix.

In other words, the maximum common prefix/suffix lengths for the characters A, B, C, D, A, B, D of the pattern string are 0, 0, 0, 0, 1, 2, 0 (hereinafter referred to as the "maximum length table").

2. Match based on the "maximum length table":

Because characters at the start of the pattern string may repeat later in it, the following conclusion can be drawn:

On a mismatch, number of positions the pattern string moves right = number of matched characters - maximum length value of the character before the mismatched character

Now let us match strings using the "maximum length table" together with this conclusion. Suppose the text string is "BBC ABCDAB ABCDABCDABDE" and the pattern string is "ABCDABD". Here we work directly with the maximum length table and do not use the next array yet. Note also that the maximum length value of the character before the mismatched character is exactly the value that the pattern string's match pointer takes for the next comparison; the example expresses the same thing as a right shift relative to the main string, and the two views are equivalent. The steps are as follows:

1. The character A of the pattern string does not match the characters B, B, C, and space at the start of the text string. Nothing has been matched yet, so the conclusion above is not needed: the pattern string simply moves right one position at a time until its A matches the 5th character of the text string, A.

2. Matching continues until the last character D of the pattern string mismatches the text string, so the pattern string has to move right. By how many positions? Six characters (ABCDAB) have been matched, and the "maximum length table" gives 2 for B, the character before the mismatched D, so by the conclusion above the pattern string moves right 6 - 2 = 4 positions.

3. After the pattern string has moved 4 positions to the right, C mismatches. Two characters (AB) have been matched, and the maximum length value for the previous character B is 0, so the pattern string moves right 2 - 0 = 2 positions.

4. A mismatches the space, so the pattern string moves right 1 position.

5. Comparison continues until D mismatches C. The number of matched characters is 6 and the maximum length value for the previous character B is 2, so the pattern string moves right 6 - 2 = 4 positions.

6. After the shift in step 5, every character of the pattern string matches and the process is complete.

The matching process ends at this first successful match, even if the pattern string occurs again later in the text string. To find later occurrences, the matching function can simply be called again.
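The walkthrough can be replayed with a short sketch that applies the shift rule directly to the maximum length table and records each shift. It assumes the strings are written in uppercase as "BBC ABCDAB ABCDABCDABDE" and "ABCDABD"; the function name match_with_table and its return shape are hypothetical choices made for this illustration.

```python
def match_with_table(text: str, pattern: str, table: list):
    """Slide pattern over text using the rule
    shift = matched characters - table value of the previous character.
    Returns (index of first match or -1, list of shifts performed)."""
    start = 0    # current alignment of the pattern against the text
    matched = 0  # pattern characters already known to match at this alignment
    shifts = []
    while start + len(pattern) <= len(text):
        while matched < len(pattern) and text[start + matched] == pattern[matched]:
            matched += 1
        if matched == len(pattern):
            return start, shifts
        if matched == 0:
            shift, keep = 1, 0  # nothing matched: move one position
        else:
            keep = table[matched - 1]  # prefix that stays matched
            shift = matched - keep
        shifts.append(shift)
        start += shift
        matched = keep
    return -1, shifts


table = [0, 0, 0, 0, 1, 2, 0]  # maximum length table of "ABCDABD"
print(match_with_table("BBC ABCDAB ABCDABCDABDE", "ABCDABD", table))
# (15, [1, 1, 1, 1, 4, 2, 1, 4])
```

The first four shifts of 1 correspond to step 1, and the shifts 4, 2, 1, 4 correspond to steps 2 through 5; the match is then found at 0-based position 15 of the text string.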

The matching process shows that the crux of the problem is finding the longest equal prefix and suffix in the pattern string. Once the maximum common prefix/suffix length is known for each character of the pattern string, matching can proceed on that basis. This maximum length is exactly the information that the next array expresses.
