A string pattern matching algorithm: the KMP algorithm (Java data structures)

Source: Internet
Author: User

The main ideas of this article are drawn from http://kb.cnblogs.com/page/176818/

If this infringes on anyone's rights, please let me know. Thank you.

First, the KMP algorithm

The KMP algorithm performs string pattern matching in O(n + m) time, where n is the length of the text and m is the length of the pattern. Its basic idea is that whenever a mismatch occurs during matching, the pointer into the text is never moved back. Instead, the "partial match" information already obtained is used to "slide" the pattern as far to the right as possible before comparison continues. Clearly we first need this "partial match" information, so how is it computed?

Second, algorithm analysis

The previous article discussed the BF (brute-force) algorithm: the pattern is compared with the text character by character, and as soon as a mismatch is found, the comparison starts over from the beginning of the pattern, one position further along in the text. This wastes time.
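For comparison, here is a minimal brute-force (BF) sketch in Java. The class and method names (`BruteForce`, `indexOf`) are illustrative and do not come from the original article. Note how the text pointer `i` is moved back to `i - j + 1` after every mismatch, which is exactly the backtracking that KMP avoids.

```java
class BruteForce {

    /** Returns the index of the first occurrence of pattern in text, or -1 if absent. */
    static int indexOf(String text, String pattern) {
        int i = 0; // position in the text
        int j = 0; // position in the pattern
        while (i < text.length() && j < pattern.length()) {
            if (text.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            } else {
                // Mismatch: move the text pointer back to one past the current
                // alignment start and restart the pattern from its beginning.
                i = i - j + 1;
                j = 0;
            }
        }
        return j == pattern.length() ? i - j : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("BBC ABCDAB ABCDABCDABDE", "ABCDABD")); // prints 15
    }
}
```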

For example, given the string "BBC ABCDAB ABCDABCDABDE", we want to know whether it contains the string "ABCDABD".
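Before working through the steps by hand, Java's built-in `String.indexOf` can give a reference answer that any hand-written matcher for this example should reproduce. The class name `BaselineCheck` is only illustrative, and the sketch assumes the strings are written with the capitalization used above.

```java
class BaselineCheck {

    public static void main(String[] args) {
        String text = "BBC ABCDAB ABCDABCDABDE";
        String pattern = "ABCDABD";
        // String.indexOf returns the start index of the first occurrence, or -1.
        System.out.println(text.indexOf(pattern)); // prints 15
    }
}
```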

1.

First, the first character of the string "BBC ABCDAB ABCDABCDABDE" is compared with the first character of the search term "ABCDABD". Because B does not match A, the search term is moved one position to the right.

2.

Because B does not match A, the search term moves one more position to the right.

3.

This continues until the string contains a character that matches the first character of the search term.

4.

Then the next character of the string is compared with the next character of the search term, and they also match.

5.

This continues until the string has a character that does not match the corresponding character of the search term.

6.

At this point, the most natural response is to move the whole search term one position to the right and compare again from its first character. This works, but it is inefficient, because the "search position" has to move back over positions that have already been compared.

7.

One basic fact is that when the space does not match D, you actually already know that the preceding six characters are "ABCDAB". The idea of the KMP algorithm is to exploit this known information: instead of moving the "search position" back to positions that have already been compared, keep moving it forward, which improves efficiency.

8.

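How is this done? A "partial match table" can be computed for the search term. How this table is produced is explained later; for now it is enough to know how to use it. For the search term "ABCDABD", the partial match values of its characters A, B, C, D, A, B, D are 0, 0, 0, 0, 1, 2, 0 respectively.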

9.

When the space does not match D, the first six characters "ABCDAB" have already matched. The table shows that the last matched character, B, has a "partial match value" of 2, so the number of positions to move is computed with the following formula:

Positions to move = number of matched characters - partial match value of the last matched character

Because 6 - 2 = 4, the search term is moved 4 positions to the right.
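The shift formula can be written as a tiny Java helper. This is a sketch with hypothetical names (`ShiftRule`, `shift`); the table values are the ones derived for "ABCDABD" in the third part of this article, and the case of zero matched characters (a plain one-position move, as in steps 1 to 3 and 11) is handled separately.

```java
class ShiftRule {

    /**
     * Positions to move = matched characters - partial match value of the last
     * matched character. pmt[k] is the partial match value of pattern[0..k].
     */
    static int shift(int matched, int[] pmt) {
        if (matched == 0) {
            return 1; // nothing matched: simply move one position
        }
        return matched - pmt[matched - 1];
    }

    public static void main(String[] args) {
        int[] pmt = {0, 0, 0, 0, 1, 2, 0}; // partial match table of "ABCDABD"
        System.out.println(shift(6, pmt)); // step 9:  6 - 2 = 4
        System.out.println(shift(2, pmt)); // step 10: 2 - 0 = 2
        System.out.println(shift(0, pmt)); // step 11: move 1
        System.out.println(shift(7, pmt)); // step 13: 7 - 0 = 7
    }
}
```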

10.

Because the space does not match C, the search term must keep moving to the right. This time 2 characters ("AB") have matched, and the corresponding "partial match value" is 0. So the number of positions to move is 2 - 0 = 2, and the search term is moved 2 positions to the right.

11.

Because the space does not match A, move one more position to the right.

12.

Compare character by character until C and D fail to match. Again, the number of positions to move is 6 - 2 = 4, so keep moving the search term 4 positions to the right.

13.

Compare character by character until the last character of the search term also matches: a complete match has been found and the search is finished. To keep searching (that is, to find all matches), the number of positions to move is 7 - 0 = 7, so move the search term 7 positions to the right; the details are not repeated here.
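Steps 1 to 13 can be sketched in Java as follows. This is an illustrative implementation, not the original author's code: `KmpSearch` and `findAll` are hypothetical names, and the partial match table is passed in as a parameter. After each shift, the characters already known to match are not compared again, so the text position `start + matched` never moves backwards, which is why the search phase runs in linear time.

```java
import java.util.ArrayList;
import java.util.List;

class KmpSearch {

    /**
     * Returns every start index of pattern in text. After a mismatch or a full
     * match, the pattern slides by (matched - pmt[matched - 1]) positions.
     * pmt must be the partial match table of pattern.
     */
    static List<Integer> findAll(String text, String pattern, int[] pmt) {
        List<Integer> matches = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        int start = 0;   // where the pattern is currently aligned in the text
        int matched = 0; // characters already known to match at this alignment
        while (start + m <= n) {
            // Compare only the characters not yet known to match.
            while (matched < m && text.charAt(start + matched) == pattern.charAt(matched)) {
                matched++;
            }
            if (matched == m) {
                matches.add(start); // complete match (step 13)
            }
            if (matched == 0) {
                start++; // first character differs: move one position
            } else {
                int partial = pmt[matched - 1];
                start += matched - partial; // positions to move
                matched = partial;          // these characters already line up
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] pmt = {0, 0, 0, 0, 1, 2, 0}; // partial match table of "ABCDABD"
        System.out.println(findAll("BBC ABCDAB ABCDABCDABDE", "ABCDABD", pmt)); // prints [15]
    }
}
```

Tracing this on the example reproduces the moves described above: 4 positions, then 2, then 1, then 4, and finally a full match at index 15.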

14. Next comes the key part: how the partial match table is generated. I think the original author explains this part well.

Third, generating the partial match table

For this part, I looked at several explanations online and found the following one the clearest:

15. First, you need to understand two concepts: "prefix" and "suffix". A "prefix" is any leading substring of a string that does not include its last character; a "suffix" is any trailing substring of a string that does not include its first character.
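These two definitions translate directly into Java. The sketch below uses hypothetical names (`PrefixSuffix`, `prefixes`, `suffixes`) and lists the elements in the same order as the examples that follow.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixSuffix {

    /** All leading substrings of s that do not include its last character. */
    static List<String> prefixes(String s) {
        List<String> result = new ArrayList<>();
        for (int len = 1; len < s.length(); len++) {
            result.add(s.substring(0, len));
        }
        return result;
    }

    /** All trailing substrings of s that do not include its first character. */
    static List<String> suffixes(String s) {
        List<String> result = new ArrayList<>();
        for (int start = 1; start < s.length(); start++) {
            result.add(s.substring(start));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("ABCD")); // [A, AB, ABC]
        System.out.println(suffixes("ABCD")); // [BCD, CD, D]
    }
}
```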

The "partial match value" is the length of the longest element shared by the set of prefixes and the set of suffixes. Take "ABCDABD" as an example:
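A literal translation of this definition into Java might look like the following. It is deliberately naive: it builds the prefix set and compares every suffix against it, which costs roughly quadratic time in the string length, and is meant only to mirror the definition. `PartialMatchValue.of` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;

class PartialMatchValue {

    /** Length of the longest string that is both a prefix and a suffix of s. */
    static int of(String s) {
        Set<String> prefixes = new HashSet<>();
        for (int len = 1; len < s.length(); len++) {
            prefixes.add(s.substring(0, len)); // prefixes exclude the last character
        }
        int best = 0;
        for (int start = 1; start < s.length(); start++) {
            String suffix = s.substring(start); // suffixes exclude the first character
            if (prefixes.contains(suffix)) {
                best = Math.max(best, suffix.length());
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(of("ABCDAB")); // 2, the common element is "AB"
    }
}
```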

- "A": the prefix and suffix sets are both empty, so the common element length is 0;

- "AB": the prefixes are [A] and the suffixes are [B]; there is no common element, so the length is 0;

- "ABC": the prefixes are [A, AB] and the suffixes are [BC, C]; the common element length is 0;

- "ABCD": the prefixes are [A, AB, ABC] and the suffixes are [BCD, CD, D]; the common element length is 0;

- "ABCDA": the prefixes are [A, AB, ABC, ABCD] and the suffixes are [BCDA, CDA, DA, A]; the common element is "A", so the length is 1;

- "ABCDAB": the prefixes are [A, AB, ABC, ABCD, ABCDA] and the suffixes are [BCDAB, CDAB, DAB, AB, B]; the common element is "AB", so the length is 2;

- "ABCDABD": the prefixes are [A, AB, ABC, ABCD, ABCDA, ABCDAB] and the suffixes are [BCDABD, CDABD, DABD, ABD, BD, D]; the common element length is 0.
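Computing every entry from the definition, as in the list above, repeats a lot of work. A common alternative, not described in the original text, is the standard linear-time construction (often called the prefix function or failure function), sketched below under the hypothetical name `PartialMatchTable.build`. It reuses earlier table entries to fall back to shorter prefix/suffix candidates and should reproduce the same values for "ABCDABD".

```java
import java.util.Arrays;

class PartialMatchTable {

    /**
     * pmt[i] = partial match value of pattern.substring(0, i + 1),
     * computed in O(m) with the standard prefix-function technique.
     */
    static int[] build(String pattern) {
        int m = pattern.length();
        int[] pmt = new int[m];
        int k = 0; // length of the current longest prefix that is also a suffix
        for (int i = 1; i < m; i++) {
            // On a mismatch, fall back to the next shorter prefix/suffix candidate.
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = pmt[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            pmt[i] = k;
        }
        return pmt;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build("ABCDABD"))); // [0, 0, 0, 0, 1, 2, 0]
    }
}
```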

16.

The essence of the "partial match" is that the beginning and the end of a string sometimes repeat. For example, "ABCDAB" contains "AB" twice, so its "partial match value" is 2 (the length of "AB"). When the search term is moved, the first "AB" shifts 4 positions to the right (string length minus partial match value, 6 - 2), which brings it exactly to the position of the second "AB".

The following convention is used here: when j is 0, the value at that position is defined as 1. In fact, either convention works the same way; the convention of using 1 seems to be the most widespread.
