Principles of the BM pattern matching algorithm (illustration)

Last Update:2018-12-04 Source: Internet

Author: User

Http://hi.baidu.com/l6834279/item/d6ef651684dda4fcddeecae3

First, a brief description of some basic concepts of the BM (Boyer-Moore) algorithm.

The BM algorithm is an exact string matching algorithm (as opposed to fuzzy matching).

The BM algorithm compares characters from right to left and applies two heuristic rules, the bad character rule and the good suffix rule, to decide how far to shift the pattern to the right.

The basic process of the BM algorithm: let the text string be T and the pattern string be P. First, align P with the left end of T, then compare them from right to left, as shown in the figure:

If a comparison fails, the BM algorithm uses the two heuristic rules, the bad character rule and the good suffix rule, to calculate how far to shift the pattern string to the right. This repeats until the matching process ends.

Next, the bad character rule and the good suffix rule are introduced in detail.

First, the concepts of a bad character and a good suffix.

See the figure:

In the figure, the first mismatched character (red) is the bad character, and the characters already matched (green) form the good suffix.

1) Bad character rule:

When the BM algorithm scans from right to left and finds that a text character x does not match, two cases are considered:

I. If x does not occur in the pattern P, then no alignment of P that covers x can match, so P is shifted completely past x.

II. If x occurs in the pattern P, P is shifted so that its rightmost occurrence of x is aligned with x in the text.

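As a formula: Skip(x) = m if x does not occur in P, and Skip(x) = m - max(x) otherwise, where Skip(x) is the right-shift distance of P, m is the length of the pattern string P, and max(x) is the rightmost position of character x in P (positions counted from 1). This is the shift when the mismatch is at the last position of P; for a mismatch at an earlier position j, the shift is reduced by m - j.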
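The bad character rule can be sketched in a few lines of Python. This is a minimal illustration rather than the article's own code: the function names and the pattern `abcab` are hypothetical, chosen only so that m = 5 and the rightmost `c` sits at position 3, as in Skip(c) = 5 - 3 = 2.

```python
def rightmost_positions(pattern):
    """Return {character: rightmost 1-based position in pattern}, i.e. max(x)."""
    # Later positions overwrite earlier ones, so the rightmost occurrence wins.
    return {ch: pos for pos, ch in enumerate(pattern, start=1)}


def skip(pattern, x):
    """Skip(x) = m if x is absent from the pattern, otherwise m - max(x)."""
    m = len(pattern)
    max_x = rightmost_positions(pattern).get(x)
    return m if max_x is None else m - max_x


# Hypothetical pattern of length 5 whose rightmost 'c' is at position 3.
print(skip("abcab", "c"))  # 5 - 3 = 2
print(skip("abcab", "z"))  # 'z' does not occur in the pattern, so the shift is m = 5
```

A real implementation would build the table once per pattern instead of on every call; it is rebuilt here only to keep the sketch short.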

Example 1:

In the figure, a mismatch occurs at the character marked in red.

The shift distance is Skip(c) = 5 - 3 = 2, so P moves two positions to the right.

After the shift:

2) Good suffix rule:

If a character does not match but some characters have already been matched, two cases are considered:

I. If the matched part P', located at position t in P, also occurs at another position t' in P, and the character before position t' differs from the character before position t, shift P to the right so that t' lines up with where t was.

II. If the matched part P' does not occur anywhere else in P, find the longest prefix x of P that equals a suffix P'' of P', and shift P to the right so that x lines up with where the suffix P'' was.

One standard way to write this as a formula is shift(j) = min{ s : (P[j+1..m] = P[j-s+1..m-s] and P[j] != P[j-s]) if j > s, or P[s+1..m] = P[1..m-s] if j <= s }, where shift(j) is the right-shift distance of P, m is the length of the pattern string P, j is the position in P of the character currently being compared (the mismatch position), and s is the distance between t' and t (case I above) or the distance between x and P'' (case II above).
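The two cases can be checked directly with a short Python sketch. It is a deliberately unoptimized transcription of the rule (it tries every s from 1 to m), not the linear-time preprocessing used in practice, and the function name and sample patterns are hypothetical. Position j is 1-based, so `pattern[j:]` is the part that has already matched.

```python
def good_suffix_shift(pattern, j):
    """Smallest shift s permitted by the good suffix rule after a mismatch
    at 1-based position j of the pattern."""
    m = len(pattern)
    for s in range(1, m + 1):
        if j > s:
            # Case I: the matched part reoccurs s places to the left and is
            # preceded by a character different from the mismatched one.
            same_part = pattern[j - s:m - s] == pattern[j:]
            if same_part and pattern[j - s - 1] != pattern[j - 1]:
                return s
        elif pattern[:m - s] == pattern[s:]:
            # Case II: a prefix of the pattern equals a suffix of the matched
            # part. For s == m both sides are empty, so the loop always returns.
            return s


# Case I: "ab" matched, mismatch at position 3; "ab" reoccurs preceded by 'c'.
print(good_suffix_shift("cabab", 3))   # 2
# Case II: "cab" matched, mismatch at position 3; only the prefix "ab" fits.
print(good_suffix_shift("abecab", 3))  # 4
```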

The above process is a bit abstract, so we will continue to illustrate it.

Example 2:

The matched part cab (green) does not occur anywhere else in P.

Its suffix t' (blue), however, matches the prefix P' (red) of P, so P is shifted until P' reaches the position of t'.

After the shift:

This completes the explanation of the two rules.

During matching, the BM algorithm takes the larger of Skip(x) and shift(j) as the shift distance.
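Putting the two rules together, a search loop might look like the following Python sketch. It is an illustration under stated assumptions, not a tuned implementation: all names are hypothetical, the good suffix table is computed by brute force, and the bad character shift is adjusted for the mismatch position j (j - max(x), which equals Skip(x) - (m - j)) so that the larger of the two shifts never skips an occurrence.

```python
def rightmost_positions(pattern):
    """{character: rightmost 1-based position in pattern}, i.e. max(x)."""
    return {ch: pos for pos, ch in enumerate(pattern, start=1)}


def good_suffix_shift(pattern, j):
    """Smallest good suffix shift for a mismatch at 1-based position j
    (j == 0 means the whole pattern matched). Brute force for clarity."""
    m = len(pattern)
    for s in range(1, m + 1):
        if j > s:
            same_part = pattern[j - s:m - s] == pattern[j:]
            if same_part and pattern[j - s - 1] != pattern[j - 1]:
                return s
        elif pattern[:m - s] == pattern[s:]:
            return s


def bm_search(text, pattern):
    """Return the 0-based start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    max_pos = rightmost_positions(pattern)
    shift = [good_suffix_shift(pattern, j) for j in range(m + 1)]
    matches = []
    i = 0  # 0-based text index aligned with pattern position 1
    while i <= n - m:
        j = m  # 1-based pattern position, compared from right to left
        while j >= 1 and pattern[j - 1] == text[i + j - 1]:
            j -= 1
        if j == 0:
            matches.append(i)
            i += shift[0]
        else:
            x = text[i + j - 1]
            bad_character = j - max_pos.get(x, 0)  # may be <= 0
            i += max(bad_character, shift[j])      # shift[j] is always >= 1
    return matches


print(bm_search("abracadabra", "abra"))  # [0, 7]
```

Because every good suffix shift is at least 1, the loop always advances even when the bad character shift is zero or negative.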

The preprocessing time complexity of the BM algorithm is O(m + s) and the space complexity is O(s), where s here is the size of the finite alphabet used by P and T. The time complexity of the search phase is O(m · n), where n is the length of T.

In the best case, the time complexity is O(n/m), and in the worst case it is O(m · n).
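The two extremes can be observed by counting character comparisons. The sketch below is a simplified right-to-left scan that uses only the bad character rule (not the full BM algorithm), with a hypothetical function name and made-up inputs; it is enough to show roughly n/m comparisons when the text contains no pattern character, and roughly m · n when every alignment matches.

```python
def count_comparisons(text, pattern):
    """Right-to-left scan using only the bad character rule.
    Returns (match positions, number of character comparisons)."""
    n, m = len(text), len(pattern)
    max_pos = {ch: pos for pos, ch in enumerate(pattern, start=1)}
    matches, comparisons, i = [], 0, 0
    while i <= n - m:
        j = m
        while j >= 1:
            comparisons += 1
            if pattern[j - 1] != text[i + j - 1]:
                break
            j -= 1
        if j == 0:
            matches.append(i)
            i += 1  # fall back to a shift of one after a full match
        else:
            i += max(1, j - max_pos.get(text[i + j - 1], 0))
    return matches, comparisons


n, m = 1000, 10
# Best case: the text never contains a pattern character, so each alignment
# costs one comparison and the pattern jumps m positions.
print(count_comparisons("a" * n, "b" * m)[1])  # 100, i.e. n / m
# Worst case: every alignment matches in full and the pattern moves by one.
print(count_comparisons("a" * n, "a" * m)[1])  # 9910, i.e. (n - m + 1) * m
```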

This article is an English version of an article originally written in Chinese on aliyun.com.
