Detailed explanation of the BM algorithm

Last Update:2018-12-05 Source: Internet

Author: User


The BM (Boyer-Moore) algorithm is a suffix-based string matching algorithm: it shifts the pattern string along the text from left to right, but within each alignment it compares characters from right to left.

The core of the BM algorithm is two rules that are applied side by side: the good suffix rule and the bad character rule. Both aim to shift the pattern string as far as possible at each step without skipping a possible match.

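Good suffixes and bad characters are easiest to define through an example.

Take the following main string (the text) and pattern string:

Main string: mahtavaatalomaisema omalomailuun

Pattern string: maisemaomaloma

Good suffix: with the pattern aligned at the start of the main string and compared from right to left, "aloma" matches. This matched suffix of the pattern string is the "good suffix".

Bad character: the comparison then fails at the "t" in the main string, which faces the "m" of the pattern string. That "t" is the bad character.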
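The right-to-left comparison in this example can be reproduced with a short sketch. The function name scan_window and its start parameter are illustrative assumptions, not part of the original article.

```python
def scan_window(text, pattern, start=0):
    """Compare pattern with text[start:start+m] from right to left.

    Returns (good_suffix, bad_char); bad_char is None on a full match.
    """
    m = len(pattern)
    i = m - 1
    # Walk leftwards while the characters agree.
    while i >= 0 and pattern[i] == text[start + i]:
        i -= 1
    good_suffix = pattern[i + 1:]                   # part matched so far
    bad_char = text[start + i] if i >= 0 else None  # text character that failed
    return good_suffix, bad_char


text = "mahtavaatalomaisema omalomailuun"
pattern = "maisemaomaloma"
print(scan_window(text, pattern))  # ('aloma', 't')
```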

The BM algorithm precomputes three shift functions, D1, D2, and D3, which correspond to the three situations below. In all three, we assume that we have read a string u that is both a suffix of the current text window and a suffix of the pattern string, and that the next text character x to be read differs from the next character y of the pattern string.

(1) A good suffix exists: the suffix u also occurs at another position in the pattern string p. Let j be the position (counted from 1) at which the rightmost such occurrence of u ends, and let m be the length of the pattern string. The window can then be shifted by m - j characters, which lines that occurrence up with the u already matched in the text. D1 therefore records, for each suffix of the pattern string, the distance to its next occurrence further left in the pattern. If the suffix u does not occur anywhere else in p, D1(u) is set to m, the length of the whole pattern string.

(2) Only part of the suffix can be reused: u does not occur elsewhere in p, but some suffix v of u is also a prefix of the pattern string p. D2(u) is the length of the longest such v, that is, the longest string that is both a prefix of p and a suffix of u. Shifting the window by m - D2(u) lines this prefix up with the end of the matched text.

(3) Bad character. The match fails on the text character x. If the window is shifted using D1 alone and the pattern character that ends up facing x is still not x, the next verification is bound to fail and is wasted. The shift function D3 ensures that, in the next verification, the text character x faces an x in the pattern string. D3(x) is the distance from the rightmost occurrence of x in the pattern string to the end of the pattern string. If x does not occur in the pattern string, D3(x) is set to m.
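The three functions can be sketched by brute force directly from their definitions. This is a minimal illustration, not an efficient preprocessing routine: the names d1, d2, and d3 mirror D1, D2, and D3, indices are 0-based, and measuring the D3 distance as m - 1 - k (so the last pattern character has distance 0) is an assumption about how the distance is counted.

```python
def d1(pattern, u):
    """Distance from the suffix u to its next occurrence further left, or m."""
    m = len(pattern)
    # Searching pattern[:m-1] excludes the occurrence that is the suffix itself.
    k = pattern.rfind(u, 0, m - 1)
    return m if k == -1 else (m - len(u)) - k


def d2(pattern, u):
    """Length of the longest v that is a prefix of pattern and a suffix of u."""
    for length in range(len(u), 0, -1):
        if pattern.startswith(u[-length:]):
            return length
    return 0


def d3(pattern, ch):
    """Distance from the rightmost ch in pattern to the end of pattern, or m."""
    k = pattern.rfind(ch)
    return len(pattern) if k == -1 else len(pattern) - 1 - k


pattern = "maisemaomaloma"
print(d1(pattern, "aloma"))  # 14: "aloma" occurs only as the suffix
print(d2(pattern, "aloma"))  # 2: "ma" is a prefix of the pattern
print(d3(pattern, "t"))      # 14: "t" does not occur in the pattern
```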

As noted above, the BM algorithm applies the good suffix rule and the bad character rule side by side.

Good suffix algorithm

If the program has matched a good suffix and the same string occurs again elsewhere in the pattern string, shift the pattern so that this other occurrence lines up with the suffix just matched. The good suffix algorithm has two cases:

Case 1: the matched suffix also occurs elsewhere in the pattern string as a substring. Shift the pattern so that the rightmost such occurrence lines up with the matched suffix in the text, then continue matching.

Case 2: no substring matches the whole suffix. Find the longest suffix of the matched suffix that is also a prefix of the pattern string, that is, the largest s such that P[m-s .. m-1] = P[0 .. s-1], and shift the pattern so that this prefix lines up with it. See the figure.
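The two cases can be told apart with a small brute-force sketch. The function good_suffix_shift and its return labels are hypothetical names chosen for illustration; the sketch uses 0-based indices and expects at least one matched character.

```python
def good_suffix_shift(pattern, matched):
    """Return (case, shift) for a matched suffix of length matched >= 1."""
    m = len(pattern)
    suffix = pattern[m - matched:]
    # Case 1: rightmost occurrence of the whole suffix other than the suffix itself.
    k = pattern.rfind(suffix, 0, m - 1)
    if k != -1:
        return "case1", (m - matched) - k
    # Case 2: largest s < matched with pattern[m-s:] == pattern[:s].
    for s in range(matched - 1, 0, -1):
        if pattern[m - s:] == pattern[:s]:
            return "case2", m - s
    # Neither case applies: the whole pattern can move past the window.
    return "none", m


pattern = "maisemaomaloma"
print(good_suffix_shift(pattern, 2))  # ('case1', 4): "ma" also occurs at index 8
print(good_suffix_shift(pattern, 5))  # ('case2', 12): only the prefix "ma" fits
```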

Bad character algorithm

When a bad character occurs, the BM algorithm shifts the pattern string to the right so that the rightmost occurrence of that character in the pattern string lines up with the bad character in the text, and then continues matching. There are two cases.

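Case 1: the pattern string contains the bad character. Shift the pattern so that its rightmost occurrence of the character lines up with the bad character in the text. See the figure.

Case 2: the pattern string does not contain the bad character. Shift the whole pattern past the bad character. See the figure.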
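Both cases fall out of a single lookup table, sketched below. The names build_last and bad_char_shift are illustrative assumptions. Note that when the rightmost occurrence lies to the right of the mismatch position the result is zero or negative, which is one reason this rule is not used on its own.

```python
def build_last(pattern):
    """Map each character to the index of its rightmost occurrence."""
    # Later indices overwrite earlier ones, so the rightmost index wins.
    return {ch: i for i, ch in enumerate(pattern)}


def bad_char_shift(last, mismatch_pos, bad_char):
    """Shift that lines the rightmost bad_char in the pattern up with the text."""
    # Case 2 (character absent) uses -1, moving the pattern past the bad character.
    return mismatch_pos - last.get(bad_char, -1)


last = build_last("maisemaomaloma")
print(bad_char_shift(last, 8, "e"))  # 4: rightmost "e" is at index 4
print(bad_char_shift(last, 8, "t"))  # 9: "t" is absent, skip past it
print(bad_char_shift(last, 8, "a"))  # -5: rightmost "a" is right of the mismatch
```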

Shift rule

The shift rule of the BM algorithm is as follows:

Replace ++j in the overview with j += max(shift(good suffix), shift(bad character)). That is,

the BM algorithm shifts the pattern string to the right by the larger of the two distances computed by the good suffix algorithm and the bad character algorithm.
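Putting the pieces together, a complete search might look like the sketch below. It is a minimal illustration under stated assumptions: the good suffix table is built by brute force rather than with the linear-time preprocessing used in practice, and the names bm_search, good_suffix, gs, and last are hypothetical.

```python
def bm_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Bad character table: rightmost index of each pattern character.
    last = {ch: i for i, ch in enumerate(pattern)}

    def good_suffix(matched):
        suffix = pattern[m - matched:]
        k = pattern.rfind(suffix, 0, m - 1)      # case 1: another occurrence
        if k != -1:
            return (m - matched) - k
        for s in range(matched - 1, 0, -1):      # case 2: prefix equals suffix
            if pattern[m - s:] == pattern[:s]:
                return m - s
        return m

    # gs[k] is the good suffix shift after k characters have matched.
    gs = [good_suffix(k) for k in range(m + 1)]

    hits = []
    j = 0
    while j <= n - m:
        i = m - 1
        while i >= 0 and pattern[i] == text[j + i]:
            i -= 1                               # compare right to left
        if i < 0:
            hits.append(j)
            j += gs[m]                           # full match
        else:
            bad = i - last.get(text[j + i], -1)  # bad character shift
            j += max(gs[m - 1 - i], bad)         # take the larger shift
    return hits


print(bm_search("abracadabra", "abra"))  # [0, 7]
```

Because gs[0] is always 1, the max never yields a non-positive shift, so the loop always advances.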

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
