Detailed explanation of the BM algorithm

Source: Internet
Author: User

The BM (Boyer-Moore) algorithm is a suffix-based algorithm: it moves the pattern string along the main string from left to right, but within each alignment it compares characters from right to left.

The core of the BM algorithm is two rules applied in parallel (good suffix and bad character). Both rules aim to move the pattern string as far as possible at each step.

The definitions of good suffixes and bad characters are as follows:

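The main string and pattern string are as follows:

Main string: mahtavaatalomaisema omalomailuun

Pattern string: maisemaomaloma

Good suffix: with the pattern string aligned at the start of the main string and compared from right to left, "aloma" at the end of the pattern string matches the main string. This matched part is the "good suffix".

Bad character: the "t" in the main string, immediately to the left of the good suffix, does not match the corresponding pattern character "m". This "t" is the bad character.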
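The right-to-left comparison that produces the good suffix and the bad character in this example can be sketched as follows. This is only an illustration; the function name compare_window and its return convention are assumptions made for this sketch, not part of the original description.

```python
def compare_window(text, pattern, pos):
    """Compare pattern with text[pos:pos + len(pattern)] from right to left.

    Returns (good_suffix, bad_index): the matched suffix of the pattern and
    the text index of the bad character (None if the whole pattern matched).
    """
    i = len(pattern) - 1
    while i >= 0 and pattern[i] == text[pos + i]:
        i -= 1  # keep moving left while characters agree
    good_suffix = pattern[i + 1:]
    bad_index = pos + i if i >= 0 else None
    return good_suffix, bad_index


text = "mahtavaatalomaisema omalomailuun"
pattern = "maisemaomaloma"
suffix, bad = compare_window(text, pattern, 0)
print(suffix, text[bad])  # aloma t
```

For the first alignment this reports the good suffix "aloma" and the bad character "t" at text index 8.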

 

The BM algorithm precomputes three shift functions D1, D2, and D3, which correspond to the three situations below. In all three cases, we assume that we have already read a string u that is a suffix of both the current text window and the pattern string, and that the next text character x to be read differs from the next character y of the pattern string.

(1) The good suffix occurs again: the suffix u also appears at another position in the pattern string p. Let j be the position (counting from 1) at which the rightmost such other occurrence of u ends, and let m be the length of the pattern string. The window must then be moved m - j characters. The shift function D1 therefore gives, for each suffix of the pattern string, the distance to its next occurrence to the left. If the suffix u is not repeated in p, D1(u) is set to m, the length of the entire pattern string.

(2) Only part of the suffix occurs again: u is not repeated as a whole, but a suffix v of u is also a prefix of the pattern string p. D2(u) is the length of the longest string v that is both a prefix of p and a suffix of u. The window can then be moved m - D2(u) characters so that this prefix lines up with v.

(3) Bad character. The match failed at the text character x. If the window is moved using D1 and the pattern character that ends up under x is still not x, that verification is wasted. The shift function D3 is used to ensure that, in the next verification, the text character x is aligned with a matching x in the pattern string. D3(x) is the distance from the rightmost occurrence of x in the pattern string to the end of the pattern string. If x does not occur in the pattern string, D3(x) is set to m.
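The three definitions above can be written down directly as a brute-force sketch. The function names d1, d2 and d3 are chosen here to mirror D1, D2 and D3; a production implementation would precompute tables instead of rescanning the pattern, and the restriction that v is shorter than the whole pattern is an assumption added so that m - D2(u) is never zero.

```python
def d1(pattern, u):
    """Distance from suffix u to its rightmost other occurrence in pattern.

    Returns len(pattern) when u does not occur anywhere else.
    """
    m, k = len(pattern), len(u)
    for start in range(m - k - 1, -1, -1):  # candidate starts, right to left
        if pattern[start:start + k] == u:
            return (m - k) - start  # same as m - j with j = start + k
    return m


def d2(pattern, u):
    """Length of the longest suffix v of u that is also a prefix of pattern."""
    for length in range(min(len(u), len(pattern) - 1), 0, -1):
        if pattern.startswith(u[-length:]):
            return length
    return 0


def d3(pattern, x):
    """Distance from the rightmost x in pattern to the pattern's end; m if absent."""
    pos = pattern.rfind(x)
    return len(pattern) if pos < 0 else len(pattern) - 1 - pos


p = "maisemaomaloma"
print(d1(p, "aloma"), d2(p, "aloma"), d3(p, "t"), d3(p, "o"))  # 14 2 14 2
```

In the running example, "aloma" is not repeated in the pattern (D1 = 14), its suffix "ma" is a prefix of the pattern (D2 = 2), and "t" does not occur in the pattern at all (D3 = 14).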

 

As mentioned earlier, the BM algorithm applies the good suffix rule and the bad character rule in parallel.

Good suffix Algorithm

If the program has matched a good suffix and the pattern string contains another identical substring, the pattern string is moved so that this other occurrence lines up with the current suffix. The good suffix algorithm has two cases:

Case1: If the pattern string contains another substring identical to the matched suffix, move the pattern string so that the rightmost such substring lines up with the position of the suffix, then continue matching.

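Case2: If no other substring exactly matches the suffix, find the longest suffix of the good suffix that is also a prefix of the pattern string, that is, the largest s such that P[m-s..m-1] = P[0..s-1], and move the pattern string so that this prefix lines up with it. If no such s exists, move the pattern string by its full length m. See the figure.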
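The two cases can be sketched as one function that reports which case applied. The name good_suffix_shift, the parameter k (the number of pattern characters matched so far, with 1 <= k <= m) and the case number 0 for "neither case applies" are assumptions made for this illustration. The sketch follows the rule exactly as stated above and does not add the stricter condition used by some BM variants.

```python
def good_suffix_shift(pattern, k):
    """Return (case, shift) after the last k characters of pattern matched."""
    m = len(pattern)
    suffix = pattern[m - k:]
    # Case1: rightmost other occurrence; it must end before the last character.
    start = pattern.rfind(suffix, 0, m - 1)
    if start >= 0:
        return 1, (m - k) - start
    # Case2: largest s (s <= k, s < m) with P[m-s..m-1] == P[0..s-1].
    for s in range(min(k, m - 1), 0, -1):
        if pattern[:s] == pattern[m - s:]:
            return 2, m - s
    # Neither case applies: skip the whole pattern.
    return 0, m


p = "maisemaomaloma"
print(good_suffix_shift(p, 3))  # (1, 4): "oma" occurs again further left
print(good_suffix_shift(p, 5))  # (2, 12): only "ma" is also a prefix
```

For the good suffix "aloma" of the running example, Case2 applies and the pattern string moves 12 characters.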

Bad character Algorithm

When a bad character occurs, the BM algorithm moves the pattern string to the right so that the rightmost occurrence of the bad character in the pattern string lines up with the bad character in the main string, and then continues matching. The bad character algorithm has two cases.

Case1: the pattern string contains the bad character. See the figure below.


Case2: the pattern string does not contain the bad character, so the whole pattern string is moved past it. See the figure.
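Both cases reduce to one subtraction, as the following sketch shows. The function name bad_character_shift and the parameter i (the index in the pattern string at which the mismatch happened) are assumptions made for this illustration.

```python
def bad_character_shift(pattern, i, bad_char):
    """Shift that lines up the rightmost bad_char in pattern with the text.

    i is the pattern index where the mismatch occurred.
    """
    rightmost = pattern.rfind(bad_char)  # -1 when the pattern lacks bad_char (Case2)
    return i - rightmost


print(bad_character_shift("example", 6, "p"))         # 2 (Case1)
print(bad_character_shift("maisemaomaloma", 8, "t"))  # 9 (Case2)
```

When the rightmost occurrence lies to the right of the mismatch position, the result is zero or negative. In that situation the rule gives no useful shift on its own and the good suffix shift is the one that moves the pattern string forward.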

 

Shift rule

The shift rule of the BM algorithm is as follows:

Replace ++j in the overview with j += max(shift(good suffix), shift(bad character)). That is,

the distance by which the BM algorithm moves the pattern string to the right is the larger of the two shifts computed by the good suffix algorithm and the bad character algorithm.
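Putting the shift rule together, a complete search might look like the sketch below. It is a minimal version under stated assumptions: the names bm_search and good_suffix_table are hypothetical, the good suffix table is built by brute force rather than by the usual linear-time preprocessing, and a shift of 1 is assumed when no character has matched yet.

```python
def good_suffix_table(pattern):
    """table[k] = good suffix shift after the last k characters matched."""
    m = len(pattern)
    table = [1]  # nothing matched yet: assume a shift of one
    for k in range(1, m + 1):
        # Case1: rightmost other occurrence of the matched suffix.
        start = pattern.rfind(pattern[m - k:], 0, m - 1)
        if start >= 0:
            table.append(m - k - start)
            continue
        # Case2: longest prefix of the pattern that is also a suffix (length <= k).
        s = next((s for s in range(min(k, m - 1), 0, -1)
                  if pattern[:s] == pattern[m - s:]), 0)
        table.append(m - s)
    return table


def bm_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    rightmost = {ch: i for i, ch in enumerate(pattern)}  # last index of each char
    good = good_suffix_table(pattern)
    matches = []
    j = 0  # current alignment of the pattern in the text
    while j <= n - m:
        i = m - 1
        while i >= 0 and pattern[i] == text[j + i]:
            i -= 1  # compare from right to left
        if i < 0:
            matches.append(j)
            j += good[m]
        else:
            bad = i - rightmost.get(text[j + i], -1)
            j += max(good[m - 1 - i], bad)  # the shift rule
    return matches


print(bm_search("abracadabra", "abra"))  # [0, 7]
```

Because every entry of the good suffix table is at least 1, taking the maximum guarantees progress even when the bad character shift is zero or negative.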
