The Boyer-Moore (BM) algorithm is a suffix-based string matching algorithm: it slides the pattern string along the main string from left to right, but within each window it compares characters from right to left.
The core of the BM algorithm is two rules applied side by side: the good suffix rule and the bad character rule. Both aim to shift the pattern string as far as possible at each step.
Good suffixes and bad characters are easiest to define through an example.
Take the following main string (the text) and pattern string:
Main string: mahtavaatalomaisema omalomailuun
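Pattern string: maisemaomaloma
Good suffix: "aloma" in the pattern string is the good suffix, the part that has already matched when comparing from right to left.
Bad character: "t" in the main string is the bad character, the character at which the comparison fails.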
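The right-to-left comparison on this example can be sketched in Python. This is only an illustration: the function name compare_window and its start parameter are hypothetical and do not come from the original description.

```python
def compare_window(text, pattern, start=0):
    """Compare pattern with text[start:start + len(pattern)] from right to left.

    Returns (good_suffix, bad_char). bad_char is None if the whole
    pattern matches at this position.
    """
    i = len(pattern) - 1
    # Walk leftwards while the characters agree.
    while i >= 0 and pattern[i] == text[start + i]:
        i -= 1
    good_suffix = pattern[i + 1:]                    # part that matched
    bad_char = text[start + i] if i >= 0 else None   # text character that failed
    return good_suffix, bad_char


text = "mahtavaatalomaisema omalomailuun"
pattern = "maisemaomaloma"
print(compare_window(text, pattern))  # ('aloma', 't')
```

With the pattern aligned at the start of the main string, five characters match before the comparison stops at "t".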
The BM algorithm precomputes three shift functions, D1, D2, and D3, which correspond to the three situations below. In all three, we assume that we have read a string u that is a suffix of both the current text window and the pattern string, and that the next text character x differs from the next pattern character y.
(1) A good suffix exists: the suffix u also occurs at another position in the pattern string p. Let j be the position (counting from 1) at which the rightmost such occurrence ends, and let m be the length of the pattern string. The window is then shifted by m - j characters. The shift function D1 gives, for each suffix of the pattern string, the distance to its next occurrence further left in the pattern string. If the suffix u of p does not occur again in p, D1(u) is set to m, the length of the entire pattern string.
(2) A partial good suffix exists: a suffix v of u is also a prefix of the pattern string p. D2(u) is the length of the longest such v, that is, the longest string that is both a prefix of p and a suffix of u. Aligning that prefix with the matched text shifts the window by m - D2(u) characters.
(3) A bad character: the text character x caused the mismatch. If the window is shifted using D1 alone and the pattern character that ends up aligned with x is still not x, the next verification is wasted. The shift function D3 ensures that, at the next verification, the text character x is aligned with an occurrence of x in the pattern string. D3(x) is the distance from the rightmost occurrence of x in the pattern string to the end of the pattern string. If x does not occur in the pattern string, D3(x) is set to m, the length of the pattern string.
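Of the three functions, D3 depends only on single characters, so it is the simplest to tabulate. The following is a minimal sketch, assuming the distance is measured to the last character of the pattern string (so the last character itself has distance 0). The names build_d3 and d3_lookup are hypothetical.

```python
def build_d3(pattern):
    """Map each character of the pattern to the distance from its
    rightmost occurrence to the end of the pattern."""
    m = len(pattern)
    d3 = {}
    for i, ch in enumerate(pattern):
        # Later (more rightward) occurrences overwrite earlier ones,
        # so each entry ends up describing the rightmost occurrence.
        d3[ch] = m - 1 - i
    return d3


def d3_lookup(d3, ch, m):
    """Characters that do not occur in the pattern get the full length m."""
    return d3.get(ch, m)


pattern = "maisemaomaloma"
d3 = build_d3(pattern)
print(d3_lookup(d3, "o", len(pattern)))  # 2
print(d3_lookup(d3, "t", len(pattern)))  # 14
```

Under this assumption, the actual shift after reading a matched suffix u would be D3(x) minus the length of u, since the bad character sits that many positions before the end of the window.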
As noted earlier, the BM algorithm applies the good suffix rule and the bad character rule side by side.
Good suffix algorithm
If a good suffix has been matched and the same substring occurs elsewhere in the pattern string, shift the pattern string so that this other occurrence lines up with the matched suffix. The good suffix algorithm has two cases:
Case1: If the pattern string contains another substring equal to the matched suffix, shift the pattern string so that the rightmost such substring is aligned with the suffix, then continue matching.
Case2: If no other substring exactly matches the suffix, find the longest part of the good suffix that is also a prefix of the pattern string, that is, the largest s such that P[m-s..m-1] = P[0..s-1], and shift the pattern string so that this prefix is aligned with it. See the figure.
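The two cases can be sketched as follows. This is a naive illustration that searches the pattern string directly instead of using precomputed tables, and it implements only the plain rule described here (it does not check the character preceding the other occurrence). The function name good_suffix_shift and the "case 1" / "case 2" / "none" labels are hypothetical.

```python
def good_suffix_shift(pattern, k):
    """Return (case, shift) for a matched suffix of length k,
    where 0 < k <= len(pattern)."""
    m = len(pattern)
    u = pattern[m - k:]  # the good suffix

    # Case 1: rightmost other occurrence of u. Restricting the search
    # to pattern[0:m-1] excludes the suffix occurrence itself.
    start = pattern.rfind(u, 0, m - 1)
    if start != -1:
        return "case 1", (m - k) - start

    # Case 2: longest proper suffix of u that is also a prefix of the pattern.
    for s in range(k - 1, 0, -1):
        if pattern[:s] == pattern[m - s:]:
            return "case 2", m - s

    # Neither case applies: shift by the whole pattern length.
    return "none", m


print(good_suffix_shift("maisemaomaloma", 5))  # ('case 2', 12)
print(good_suffix_shift("xabyab", 2))          # ('case 1', 3)
```

For the example pattern, "aloma" occurs nowhere else, but its suffix "ma" is also a prefix of the pattern string, so the pattern moves 14 - 2 = 12 positions.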
Bad character algorithm
When a bad character occurs, the BM algorithm shifts the pattern string to the right so that the rightmost occurrence of the bad character in the pattern string is aligned with the bad character in the main string, and then continues matching. The bad character algorithm has two cases:
Case1: The pattern string contains the bad character. See the figure below.
Case2: The pattern string does not contain the bad character, so the pattern string is shifted entirely past it. See the figure.
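A small sketch of the two bad character cases, assuming the rightmost occurrence is taken over the whole pattern string. The function name bad_character_shift and the parameter mismatch_index (the 0-based position in the pattern string where the comparison failed) are hypothetical.

```python
def bad_character_shift(pattern, mismatch_index, bad_char):
    """Return (case, shift) for a mismatch at pattern[mismatch_index]
    against the text character bad_char."""
    rightmost = pattern.rfind(bad_char)
    if rightmost == -1:
        # Case 2: the bad character is absent, so move the pattern past it.
        return "case 2", mismatch_index + 1
    # Case 1: align the rightmost occurrence with the bad character.
    # The result is zero or negative when that occurrence lies to the
    # right of the mismatch position.
    return "case 1", mismatch_index - rightmost


pattern = "maisemaomaloma"
print(bad_character_shift(pattern, 8, "t"))  # ('case 2', 9)
print(bad_character_shift(pattern, 8, "i"))  # ('case 1', 6)
```

Because the case 1 value can be zero or negative, it is not safe to use on its own; taking the larger of this shift and the good suffix shift avoids moving the pattern string backwards.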
Shift rule
The shift rule of the BM algorithm is as follows:
Replace ++j in the overview with j += max(shift(good suffix), shift(bad character)); that is,
the distance by which the BM algorithm shifts the pattern string to the right is the maximum of the shifts given by the good suffix algorithm and the bad character algorithm.
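Putting the two rules together, a complete search loop might look like the sketch below. It is not a tuned implementation: both shifts are recomputed at every mismatch instead of being read from precomputed tables, and the good suffix part uses the plain rule without checking the preceding character. The names bm_search and _good_suffix_shift are hypothetical.

```python
def _good_suffix_shift(pattern, k):
    """Shift suggested by the good suffix rule for a matched suffix of length k."""
    m = len(pattern)
    if k == 0:
        return 1  # nothing matched, so this rule only allows one step
    u = pattern[m - k:]
    start = pattern.rfind(u, 0, m - 1)   # another occurrence of u
    if start != -1:
        return (m - k) - start
    for s in range(k - 1, 0, -1):        # longest suffix of u that is a prefix
        if pattern[:s] == pattern[m - s:]:
            return m - s
    return m


def bm_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    matches = []
    if m == 0:
        return matches
    j = 0  # current window start in the text
    while j <= n - m:
        i = m - 1
        while i >= 0 and pattern[i] == text[j + i]:
            i -= 1                       # compare from right to left
        good = _good_suffix_shift(pattern, m - 1 - i)
        if i < 0:
            matches.append(j)            # whole pattern matched
            j += good
        else:
            bad = i - pattern.rfind(text[j + i])  # bad character shift
            j += max(good, bad)
    return matches


print(bm_search("mahtavaatalomaisema omalomailuun", "maisemaomaloma"))  # []
print(bm_search("mahtavaatalomaisemaomalomailuun", "maisemaomaloma"))   # [12]
```

The first call finds nothing because of the space in the main string. The second call uses a made-up variant without the space, where the first window is shifted by max(12, 9) = 12 and the pattern string then matches at index 12.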