The BM (Boyer-Moore) algorithm is an improvement on the KMP algorithm and is about 3 to 5 times faster than KMP.
The BM algorithm mainly follows two principles: 1. Bad characters, 2. Good suffixes.
Let the main string S have length s_l and the pattern string T have length t_l.
1. Bad characters
If the main string S contains a character that does not occur anywhere in the pattern string T, the pattern string shifts right by t_l positions. (When such a character is found at the first comparison of an alignment, no alignment that covers it can match, so the pattern moves a distance of t_l to the right.)
If the character does occur in T, but not at the current position of the pattern string, shift the pattern string so that the rightmost occurrence of that character in T is aligned with the character in the main string S.
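Therefore:
$$
\mathrm{delta}_1(x) =
\begin{cases}
t_l, & \text{if } x \ne T[j] \text{ for all } 1 \le j \le t_l \text{ (the character does not occur in the pattern string)} \\
t_l - \max\{\, k \mid T[k] = x,\ 1 \le k \le t_l \,\}, & \text{otherwise (} k \text{ is the rightmost position of } x \text{ in the pattern string)}
\end{cases}
$$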
Code:
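    void bminitocc()  // build the bad-character table
    {
        int i;
        // Entries of occ[] for characters that do not occur in t are assumed
        // to have been set to -1 before this function is called.
        for (i = 0; i < t_length; i++)
            occ[t[i]] = i;  // later positions overwrite earlier ones, so occ[] keeps the rightmost index
    }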
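As a worked example of the bad-character table and the shift function defined above, here is a minimal self-contained sketch. The names `buildOcc` and `delta1` and the 256-entry table are assumptions made for illustration, not part of the original code; indices are 0-based, so the formula becomes `t_l - 1 - occ[x]`.

```cpp
#include <array>
#include <string>

// Hypothetical helper: rightmost 0-based index of every byte value in the
// pattern t, or -1 if the byte does not occur in t.
std::array<int, 256> buildOcc(const std::string& t) {
    std::array<int, 256> occ;
    occ.fill(-1);  // -1 marks "not in the pattern"
    for (int i = 0; i < static_cast<int>(t.size()); ++i)
        occ[static_cast<unsigned char>(t[i])] = i;  // later hits overwrite earlier ones
    return occ;
}

// Hypothetical helper: the shift function from the text, rewritten for
// 0-based indices. An absent character (occ == -1) yields the full length t_l.
int delta1(const std::array<int, 256>& occ, const std::string& t, char x) {
    return static_cast<int>(t.size()) - 1 - occ[static_cast<unsigned char>(x)];
}
```

For the pattern "abcab", the rightmost 'a' is at index 3 and the rightmost 'c' at index 2, so `delta1` gives 1 for 'a', 2 for 'c', and the full length 5 for a character such as 'z' that is not in the pattern.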
2. Good suffixes
Sometimes the bad-character rule fails: aligning the rightmost occurrence of the character in the pattern string with the corresponding character of the main string may lead to a negative shift. Moving by one position is always feasible, but in this case it is better to derive the largest possible shift distance from the structure of the pattern string itself. This is called the good-suffix heuristic.
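The negative shift can be seen in a small sketch. It assumes the bad-character shift is computed as the mismatch index minus the rightmost index of the offending character, which is one common way to align the two. The names `badCharShift` and `safeShift` are hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: bad-character shift when pattern position j (0-based)
// mismatches against the main-string character c.
int badCharShift(const std::string& t, int j, char c) {
    const std::string::size_type pos = t.rfind(c);  // rightmost occurrence of c in t
    const int rightmost = (pos == std::string::npos) ? -1 : static_cast<int>(pos);
    return j - rightmost;  // negative when the rightmost occurrence lies to the right of j
}

// Hypothetical fallback: never move backwards, always advance by at least one.
int safeShift(const std::string& t, int j, char c) {
    return std::max(1, badCharShift(t, j, c));
}
```

With the pattern "abcab", a mismatch at index 2 against the character 'b' gives 2 - 4 = -2, because the rightmost 'b' sits at index 4. The fallback moves one position instead, which is correct but small, and that is the gap the good-suffix rule is meant to close.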
There are two scenarios for a good suffix:
1. A substring in the middle of T is equal to the part already compared.
2. A prefix of T is equal to a suffix of the part already compared.
Of the shift distances given by these two cases, we take the smallest one, because every alignment that could still match must be compared.
Question 1: Suppose that after a comparison a suffix of the compared part is also a prefix of T, and the whole compared part also occurs again at another position in T. The first case (a substring in the middle of T is equal to the part already compared) then gives the shorter shift distance. So if both cases apply, we only need the distance from the first case, because it is always shorter than the distance from the second.
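The two cases, and the rule of taking the smallest distance, can be illustrated with a brute-force sketch. This is not the table-based preprocessing; it simply tries every shift in increasing order. The name `goodSuffixShift` and the 0-based mismatch index `j` are assumptions for illustration, and the sketch checks only the two conditions described in the text.

```cpp
#include <string>

// Hypothetical brute-force helper. j is the 0-based index of the mismatch,
// so t[j+1 .. m-1] is the part already compared (the good suffix).
// Returns the smallest shift s >= 1 that keeps the compared part consistent.
int goodSuffixShift(const std::string& t, int j) {
    const int m = static_cast<int>(t.size());
    for (int s = 1; s < m; ++s) {
        bool ok = true;
        for (int k = j + 1; k < m && ok; ++k) {
            // Positions that fall off the left end of t after the shift are
            // unconstrained. While s <= j + 1 the whole compared part must
            // reappear inside t (case 1); for larger s only a prefix of t has
            // to match a suffix of the compared part (case 2).
            if (k - s >= 0 && t[k - s] != t[k]) ok = false;
        }
        if (ok) return s;
    }
    return m;  // nothing can be reused: shift by the full pattern length
}
```

For "abxab" with the mismatch at index 2, the compared part "ab" reappears at the start and the shift is 3 (case 1). For "abxcab" with the mismatch at index 2, the compared part "cab" does not reappear, but its suffix "ab" is a prefix of the pattern, so the shift is 4 (case 2). Because shifts are tried in increasing order, a case 1 shift is always found before any case 2 shift, which matches the observation in Question 1.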
Question 2: If we know which suffixes also appear at the beginning of the pattern string, can we tell when the second case occurs?
The array f[] stores the good suffix for each position.
    void BMPreProcess1()  // record every position that has a good suffix
    {
        int i = t_length, j = t_length + 1;
        f[i] = j;
        while (i > 0)  // i > 0 keeps t[i - 1] inside the pattern
        {
            while (j <= t_length && t[i - 1] != t[j - 1])
            {
                if (next[j] == -1)     // next[] is assumed to be initialized to -1 (unset)
                    next[j] = j - i;   // a good suffix exists: record the right-shift distance
                j = f[j];
            }
            i--;
            j--;
            f[i] = j;
        }
        // cout <