String Matching Algorithm: the BM (Boyer-Moore) Algorithm

Source: Internet
Author: User

The BM (Boyer-Moore) algorithm is often presented as an improvement on the KMP algorithm; in practice it is commonly reported to be about 3 to 5 times faster than KMP.


The BM algorithm compares the pattern string against the main string from right to left and relies on two rules: 1. bad characters, 2. good suffixes.


Let the main string be S, with length s_l, and let the pattern string be T, with length t_l.

1. Bad characters


If the mismatched character of the main string S does not occur anywhere in the pattern string T, the pattern string can be moved entirely past that character, a shift of t_l positions when the mismatch happens at the last character of the pattern. (No alignment that still covers this character can match, so every such alignment can be skipped.)


If the mismatched character of the main string S does occur in the pattern string T, but not at the current position, shift the pattern string so that the rightmost occurrence of that character in T is aligned with the character in S.


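Therefore, writing x for the mismatched character of S and numbering the positions of T from 1:


delta1(x) = t_l, if x != T[j] for all 1 <= j <= t_l (x does not occur in the pattern string)

delta1(x) = t_l - max{ k : T[k] = x, 1 <= k <= t_l }, otherwise (k is the rightmost position of x in the pattern string)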
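The bad-character distance can be written out as a small function. This is only a sketch: the name `delta1` and the use of `std::string` are assumptions made for the example, and positions are counted from 1 to t_l to match the text.

```cpp
#include <cassert>
#include <string>

// delta1(x) for a pattern T of length t_l, with 1-based positions as in the text:
//   t_l                                  if x does not occur in T
//   t_l - (rightmost k with T[k] == x)   otherwise
int delta1(const std::string& pattern, char x) {
    assert(!pattern.empty());
    const int t_l = static_cast<int>(pattern.size());
    // Scan from the right, so the first hit is the rightmost occurrence.
    for (int k = t_l; k >= 1; --k) {
        if (pattern[k - 1] == x) return t_l - k;
    }
    return t_l;  // x is not in the pattern
}
```

For the pattern "EXAMPLE" (t_l = 7), `delta1(pattern, 'P')` is 2 because the rightmost 'P' is at position 5, `delta1(pattern, 'E')` is 0 because the rightmost 'E' is the last character, and `delta1(pattern, 'Z')` is 7 because 'Z' does not occur.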



Code:

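void bminitocc() // build the bad-character table
{
    int i;
    // occ[c] ends up as the rightmost index of character c in t;
    // occ[] is assumed to be initialised to -1 for every character beforehand
    for (i = 0; i < t_length; i++)
        occ[t[i]] = i;
}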



2. Good suffixes


Sometimes the bad-character rule fails. Aligning the rightmost occurrence of the bad character in the pattern string with the corresponding character of the main string may produce a negative shift, that is, the pattern string would move to the left. Shifting by one position is always safe, but in this case it is better to derive the largest possible shift from the structure of the pattern string itself. This is called the good-suffix heuristic.
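The negative shift can be seen in a small sketch. The helper names `buildOcc` and `badCharShift`, the 256-entry table, and the formula `j - occ[c]` (mismatch index minus the rightmost index of the text character) are assumptions made for this example; they follow the usual way a rightmost-occurrence table is used.

```cpp
#include <array>
#include <cassert>
#include <string>

// occ[c] = rightmost 0-based index of byte c in the pattern, or -1 if absent.
std::array<int, 256> buildOcc(const std::string& pattern) {
    std::array<int, 256> occ;
    occ.fill(-1);
    for (int i = 0; i < static_cast<int>(pattern.size()); ++i)
        occ[static_cast<unsigned char>(pattern[i])] = i;
    return occ;
}

// Shift proposed by the bad-character rule when pattern[j] mismatches the
// text character c. The result is zero or negative whenever the rightmost
// occurrence of c lies at or to the right of j.
int badCharShift(const std::array<int, 256>& occ, int j, char c) {
    assert(j >= 0);
    return j - occ[static_cast<unsigned char>(c)];
}
```

With the pattern "ABCA", a mismatch at index 1 against the text character 'A' gives 1 - 3 = -2, which is useless as a shift. A mismatch at index 3 against 'C' gives 1, and against a character that is not in the pattern, such as 'Z', it gives 4, the full pattern length.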


There are two scenarios for a good suffix:

1.

A substring in the middle of T is equal to the part already compared.


2.

A prefix of T is equal to a suffix of the part already compared.


In both cases we take the smallest admissible shift, because a larger shift could skip over an alignment that might still match.


Question 1: Suppose that, after a comparison, a suffix of the compared part is also a prefix of T, and the whole compared part also occurs again at another position of T. The first case (a substring in the middle of T is equal to the part already compared) then gives the shorter shift. So when both cases apply, we only need the shift from the first case, because it is always shorter than the shift from the second.
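Taking the smallest shift over both scenarios can be illustrated by brute force. This is a sketch for checking small examples by hand, not an efficient implementation. The function name `goodSuffixShiftNaive` is hypothetical, and the extra check that the character in front of the reoccurrence must differ from the one that just mismatched is an assumption taken from the common "strong" form of the rule.

```cpp
#include <cassert>
#include <string>

// Smallest shift allowed by the good-suffix rule, found by trying every shift.
// i is the 0-based start of the already-matched suffix pattern[i..m-1];
// i == m means nothing has matched yet, i == 0 means the whole pattern matched.
int goodSuffixShiftNaive(const std::string& pattern, int i) {
    const int m = static_cast<int>(pattern.size());
    assert(m > 0 && i >= 0 && i <= m);
    for (int s = 1; s < m; ++s) {
        bool ok = true;
        // Each matched character must agree with the pattern character that
        // lands on it after the shift; positions that fall off the left end
        // of the pattern are unconstrained (this is the prefix scenario).
        for (int k = i; k < m && ok; ++k)
            if (k - s >= 0 && pattern[k - s] != pattern[k]) ok = false;
        // The character in front of the reoccurrence must differ from the
        // one that mismatched, otherwise the same mismatch would repeat.
        if (ok && i >= 1 && i - 1 - s >= 0 && pattern[i - 1 - s] == pattern[i - 1])
            ok = false;
        if (ok) return s;
    }
    return m;  // nothing can be reused: move the pattern past the window
}
```

For the pattern "abbabab", a matched suffix "bab" (i = 4) reoccurs inside the pattern and gives a shift of 2, while a matched suffix "ab" (i = 5) only fits as a prefix and gives a shift of 5.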


Question 2: If we know which suffixes also appear at the beginning of the pattern string, how can we tell when the second case applies?


The array f[] stores the good-suffix position for each index.

void bmPreprocess1() // store the positions of all good suffixes
{
    int i = t_length, j = t_length + 1;
    f[i] = j;
    while (i > 0)
    {
        // the suffix cannot be extended to the left: record the shift, then fall back to the next shorter one
        while (j <= t_length && t[i - 1] != t[j - 1])
        {
            if (next[j] == -1)   // next[] is assumed to be initialised to -1 ("not set")
                next[j] = j - i; // when there is a good suffix, the right shift distance
            j = f[j];
        }
        i--;
        j--;
        f[i] = j;
    }
}
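Putting the two rules together, a complete search might look like the sketch below. Only the first preprocessing loop appears in the code above; the second loop (for the prefix scenario) and the search loop are assumptions that follow the commonly published formulation of the algorithm. The name `bmSearch`, the local array `shift`, and the use of 0 rather than -1 as the "not set" marker are choices made for this example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Returns the 0-based start positions of every occurrence of pattern in text.
std::vector<int> bmSearch(const std::string& text, const std::string& pattern) {
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    assert(m > 0);

    // Bad-character table: rightmost index of each byte, -1 if absent.
    std::array<int, 256> occ;
    occ.fill(-1);
    for (int i = 0; i < m; ++i) occ[static_cast<unsigned char>(pattern[i])] = i;

    // Good-suffix tables. f[i] is where the widest border of pattern[i..] starts;
    // shift[j] is the shift used when pattern[j..] has matched (0 = not set yet).
    std::vector<int> f(m + 1), shift(m + 1, 0);
    int i = m, j = m + 1;
    f[i] = j;
    while (i > 0) {                  // scenario 1: the matched part reoccurs in the pattern
        while (j <= m && pattern[i - 1] != pattern[j - 1]) {
            if (shift[j] == 0) shift[j] = j - i;
            j = f[j];
        }
        --i; --j;
        f[i] = j;
    }
    j = f[0];
    for (i = 0; i <= m; ++i) {       // scenario 2: only a prefix of the pattern fits
        if (shift[i] == 0) shift[i] = j;
        if (i == j) j = f[j];
    }

    std::vector<int> hits;
    int pos = 0;
    while (pos <= n - m) {
        int k = m - 1;
        while (k >= 0 && pattern[k] == text[pos + k]) --k;  // compare right to left
        if (k < 0) {
            hits.push_back(pos);
            pos += shift[0];
        } else {
            const int bad = k - occ[static_cast<unsigned char>(text[pos + k])];
            pos += std::max(shift[k + 1], bad);  // use the larger of the two shifts
        }
    }
    return hits;
}
```

Because the good-suffix shift is always at least 1, taking the maximum of the two shifts also covers the situations where the bad-character rule proposes a zero or negative shift.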





