The "Find" function (Ctrl+F) in many text editors is commonly implemented with the Boyer-Moore algorithm.
The Boyer-Moore algorithm is not only efficient, but also clever and easy to understand. In 1977, Professor Robert S. Boyer of the University of Texas and Professor J Strother Moore invented this algorithm.
Next, I will explain this algorithm based on Professor Moore's own example.
1.
Assume that the string is "HERE IS A SIMPLE EXAMPLE" and the search term is "EXAMPLE".
2.
First, the heads of the string and the search term are aligned, and the comparison starts from the tail of the search term.
This is a very clever idea: if the tail characters do not match, a single comparison is enough to know that the first 7 characters of the string cannot be the match.
Here the tail "E" of the search term lines up with "S" in the string, and "S" and "E" do not match. "S" is then called the "bad character", that is, the mismatched character in the string. "S" also does not appear anywhere in the search term "EXAMPLE", which means the search term can be shifted directly to the position just after "S".
3.
The comparison again starts from the tail and finds that "P" does not match "E", so "P" is the "bad character". This time, however, "P" does appear in the search term "EXAMPLE". The search term is therefore shifted 2 positions so that the two "P"s are aligned.
4.
From this we can state the "bad character rule":
Shift = position of the bad character - last occurrence of that character in the search term
Here the "position of the bad character" is the index in the search term at which the mismatch occurred. If the "bad character" does not appear in the search term, its last occurrence is taken to be -1.
Take "P" as an example. As the "bad character", it is aligned with position 6 of the search term (counting from 0), and its last occurrence in the search term is at position 4, so the shift is 6 - 4 = 2 positions. Now take "S" from step 2. It is also aligned with position 6, and its last occurrence is -1 (it does not appear), so the entire search term shifts 6 - (-1) = 7 positions.
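The rule above can be sketched in a few lines of Python. This is only a minimal illustration, not the code of any particular editor: the function names `last_occurrence` and `bad_character_shift` are my own, and the `max(1, ...)` guard is an assumption I added so that the search term never moves backwards when the bad character's last occurrence lies to the right of the mismatch.

```python
def last_occurrence(pattern):
    """Map each character to the index of its rightmost occurrence in pattern."""
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = i  # later occurrences overwrite earlier ones
    return table


def bad_character_shift(pattern, mismatch_pos, bad_char):
    """Shift = mismatch position - last occurrence of bad_char (-1 if absent).

    mismatch_pos is the index in the pattern (from 0) where the mismatch occurred.
    """
    last = last_occurrence(pattern).get(bad_char, -1)
    # Assumption: never shift by less than 1, so the search always advances.
    return max(1, mismatch_pos - last)


print(bad_character_shift("EXAMPLE", 6, "P"))  # 2, as in step 3
print(bad_character_shift("EXAMPLE", 6, "S"))  # 7, as in step 2
```

In a real search the table would be built once per search term rather than on every call; it is rebuilt here only to keep the sketch short.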
5.
The comparison starts from the tail again, and "E" matches "E".
6.
Compare the preceding character: "LE" matches "LE".
7.
Compare the preceding character: "PLE" matches "PLE".
8.
Compare the preceding character: "MPLE" matches "MPLE". We call this a "good suffix", that is, any string at the tail of the search term that has been matched. Note that "MPLE", "PLE", "LE", and "E" are all good suffixes.
9.
Comparing the preceding character, we find that "I" and "A" do not match. Therefore, "I" is the "bad character".
10.
According to the "bad character rule", the search term should shift 2 - (-1) = 3 positions. The question is whether a better shift is available at this point.
11.
We know that a "good suffix" exists here, so we can apply the "good suffix rule":
Shift = position of the good suffix - last occurrence of the good suffix elsewhere in the search term
In this calculation, a position always refers to the last character of the "good suffix". If the "good suffix" does not occur again in the search term, its last occurrence is -1. When there are several good suffixes, every one except the longest only counts if it occurs at the head of the search term.
Among all the good suffixes (MPLE, PLE, LE, and E), only "E" occurs again at the head of "EXAMPLE", at position 0, so the shift is 6 - 0 = 6 positions.
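To make the good suffix rule concrete, here is a small Python sketch of the calculation described above. The function name `good_suffix_shift` and its `suffix_len` parameter (the number of tail characters already matched, at least 1) are my own naming, and the sketch assumes the simple form of the rule from this article: first look for an earlier occurrence of the longest good suffix, then fall back to shorter good suffixes that sit at the head of the search term.

```python
def good_suffix_shift(pattern, suffix_len):
    """Shift given by the good suffix rule when the last suffix_len
    characters of pattern have matched (suffix_len >= 1)."""
    m = len(pattern)
    suffix = pattern[m - suffix_len:]

    # Case 1: the longest good suffix occurs again earlier in the pattern.
    # Limiting the search to pattern[0:m-1] excludes the copy at the tail itself.
    start = pattern.rfind(suffix, 0, m - 1)
    if start != -1:
        last_char_pos = start + suffix_len - 1
        return (m - 1) - last_char_pos

    # Case 2: a shorter good suffix counts only if it is also at the head.
    for k in range(suffix_len - 1, 0, -1):
        if pattern[:k] == pattern[m - k:]:
            return (m - 1) - (k - 1)

    # Case 3: nothing reoccurs, so the last occurrence is -1.
    return m


print(good_suffix_shift("EXAMPLE", 4))  # 6: only "E" reoccurs, at position 0
```

The rule is evaluated directly on each call here for readability; a production implementation would normally fill a table once per search term.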
12.
As you can see, the "bad character rule" gives a shift of only 3 positions, while the "good suffix rule" gives 6. So the basic idea of the Boyer-Moore algorithm is to shift, each time, by the larger of the two values.
What's more clever is that the shift tables for these two rules depend only on the search term, not on the string being searched. Therefore, the bad character table and the good suffix table can be precomputed, and during the search you only need to look the shift up in the tables.
13.
The comparison starts from the tail again. "P" does not match "E", so "P" is the "bad character". By the "bad character rule", the shift is 6 - 4 = 2 positions.
14.
Compare character by character from the tail. Everything matches, so the search ends. If you want to keep searching (that is, find all matches), apply the "good suffix rule" and shift 6 - 0 = 6 positions, so that the "E" at the head moves to where the "E" at the tail was.
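Putting the whole walk-through together, the following Python sketch runs both rules and always takes the larger shift. It is a teaching sketch under my own assumptions, not a reference implementation: the names `boyer_moore_search` and `good_suffix_shift` are hypothetical, the good suffix table is built in a simple (not the fastest) way, and a shift of at least 1 is enforced.

```python
def good_suffix_shift(pattern, suffix_len):
    """Good suffix shift when the last suffix_len characters have matched."""
    m = len(pattern)
    suffix = pattern[m - suffix_len:]
    start = pattern.rfind(suffix, 0, m - 1)  # earlier copy of the longest suffix
    if start != -1:
        return (m - 1) - (start + suffix_len - 1)
    for k in range(suffix_len - 1, 0, -1):   # shorter suffix must be at the head
        if pattern[:k] == pattern[m - k:]:
            return (m - 1) - (k - 1)
    return m                                 # last occurrence is -1


def boyer_moore_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Both tables depend only on the pattern, so they are built once.
    last = {ch: i for i, ch in enumerate(pattern)}
    # gs[k] = shift when k tail characters matched; gs[0] is an unused placeholder.
    gs = [0] + [good_suffix_shift(pattern, k) for k in range(1, m + 1)]

    matches = []
    s = 0  # current alignment of the pattern's head in the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1  # compare from the tail towards the head
        matched = m - 1 - j
        if j < 0:
            matches.append(s)
            s += gs[m]  # keep searching, as in step 14
        else:
            bad = j - last.get(text[s + j], -1)
            good = gs[matched] if matched > 0 else 1
            s += max(bad, good, 1)  # take the larger of the two rules
    return matches


print(boyer_moore_search("HERE IS A SIMPLE EXAMPLE", "EXAMPLE"))  # [17]
```

On the example string the alignments advance by 7, 2, 6 and 2 positions, exactly as in steps 2 to 13, before the match is found at index 17.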
Link: http://blog.jobbole.com/39132/
The algorithm is simple and easy to implement.