The content of this article comes from a book and is not original, except for some of my own understanding and descriptions. It is only meant as notes.
The first step is to define the problem. String matching here means matching a contiguous substring, not a common subsequence. For example, asdfgge and dfg match because asdfgge contains dfg. The problem is to determine whether string A contains string B as a substring.
I. BF algorithm (brute force solution)
Simply compare the pattern string with the target string character by character. On a mismatch, go back to the first character of the pattern string, move the pattern string one position to the right along the target string, and compare the entire pattern string again.
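The BF idea can be sketched in a few lines of Python. This is only a minimal illustration, not code from the book; the function name bf_search and the convention of returning the index of the first match (or -1) are my own assumptions.

```python
def bf_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every possible alignment of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        # Compare from the first character of the pattern each time.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i  # the whole pattern matched at position i
    return -1


print(bf_search("asdfgge", "dfg"))
```

In the worst case every alignment compares almost the whole pattern string, so the cost is proportional to the product of the two lengths.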
II. KMP algorithm
Compared with BF, KMP's optimization is to define a failure function so that the results of earlier comparisons can be reused. Here, the failure function determines how far the pattern string moves when a character does not match.
The theoretical basis of the KMP algorithm is that the structure of the pattern string is known before any comparison is performed, so the failure function can be computed from the pattern string alone.
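A possible Python sketch of this idea follows. It uses one common formulation of the failure function, where fail[i] is the length of the longest proper prefix of pattern[:i + 1] that is also its suffix; the book may define it slightly differently. The names build_failure and kmp_search are hypothetical.

```python
def build_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it. Depends only on the pattern."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter prefix
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = build_failure(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # reuse earlier comparisons instead of restarting
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(build_failure("abcabd"))
print(kmp_search("asdfgge", "dfg"))
```

Note that the index into the target string never moves backwards; only the position inside the pattern string falls back, which is where the saving over BF comes from.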
III. BM algorithm
Compared with the KMP algorithm, the BM algorithm is usually more efficient in practice. Unlike the usual matching algorithms, in the BM algorithm the pattern string still moves from left to right, but the comparison itself runs from right to left. The theoretical basis of the algorithm is as follows:
A. Suppose the suffix u of the pattern string has already matched, and the mismatch occurs because character b of the target string differs from character a of the pattern string on the left of u. If the same u occurs again in the rest of the pattern string and the character on its left is not a, align that u with the u of the target string and save the resulting offset.
B. If the rest of the pattern string contains no u whose left character is not a, align the pattern string with the longest suffix v of u that is also a prefix of the pattern string, and save the resulting offset.
Obviously, only one of cases A and B can happen at a time.
C. Unlike the cases above, which are based on the matched substring u, this one is based on the mismatched character b. Check whether the pattern string contains b. If it does, align that b (the rightmost one, if there are several) with the b of the target string and record the resulting offset.
D. If the pattern string does not contain b at all, no alignment that covers b can match, so move the pattern string directly to the position just after b and save the resulting offset.
Similarly, only one of cases C and D can happen at a time.
Finally, take Max(A or B, C or D) as the final offset, move the pattern string by it, and start a new round of comparison. The following are some of my questions.
1. Why take the maximum offset? In each case, no alignment within that case's offset range can be a solution. Every offset smaller than the maximum is therefore ruled out by at least one of the rules, so moving by the largest offset cannot skip a solution.
2. Do these four cases cover all solutions? The answer is that I don't know either. As the book mentions, proving it turns out to be very troublesome. For me, understanding the idea behind the solution is enough, and the proof is not necessary.
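To make cases A to D and the Max rule more concrete, here is a rough Python sketch. It is my own illustration under several assumptions, not the book's code: the names good_suffix_shift and bm_search are hypothetical, the offset for cases A and B is found by directly trying every shift (simple to check, but much slower than the usual preprocessing), and the search stops at the first match.

```python
def good_suffix_shift(pattern, j):
    """Offset for cases A/B when the mismatch is at pattern[j] and
    pattern[j+1:] (the substring u) has already matched.
    Found by trying each shift directly (unoptimized, for clarity)."""
    m = len(pattern)
    for s in range(1, m):
        # After shifting by s, the part of the pattern lying under u
        # must agree with u (case A; parts falling off the left end
        # are not checked, which gives case B).
        if any(k - s >= 0 and pattern[k - s] != pattern[k]
               for k in range(j + 1, m)):
            continue
        # The character that lands on the mismatch position must not be a.
        if j - s >= 0 and pattern[j - s] == pattern[j]:
            continue
        return s
    return m  # nothing can be reused: move past the whole window


def bm_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    last = {ch: i for i, ch in enumerate(pattern)}  # rightmost position of each character
    gs = [good_suffix_shift(pattern, j) for j in range(m)]
    i = 0  # the pattern moves from left to right
    while i <= n - m:
        j = m - 1  # but the comparison runs from right to left
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1
        if j < 0:
            return i
        # Case C: align the rightmost b; case D: b is absent, so last is -1
        # and the pattern moves to the position just after b.
        bad_char = j - last.get(text[i + j], -1)
        i += max(gs[j], bad_char)  # Max(A or B, C or D)
    return -1


print(bm_search("asdfgge", "dfg"))
```

The bad-character offset can be zero or negative when the rightmost b lies to the right of the mismatch position; the offset from cases A and B is always at least 1, so taking the maximum still moves the pattern string forward.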
O la ~~~
If you repost this article, please keep the source: http://blog.csdn.net/u011638883/article/details/20650119
Thank you!!