While reviewing for the software qualification exam, I came across several string pattern matching algorithms. They looked difficult, so I spent some time studying how string matching works. Here is a detailed description of the KMP pattern matching algorithm.
What is a string match?
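Suppose you are searching an article for some content. Finding the position in the article where that content occurs is string matching. The article is the main string, and the content being searched for is the pattern string.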
A simple pattern matching algorithm
The simple pattern matching algorithm compares the pattern string with the main string character by character. If a match fails, both the main-string pointer and the pattern-string pointer backtrack: the starting position in the main string advances by 1, the pattern pointer returns to the beginning, and matching starts again.
The flow of the pattern matching algorithm is as follows:
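On a match failure, the pattern string is shifted right by only one position, and matching restarts from the beginning of the pattern.
This takes two nested for loops:
for i = 1 to length(main string) - length(pattern string) + 1
    for j = 1 to length(pattern string)
So the time complexity is O((n-m+1)*m), where n is the length of the main string and m is the length of the pattern string. When the pattern string is much shorter than the main string, this is approximately O(mn).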
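The two nested loops can be sketched in Python as follows. This is only a minimal illustration: the function name naive_search and the sample strings are my own assumptions, and indices are 0-based here rather than 1-based as in the loops above.

```python
def naive_search(text, pattern):
    """Return the 0-based index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Outer loop: every possible starting position in the main string.
    for i in range(n - m + 1):
        j = 0
        # Inner loop: compare the pattern character by character.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i  # the whole pattern matched at position i
        # On a mismatch the pattern shifts right by one and j restarts at 0.
    return -1


print(naive_search("abaabaabaca", "abaabaca"))  # prints 3
```

Note that after each failure the comparison restarts at position i + 1 of the main string, no matter how many characters had already matched.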
KMP Matching algorithm:
In this example, when a match fails, the pattern string moves right by one position and matching starts over. In this situation, both the main-string pointer and the pattern-string pointer backtrack.
In the simple matching algorithm, no matter how many characters have already been matched correctly, the pointers backtrack as soon as a mismatch occurs, and the next round starts from scratch. This wastes work.
The KMP algorithm eliminates this waste by making use of the part of the string that has already been matched successfully.
Here is an example that shows how to take advantage of the part that has already been matched.
The match fails at the character "c". But the substring "abaaba" in front of it has already been matched successfully, so we can take advantage of this matched substring.
Only by doing so can the right shift be maximized, so that the main-string pointer never has to backtrack.
Given a substring that has been matched successfully, how do we find the maximum shift?
Available substrings:
This is where the prefixes and suffixes of a string come in.
For example, take the string "ABCD" (the whole string itself is not counted as a prefix or suffix here):
prefixes: A, AB, ABC
suffixes: D, CD, BCD
Find the strings that appear in both the prefix list and the suffix list. The longest one is the part that can be reused. For "ABCD" there is none.
In the example:
"abaaba" prefixes: a, ab, aba, abaa, abaab
"abaaba" suffixes: a, ba, aba, aaba, baaba
The longest string common to the prefixes and suffixes is "aba", so the string to be reused is "aba".
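To check the prefix and suffix comparison by hand, here is a small brute-force sketch. The function name longest_common_prefix_suffix is my own choice for illustration, and the whole string is deliberately excluded from both lists.

```python
def longest_common_prefix_suffix(s):
    """Return the longest string that is both a prefix and a suffix of s,
    not counting s itself. Returns "" if there is none."""
    # Try the longest candidates first and stop at the first hit.
    for k in range(len(s) - 1, 0, -1):
        if s[:k] == s[-k:]:
            return s[:k]
    return ""


print(longest_common_prefix_suffix("abaaba"))        # prints aba
print(repr(longest_common_prefix_suffix("ABCD")))    # prints ''
```

This brute-force check is only meant to make the definition concrete. It is slower than the way KMP normally builds its table.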
Right shift of pattern string:
The substring that has already been matched determines how many positions the pattern string shifts to the right. This is what the next function is for.
According to the description above, the main-string pointer only ever moves right and never backtracks. So for a pattern string of length m and a main string of length n, the total work is on the order of m+n comparisons.
So the time complexity of the KMP algorithm is O(m+n).
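The idea that the main-string pointer never backtracks can be sketched like this. It is a common 0-based formulation of KMP, not necessarily the exact one used in the exam textbook: the names build_failure and kmp_search are my own, and the table here stores the length of the longest common prefix and suffix for each prefix of the pattern, which is a different convention from a 1-based next table.

```python
def build_failure(pattern):
    """fail[i] = length of the longest string that is both a proper prefix
    and a proper suffix of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter reusable prefix
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the 0-based index of the first match of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = build_failure(pattern)
    j = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):  # i only moves forward
        while j > 0 and ch != pattern[j]:
            j = fail[j - 1]  # reuse the matched part instead of restarting
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            return i - j + 1
    return -1


print(build_failure("abaabaca"))                # prints [0, 0, 1, 1, 2, 3, 0, 1]
print(kmp_search("abaabaabaca", "abaabaca"))    # prints 3
```

In the sample run, the mismatch at "c" happens after "abaaba" has matched. The table says 3 characters ("aba") can be reused, so the pattern effectively shifts right by 3 while the main-string index stays where it is.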
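Descriptions of the KMP matching algorithm often mention a next function.
It looks like this:
For the pattern string "abaabaca", the corresponding next values are 01122341 (one digit per position, counting positions from 1).
next[j] describes what happens after a mismatch at position j of the pattern. The first j-1 characters have already matched. If the longest common prefix and suffix of those characters has length k, then next[j] = k+1, which is the position in the pattern where comparison resumes. For the first position there is nothing to reuse, so next[1] = 0.
As for the deeper meaning of next[j], I still have not fully worked it out despite a lot of searching. If you understand the next function well, please leave a comment.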
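One way to check the next values for "abaabaca" is to compute them directly from this reading: take the first j-1 characters, find the length of their longest common prefix and suffix, and add 1. The sketch below does exactly that by brute force. The names common_length and next_table are hypothetical, and I am assuming the 1-based convention in which the first entry is 0.

```python
def common_length(s):
    """Length of the longest string that is both a prefix and a suffix of s,
    not counting s itself."""
    for k in range(len(s) - 1, 0, -1):
        if s[:k] == s[-k:]:
            return k
    return 0


def next_table(pattern):
    """Return the next values for positions 1..len(pattern) as a list."""
    table = []
    for j in range(1, len(pattern) + 1):
        if j == 1:
            table.append(0)  # nothing has matched before the first position
        else:
            # The first j - 1 characters matched; reuse their longest overlap.
            table.append(common_length(pattern[:j - 1]) + 1)
    return table


print("".join(str(v) for v in next_table("abaabaca")))  # prints 01122341
```

For position 7 (the "c"), the first 6 characters are "abaaba", whose longest common prefix and suffix "aba" has length 3, giving 3 + 1 = 4.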
Reference Blog: http://www.ruanyifeng.com/blog/2013/05/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.html
String pattern matching algorithm--detailed KMP algorithm