Personally, I think this is one of the easier-to-understand introductions to the KMP algorithm available online. It really is "detailed", and reading it patiently will certainly pay off. As for the pattern function value next[i], there are indeed many versions. Some object-oriented algorithm textbooks use a failure function f(j) instead, which means the same thing, namely next[j] = f(j-1) + 1, but the next[j] notation is still the easier one to understand:
A detailed explanation of KMP string pattern matching
KMP string pattern matching is, in plain terms, an efficient algorithm for locating one string inside another. The simple matching algorithm has time complexity O(m*n), where m and n are the lengths of the two strings; for the KMP matching algorithm it can be proved that the time complexity is O(m+n).
1. Simple matching algorithm
Let's first look at a simple matching function:
    int INDEX_BF(char S[], char T[], int pos)
    {
        /* If a substring equal to T exists in S starting at or after
           subscript pos (0 <= pos < StrLength(S)), the match succeeds and
           the subscript in S of the first such substring is returned;
           otherwise -1 is returned. */
        int i = pos, j = 0;
        while (S[i + j] != '\0' && T[j] != '\0')
            if (S[i + j] == T[j])
                j++;            /* continue with the next character */
            else {
                i++;
                j = 0;          /* start a new round of matching */
            }
        if (T[j] == '\0')
            return i;           /* match succeeded: return the subscript */
        else
            return -1;          /* no substring equal to T in S from pos on */
    } /* INDEX_BF */
The idea of this algorithm is straightforward: compare the substring of the main string S that starts at position i with the pattern string T. That is, starting from j = 0, compare S[i+j] with T[j]. If they are equal, a match starting at position i of S is still possible, so continue comparing (j increases by 1) until the last character of T has also been found equal. Otherwise, start a new round of matching from the next character of S: slide T back by one position, that is, increase i by 1, reset j to 0, and begin again.
For example, to find T = "abcabd" in S = "abcabcabdabba" (we can assume the search starts at subscript 0): first compare S[0] with T[0], then S[1] with T[1], and so on. Only when we reach S[5] and T[5] do we find that the characters differ. [Figure: T aligned at S[0]; mismatch at S[5] and T[5]]
When such a mismatch occurs, the T subscript must go back to the beginning and the S subscript must backtrack by the same distance as the T subscript; the S subscript is then increased by 1 and the comparison starts again. [Figure: T aligned at S[1]]
This time the mismatch occurs immediately (S[1] = 'b', T[0] = 'a'). The T subscript goes back to the beginning again, the S subscript increases by 1, and the comparison starts again. [Figure: T aligned at S[2]]
Another mismatch has occurred (S[2] = 'c', T[0] = 'a'), so the T subscript goes back to the beginning, the S subscript increases by 1, and the comparison starts again. This time every character of T matches the corresponding character of S, and the function returns 3, the starting subscript of T in S. [Figure: T aligned at S[3]; all characters match]
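The walk-through above can be replayed with a small variant of the simple matcher. This is only a sketch: the name naive_find and the comparisons counter are hypothetical additions for illustration and are not part of the original function.

```c
/* Same loop as the simple matcher shown earlier, plus a counter
   (a hypothetical addition) that records how many character
   comparisons were made before the function returned. */
int naive_find(const char *S, const char *T, int pos, int *comparisons)
{
    int i = pos, j = 0;

    *comparisons = 0;
    while (S[i + j] != '\0' && T[j] != '\0') {
        (*comparisons)++;
        if (S[i + j] == T[j]) {
            j++;        /* characters agree: move on to the next pair */
        } else {
            i++;        /* slide T one position to the right ... */
            j = 0;      /* ... and restart from the beginning of T */
        }
    }
    return T[j] == '\0' ? i : -1;
}
```

For S = "abcabcabdabba" and T = "abcabd" with pos = 0, this returns 3 after 14 comparisons: 6 at the first alignment, 1 each at the next two, and 6 at the final, successful one.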
2. KMP matching algorithm
Take the same example again: finding T = "abcabd" in S = "abcabcabdabba". With the KMP matching algorithm, when the search first finds S[5] and T[5] unequal, the S subscript does not go back to 1 and the T subscript does not go back to the beginning. Instead, using the pattern function value of T[5] == 'd' in T (next[5] = 2; the reason is explained later), S[5] is compared directly with T[2]. They are equal, so the subscripts of S and T both increase; the next pair is equal as well, so both subscripts increase again, and so on, until T is found in S. [Figure: after the mismatch at S[5], T[2] is aligned with S[5]]
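The search loop just described might look like the following sketch. It assumes the pattern function values are already available in an array next, and that the table uses the convention next[0] = -1 (meaning "advance in S and restart T"); the function name kmp_find and that convention are assumptions made for this illustration, not something the text has established yet.

```c
#include <string.h>

/* Sketch of a KMP search given a precomputed next table (assumed
   convention: next[0] == -1). The subscript i into S never moves
   backwards; on a mismatch only j is reset, to next[j]. */
int kmp_find(const char *S, const char *T, const int *next, int pos)
{
    int slen = (int)strlen(S);
    int tlen = (int)strlen(T);
    int i = pos, j = 0;

    while (i < slen && j < tlen) {
        if (j == -1 || S[i] == T[j]) {
            i++;            /* match (or restart): advance both subscripts */
            j++;
        } else {
            j = next[j];    /* mismatch: reuse the partial match, keep i */
        }
    }
    return j == tlen ? i - tlen : -1;
}
```

For T = "abcabd", only next[5] = 2 is given in the text. Assuming the remaining entries are {-1, 0, 0, -1, 0} under the convention above, calling kmp_find("abcabcabdabba", "abcabd", table, 0) with table = {-1, 0, 0, -1, 0, 2} returns 3: after the mismatch at S[5], the loop sets j to 2 and continues from S[5] without revisiting S[1] to S[4].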
To compare the efficiency of the KMP matching algorithm with that of the simple matching algorithm, consider an extreme example: finding T = "AAAAAAAAAB" in S = "AAAAAA...AAB" (100 A's). The simple matching algorithm compares all the way to the end of T every time, finds that the characters differ, sends the T subscript back to the beginning, backtracks the S subscript by the same distance and increases it by 1, and then continues comparing. With the KMP matching algorithm there is no need to backtrack. For strings in ordinary documents, however, the time complexity of the simple matching algorithm comes down to roughly O(m+n) in practice, so it is still used in most practical situations.
The core idea of the KMP algorithm is to use the partial-match information already obtained to carry the matching process forward. Look at the previous example again. Why is the pattern function value of T[5] == 'd' equal to 2 (next[5] = 2)? This 2 means that the 2 characters immediately before T[5] == 'd' are the same as the first 2 characters of T, and that T[5] == 'd' is not equal to the third character, the one following those first two (T[2] == 'c'). [Figure: "ab" before T[5] matches the leading "ab"; T[5] = 'd' differs from T[2] = 'c']
In other words, if the third character following the first two were also 'd', then even though the 2 characters before T[5] == 'd' are the same as the first 2 characters, the pattern function value of T[5] == 'd' would be 0 rather than 2.
As said earlier: when finding T = "abcabd" in S = "abcabcabdabba" with the KMP matching algorithm, once the search first finds S[5] and T[5] unequal, the S subscript does not go back to 1 and the T subscript does not go back to the beginning; instead, using the pattern function value of T[5] == 'd' in T, S[5] is compared directly with T[2]. Why can this be done?
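Before turning to that question, the two values quoted above (2 for "abcabd", and 0 if T[2] were also 'd') can be checked mechanically. The sketch below is one common way to fill such a table, assuming the convention next[0] = -1; the name compute_next is hypothetical, and this is not necessarily the construction the article itself goes on to give.

```c
#include <string.h>

/* Fill next[0..len-1] for pattern T (assumed convention: next[0] == -1).
   k tracks the length of the prefix of T that matches the characters
   just before T[j]. If the character after that prefix equals T[j],
   falling back to k would fail in the same way, so the fallback of k
   is used instead. */
void compute_next(const char *T, int *next)
{
    int len = (int)strlen(T);
    int j = 0, k = -1;

    if (len == 0)
        return;                 /* nothing to fill for an empty pattern */
    next[0] = -1;
    while (j < len - 1) {
        if (k == -1 || T[j] == T[k]) {
            j++;
            k++;
            if (T[j] != T[k])
                next[j] = k;        /* k characters match, T[j] differs from T[k] */
            else
                next[j] = next[k];  /* T[j] equals T[k]: skip further back */
        } else {
            k = next[k];            /* shorten the candidate prefix */
        }
    }
}
```

For "abcabd" this gives next[5] = 2, and for "abdabd", where the third character is also 'd', it gives next[5] = 0, in line with the explanation above.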