BF and KMP Algorithms for String Pattern Matching
This article briefly introduces string pattern matching, focusing on the BF (brute-force) algorithm and the KMP algorithm. The aim is to be clear and concise.
1. BF Algorithm
The core idea: given a main string s of length len1 and a pattern string t of length len2, traverse s in order. First check whether the len2 characters of s starting at position 0 are equal to the corresponding characters of t. If they are all equal, the match succeeds. Otherwise, move to the next position, 1, and compare the len2 characters starting there with t, and so on.
The BF algorithm is clear and simple, but after every unsuccessful attempt the main-string pointer has to backtrack.
The code is as follows:
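// StringLength returns the number of characters before the terminating '\0'.
// It is not shown in the original listing; this definition is an assumption
// (equivalent to strlen).
int StringLength(char *s)
{
    int n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

int BF_Match(char *s, char *t)
{
    int i = 0, j = 0;
    int k;
    while (i < StringLength(s) && j < StringLength(t)) {
        if (s[i] == t[j]) {      // characters agree: advance both pointers
            i++;
            j++;
        } else {                 // mismatch: backtrack i to the next start, reset j
            i = i - j + 1;
            j = 0;
        }
    }
    if (j >= StringLength(t))
        k = i - StringLength(t) + 1;   // match succeeded: return the matching position (counted from 1)
    else
        k = -1;                        // no match
    return k;
}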
2. KMP Algorithm
KMP is an improvement on the BF algorithm. It eliminates the backtracking of the main-string pointer i: after a mismatch, it uses the partial match already obtained to slide the pattern as far to the right as possible before continuing the comparison. This reduces the time complexity from O(len1 * len2) in the worst case for BF to O(len1 + len2). In the KMP algorithm, i never backtracks; only the pattern pointer j falls back to a position k such that the first k characters of t, t[0] ... t[k-1], are equal to the k characters of s immediately before s[i]. This reduces the number of comparisons and improves efficiency.
When s[i] != t[j], the characters s[i-j] ... s[i-1] of the main string are known to equal t[0] ... t[j-1] of the pattern. To shift the pattern right as far as is safe, we look for the longest suffix of s[i-j] ... s[i-1] that is also a prefix of t. Because that stretch of s equals t[0] ... t[j-1], this is the same as finding the largest k < j such that the first k characters of t[0] ... t[j-1] equal its last k characters.
The next[] array stores, for each position j of the pattern, the position k = next[j] at which comparison resumes in the pattern after a mismatch at t[j].
The following uses the main string "ababcacabcabbab" and the pattern string "abcab" as an example.
Comparison starts from the first character of each string. If every character of t is matched, the match succeeds. Otherwise, at some point s[i] != t[j]. If j = 0, then next[j] = -1, and the next comparison is between t[0] and s[i+1].
If j > 0, let k = next[j]. If k is not 0 (case 1), the next comparison is between t[k] and s[i], so the first k characters of t are skipped. If k = 0 (case 2), the next comparison is between t[0] and s[i], and nothing is skipped.
In short, the main-string pointer i never decreases during the comparison, while the pattern pointer j is repositioned using the result of the previous partial match.
The code is as follows:
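// The KMP algorithm uses the partial match to shift the pattern right as far
// as possible, reducing the number of comparisons.
// StringLength and BF_Match are the functions from the BF section above.
// The middle of this listing is incomplete in the source; the body of GetNext
// and the opening of KMP_Match are filled in using the usual textbook form,
// and MaxSize is an assumed upper bound on the pattern length.
#include <stdio.h>
#define MaxSize 100

// GetNext saves in next[j] the position to resume from after a mismatch at t[j].
void GetNext(char *t, int *next)
{
    int j = 0, k = -1;
    next[0] = -1;
    while (j < StringLength(t) - 1) {
        if (k == -1 || t[j] == t[k]) {
            j++;
            k++;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
}

int KMP_Match(char *s, char *t)
{
    int next[MaxSize], i = 0, j = 0;
    int k;
    GetNext(t, next);
    while (i < StringLength(s) && j < StringLength(t)) {
        if (j == -1 || s[i] == t[j]) {   // match, or restart from t[0] at s[i+1]
            i++;
            j++;
        } else {
            j = next[j];                 // i does not backtrack; only j falls back
        }
    }
    if (j >= StringLength(t))
        k = i - StringLength(t) + 1;     // match succeeded: return the matching position (counted from 1)
    else
        k = -1;
    return k;
}

int main(void)
{
    char *a = "ababcacabcabbab";
    char *b = "abcab";
    int next2[10];
    int i;
    printf("BF matching position: %d\n", BF_Match(a, b));
    printf("KMP matching position: %d", KMP_Match(a, b));
    GetNext(b, next2);
    printf("\nnext[] array: ");
    for (i = 0; i < 5; i++)
        printf("%d ", next2[i]);
    return 0;
}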