First, the naive pattern-matching algorithm
Suppose we want to find the substring t = "google" in the main string s = "goodgoogle". The matching proceeds in rounds.
Here i is the current subscript in the main string and j is the current subscript in the substring. In the first round, which starts at i=1, the first three characters match and the comparison fails at i=4, j=4 ('d' against 'g'). Both pointers then fall back: j returns to 1 and the next round starts from i=2.
This repeats until the starting position in the main string passes (main string length - substring length + 1), or until j runs past the end of the substring, which means the whole substring has been matched. In the example above, the substring matches completely in the round that starts at i=5.
#include <stdio.h>

// Returns the position of the first occurrence of t in s at or after pos,
// or 0 if there is none.
// s[0] is the length of the main string, t[0] is the length of the substring.
int Index(const char s[], const char t[], int pos)
{
    int i = pos; // current subscript in the main string
    int j = 1;   // current subscript in the substring

    while (i <= s[0] && j <= t[0]) {
        if (s[i] == t[j]) {
            // equal: continue comparing the next pair of characters
            ++i;
            ++j;
        } else {
            // not equal: both pointers fall back
            i = i - j + 2; // main string falls back to the next starting position
            j = 1;
        }
    }

    // j > t[0] means the substring was fully matched
    if (j > t[0])
        return i - t[0];
    else
        return 0;
}

int main()
{
    char s[] = {10, 'g', 'o', 'o', 'd', 'g', 'o', 'o', 'g', 'l', 'e'};
    char t[] = {6, 'g', 'o', 'o', 'g', 'l', 'e'};
    int i = Index(s, t, 1);
    printf("%d\n", i);
    return 0;
}
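Time complexity of this matching algorithm, where m is the length of the main string and n is the length of the substring: in the best case the substring matches at the very first position, which takes only n comparisons, so the cost does not depend on m. In the worst case every round fails only on the last character of the substring, which takes (m - n + 1) * n comparisons, so the time complexity is O(m*n).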
Second, the KMP algorithm
For strings with many repeated characters, such as binary strings made up of 0s and 1s, the naive matching above is very slow because it traverses the same characters again and again. The KMP algorithm avoids most of this repeated traversal.
Let's take a look at the basic idea behind the KMP algorithm.
Consider a first round of comparison in which the first 5 characters of the main string S and the substring T are equal and only the characters at i=6 and j=6 differ. If the first 5 characters of T ("abcde") are all different from one another, then the first character 'a' of T cannot be equal to the characters of S at positions 2, 3, 4 and 5, because those are equal to T's characters at j=2, 3, 4, 5. The rounds starting at those positions are bound to fail, so we can jump directly to i=6 and continue comparing from there.
Now suppose the substring T contains repeated characters, for example the characters at j=1,2 are the same as those at j=4,5. By the analysis above we can jump directly to the round that starts at i=4. But we already know that the characters at j=1,2 equal those at j=4,5, and that the characters of S at i=4,5 equal those at j=4,5, so there is no need to compare i=4,5 with j=1,2 again; those comparisons are known to succeed.
The KMP pattern-matching algorithm exists precisely to keep the pointer i from falling back. Since the value of i never falls back, the only thing to consider is how j changes. The observations above show that the change in j has nothing to do with the main string; it depends only on whether the substring T repeats itself.
We record, for each position of the string T, the value j should fall back to in an array called next. The length of next is the length of T, and the following function can be obtained:
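Written out as a formula, a definition of next that agrees with the get_next code in this section is the usual textbook one. It is given here as a reconstruction under that assumption, with t_1 ... t_n standing for the characters of T:

```latex
\[
\mathrm{next}[j] =
\begin{cases}
0, & j = 1 \\
\max\{\, k \mid 1 < k < j,\ t_1 \cdots t_{k-1} = t_{j-k+1} \cdots t_{j-1} \,\}, & \text{if this set is non-empty} \\
1, & \text{otherwise}
\end{cases}
\]
```

For example, for T = "abcabx" this gives next = 0, 1, 1, 1, 2, 3: at j=6 the prefix "ab" equals the two characters just before position 6, so k=3.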
#include <stdio.h>

// Computes the next array of the substring t.
// t[0] is the length of the substring t.
void get_next(const char t[], int *next)
{
    int i, j;
    i = 1;
    j = 0;
    next[1] = 0;

    while (i < t[0]) {
        // t[i] is the current character of the suffix,
        // t[j] is the current character of the prefix
        if (j == 0 || t[i] == t[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j]; // characters differ: j falls back
        }
    }
}

// Returns the position of the first occurrence of t in s at or after pos,
// or 0 if there is none.
int Index_KMP(const char s[], const char t[], int pos)
{
    int i = pos; // current subscript in the main string
    int j = 1;   // current subscript in the substring
    int next[255];

    get_next(t, next);

    while (i <= s[0] && j <= t[0]) {
        // compared with the naive algorithm, a j == 0 check is added
        if (j == 0 || s[i] == t[j]) {
            ++i;
            ++j;
        } else {
            // j falls back to the appropriate position; i is unchanged
            j = next[j];
        }
    }

    if (j > t[0]) {
        return i - t[0];
    } else {
        return 0;
    }
}

int main()
{
    return 0;
}
Data structures: the naive string pattern-matching algorithm and the KMP matching algorithm