The KMP algorithm analyzes the pattern string in advance to pre-calculate, for each position, where the comparison should resume when a mismatch occurs. These positions are collected in a next array, which the matching algorithm then uses.
This global-match KMP implementation stores its strings in a heap-allocated data structure.
#define maxsize 45   // fixed length of the next array
#define OK 1
#define error 0
typedef int status;  // return status

// the positions where the matching string was found
int indexarray[maxsize] = {0};
// the number of times the matching string appears
int searchindex = 0;

// ------------------ heap-allocated storage of the string ------------------
typedef struct {
    char *ch;    // pointer field: base address of the block that stores the string value
    int length;  // integer field: length of the string
} hstring;
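The test code later in this article calls inithstring and assighstring, whose bodies are not shown. The following is only a sketch of what such helpers might look like, assuming the string value is kept NUL-terminated on the heap; the return convention (1 on success, 0 on failure) and the extra helper clearhstring are assumptions made for illustration, not the original implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *ch;    /* base address of the heap block holding the characters */
    int length;  /* number of characters, not counting the terminator */
} hstring;

/* Put the string into a defined empty state. */
void inithstring(hstring *s) {
    assert(s != NULL);
    s->ch = NULL;
    s->length = 0;
}

/* Copy chars into s, replacing any previous value.
   Assumed convention: returns 1 on success, 0 if allocation fails. */
int assighstring(hstring *s, const char *chars) {
    assert(s != NULL && chars != NULL);
    size_t n = strlen(chars);
    char *buf = malloc(n + 1);
    if (buf == NULL)
        return 0;
    memcpy(buf, chars, n + 1);  /* copies the terminating NUL as well */
    free(s->ch);                /* free(NULL) is a no-op */
    s->ch = buf;
    s->length = (int)n;
    return 1;
}

/* Hypothetical helper: release the storage and reset the string. */
void clearhstring(hstring *s) {
    assert(s != NULL);
    free(s->ch);
    s->ch = NULL;
    s->length = 0;
}
```

Keeping the terminating NUL is a design choice of this sketch: it lets the stored value be passed to ordinary C string functions, while length still gives the size in constant time.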
The value of the next function depends only on the pattern string and is independent of the main string being searched. Therefore, starting from its definition, the values of the next function can be derived one after another from the pattern itself by a recurrence.
/* Function: Obtain the next function. */
void get_next(hstring *t, int *next) {
    int i = 1, j = 0;   // 1-based positions in the pattern; next[0] is unused
    *(next + 1) = 0;
    while (i < t->length) {
        if (j == 0 || *(t->ch + i - 1) == *(t->ch + j - 1)) {
            i++;
            j++;
            *(next + i) = j;
        } else {
            j = *(next + j);   // fall back to a shorter candidate
        }
    }
}
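To make the recurrence concrete, here is a small self-contained sketch that applies the same 1-based rule to a plain C string. The function name build_next is hypothetical and the NUL-terminated input is an assumption; the loop body follows get_next.

```c
#include <assert.h>
#include <string.h>

/* Fill next[1..m] for a NUL-terminated pattern of length m, using the
   same 1-based convention as get_next; next[0] is left untouched.
   The caller must provide room for at least m + 1 ints. */
void build_next(const char *pat, int *next) {
    int m = (int)strlen(pat);
    assert(m >= 1);
    int i = 1, j = 0;
    next[1] = 0;
    while (i < m) {
        if (j == 0 || pat[i - 1] == pat[j - 1]) {
            ++i;
            ++j;
            next[i] = j;      /* on a mismatch at i, resume at pattern position j */
        } else {
            j = next[j];      /* try the next shorter candidate */
        }
    }
}
```

For the pattern abaabc used in the test in this article, this fills next[1..6] with 0 1 1 2 2 3.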
When a mismatch occurs during matching, pointer i stays where it is and pointer j falls back to the position given by next[j], where the comparison is repeated. When j falls back to 0, both i and j must be incremented by 1 at the same time: if the i-th character of the main string differs from the 1st character of the pattern, a new match attempt starts from the (i + 1)-th character of the main string.
When the number of matched characters reaches the length of the pattern string, a complete occurrence has been found: the current position is recorded, and matching then continues with the next occurrence.
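The fallback rule and the recording step can be sketched together as follows. This is a minimal illustration, not the article's index_kmp: it assumes NUL-terminated C strings instead of hstring, reports non-overlapping occurrences by restarting the pattern pointer at 1 after each hit, and uses the hypothetical names kmp_find_all, build_next and MAX_PAT.

```c
#include <assert.h>
#include <string.h>

#define MAX_PAT 64   /* hypothetical upper bound on the pattern length */

/* 1-based next table, same recurrence as get_next. */
static void build_next(const char *pat, int m, int *next) {
    int i = 1, j = 0;
    next[1] = 0;
    while (i < m) {
        if (j == 0 || pat[i - 1] == pat[j - 1]) {
            ++i; ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

/* Store the 1-based start of each non-overlapping occurrence of pat in
   text into out[0..cap-1]; return how many positions were stored. */
int kmp_find_all(const char *text, const char *pat, int *out, int cap) {
    int n = (int)strlen(text), m = (int)strlen(pat);
    assert(m >= 1 && m < MAX_PAT);
    int next[MAX_PAT + 1];
    build_next(pat, m, next);

    int i = 1, j = 1, count = 0;      /* 1-based positions in text and pattern */
    while (i <= n) {
        if (j == 0 || text[i - 1] == pat[j - 1]) {
            ++i; ++j;                 /* i only ever moves forward */
            if (j > m) {              /* all m pattern characters matched */
                if (count < cap)
                    out[count++] = i - m;
                j = 1;                /* start looking for the next occurrence */
            }
        } else {
            j = next[j];              /* mismatch: i stays, j falls back */
        }
    }
    return count;
}
```

With the text abaabcd abaabcf abaabcj abaabck and the pattern abaabc, the stored positions are 1, 9, 17 and 25. Because the loop stops as soon as i passes the end of the text, the result is the same when the text ends exactly with the pattern.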
Computing the next function takes O(m) time. The length m of the pattern string is usually much smaller than the length n of the main string, so this preprocessing is worthwhile for the matching algorithm as a whole.
A single pattern match takes O(m + n) time. The time complexity of a global pattern match is related to the number of successful matches and is estimated here as O((m + n) * b), where b is the number of matches. Because the main-string pointer never moves backward, this is a conservative bound.
The most important property of the KMP algorithm is that the main-string pointer never needs to backtrack. During the whole matching process, the main string is scanned only once, from beginning to end.
/* Function: Starting from position pos, the KMP algorithm finds the positions of all occurrences of t in the string s and stores them in the global array indexarray.
   Precondition: t is not null and 1 <= pos <= s->length */
status index_kmp(hstring *s, hstring *t, int pos) {
    // process illegal input
    if (!t || pos < 1 || pos > s->length)
        return error;
    // clear the results of any previous search
    for (int k = 0; k < maxsize; k++) {
        indexarray[k] = 0;
    }
    searchindex = 0;
    // obtain the next array; it depends only on the pattern t
    int j, next[maxsize] = {0};
    get_next(t, next);
    if (pos < 0 || pos > s->length)
        exit(0);
    int slength = s->length;
    int tlength = t->length;
    int i = pos - 1;
    j = 0;
    while (i <= slength && j <= tlength) {
        if (j == 0 || *(s->ch + i) == *(t->ch + j)) {
            ++i;
            ++j;
            if (j >= tlength) {   // highlighted line
                indexarray[searchindex] = i - tlength + 1;
                searchindex++;
            }
        } else {
            j = *(next + j);
        }
    }
    return OK;
}
The highlighted condition should be if (j == tlength). With the greater-than-or-equal comparison, when a match ends at the last position of the main string (that is, when the main string in the test code below ends with abaabc), i keeps increasing after the position has been recorded, and the final output is 1 9 17 25 26. Therefore, if (j >= tlength) is changed to if (j == tlength).
The test is as follows:
hstring S1;
inithstring(&S1);
assighstring(&S1, "abaabc");
hstring S;
inithstring(&S);
assighstring(&S, "abaabcd abaabcf abaabcj abaabck");
index_kmp(&S, &S1, 1);
printf("occurrences of S1 in S: %d\n", searchindex);
for (int i = 0; i < searchindex; i++)
    printf("%d ", indexarray[i]);
printf("\n");
Result: 1 9 17 25
Conclusion: debug carefully and make the code more robust.