KMP Algorithm Explained
1. Brute-force matching (BF algorithm)
Suppose we have a text string S and a pattern string P. How do we find the position of P in S?
With brute-force matching, suppose the text string S has been matched up to position i and the pattern string P up to position j. Then:
- If S[i] == P[j], then i++, j++ and matching continues;
- If S[i] != P[j], then i = i - (j - 1) and j = 0. That is, on a mismatch i backtracks and j is reset to 0.
The code is as follows:
    #include <string.h>

    int ViolentMatch(char* s, char* p) {
        int sLen = (int)strlen(s);
        int pLen = (int)strlen(p);
        int i = 0;
        int j = 0;
        while (i < sLen && j < pLen) {
            if (s[i] == p[j]) {
                // current characters match (s[i] == p[j]): advance both indices
                i++;
                j++;
            } else {
                // mismatch (s[i] != p[j]): backtrack i to i - (j - 1) and reset j to 0
                i = i - j + 1;
                j = 0;
            }
        }
        // on success return the position of pattern p in text s, otherwise return -1
        if (j == pLen)
            return i - j;
        else
            return -1;
    }
2. KMP algorithm
2.1 KMP algorithm idea and code
The KMP algorithm solves the same problem. Suppose the text string S has been matched up to position i and the pattern string P up to position j:
- If j == -1, or S[i] == P[j], then i++, j++ and matching continues;
- If j != -1 and S[i] != P[j], then i stays unchanged and j = next[j]. That is, on a mismatch the pattern string moves right by (position of the mismatched character) - (next value of the mismatched character), i.e. j - next[j] positions.
The code is as follows:
    #include <string.h>

    // next is the array built for pattern p by GetNext (section 2.4)
    int KmpMatch(char* s, char* p, int* next) {
        int i = 0;
        int j = 0;
        int sLen = (int)strlen(s);
        int pLen = (int)strlen(p);
        while (i < sLen && j < pLen) {
            // if j == -1, or the current characters match (s[i] == p[j]): advance both indices
            if (j == -1 || s[i] == p[j]) {
                i++;
                j++;
            } else {
                // if j != -1 and the characters mismatch (s[i] != p[j]): i is unchanged, j = next[j]
                // that is, the pattern string jumps to position next[j]
                j = next[j];
            }
        }
        if (j == pLen)
            return i - j;
        else
            return -1;
    }
2.2 Deriving the next array
Meaning of each value in the next array: it is the length of the longest identical prefix and suffix of the string that precedes the current character. That is, next[j] = k means that the substring before p[j] has a longest identical prefix and suffix of length k.
- Longest common prefix/suffix length
- For each leading substring of the pattern, this is the greatest length at which a prefix and a suffix of that substring (each shorter than the substring itself) are identical.
- For example, for the pattern string abcabd, the longest common prefix/suffix lengths are:
    Pattern character:        a  b  c  a  b  d
    Longest common length:    0  0  0  1  2  0
For the string abcab, the longest identical prefix and suffix is ab, of length 2.
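The lengths described above can be checked by brute force. The following is a minimal sketch, not the method KMP itself uses: it simply tries every candidate length for each leading substring. The names `longestCommonPrefixSuffix` and `prefixSuffixTable` are hypothetical helpers introduced here for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: length of the longest proper prefix of p[0..end]
// that is also a suffix of p[0..end].
static int longestCommonPrefixSuffix(const std::string& p, int end) {
    int total = end + 1;  // length of the leading substring p[0..end]
    for (int len = total - 1; len > 0; --len) {
        // compare prefix p[0..len-1] with suffix p[total-len..end]
        if (p.compare(0, len, p, total - len, len) == 0) return len;
    }
    return 0;
}

// Hypothetical helper: one table entry per leading substring of p.
std::vector<int> prefixSuffixTable(const std::string& p) {
    std::vector<int> table(p.size(), 0);
    for (int end = 0; end < (int)p.size(); ++end)
        table[end] = longestCommonPrefixSuffix(p, end);
    return table;
}
```

Calling `prefixSuffixTable("abcabd")` yields 0 0 0 1 2 0, where the entry 2 corresponds to the string abcab. This version does far more work than necessary on long patterns; it is meant only to make the definition concrete.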
- The next array is derived from the longest common prefix/suffix lengths.
- That is, next[j] is the longest common prefix/suffix length of the substring that precedes position j.
- For example, for the pattern string abcabd, the next array is:
    Pattern character:    a  b  c  a  b  d
    next value:          -1  0  0  0  1  2
For the character d, the preceding string is abcab, whose longest identical prefix and suffix is ab, of length 2, so the next value for d is 2.
Comparing the two tables
The next array is the longest common prefix/suffix table shifted one position to the right, with the first value set to -1.
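The shift-right relationship can be written down directly. This is a small sketch under the assumption that the longest common prefix/suffix table is already available as a vector; `nextFromTable` is a hypothetical name, not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds next by shifting the prefix/suffix table
// one position to the right and putting -1 in the first slot.
std::vector<int> nextFromTable(const std::vector<int>& table) {
    std::vector<int> next(table.size());
    for (std::size_t j = 0; j < table.size(); ++j)
        next[j] = (j == 0) ? -1 : table[j - 1];
    return next;
}
```

For the table 0 0 0 1 2 0 of abcabd, this gives -1 0 0 0 1 2. The last table entry is dropped, because next[j] only ever describes the characters before position j.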
2.3 Matching with the next array
Purpose of the next array: when a character of the pattern string mismatches a character of the text string, it tells the pattern string where to jump next. For example, if the character at position j of the pattern mismatches the character at position i of the text, then the pattern character at position next[j] is compared with the text character at position i next.
- Matching based on the longest common prefix/suffix lengths
- On a mismatch, the pattern string moves right by: (number of characters already matched) - (longest common prefix/suffix length for the character just before the mismatched character)
- Matching based on the next values
- On a mismatch, the pattern string moves right by: (position of the mismatched character) - (next value of the mismatched character)
In short, counting from 0, the position of the mismatched character equals the number of characters already matched, and the next value of the mismatched character equals the longest common prefix/suffix length for the character just before it, so the two rules give the same shift.
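As a quick check that the two rules agree, here is a sketch that computes the shift both ways. The function names are hypothetical, and the case j = 0 for the table-based rule is an assumption made here: there is no previous character, so the pattern is taken to move right by one position, which is what j - next[0] = 0 - (-1) gives.

```cpp
#include <vector>

// Shift from the next values: position of the mismatch minus its next value.
int shiftByNext(const std::vector<int>& next, int j) {
    return j - next[j];
}

// Shift from the prefix/suffix table: characters matched so far minus the
// table value of the character just before the mismatch.
int shiftByTable(const std::vector<int>& table, int j) {
    if (j == 0) return 1;  // assumed: nothing matched yet, slide by one
    return j - table[j - 1];
}
```

For abcabd with a mismatch at d (j = 5), both give 5 - 2 = 3: the pattern slides three positions so that its leading ab lines up with the ab that was just matched.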
2.4 Deriving the code for the next array
The code can be derived by mathematical induction:
- Initialize next[0] = -1;
- Suppose that for some value k we have p[0] p[1] ... p[k-1] = p[j-k] p[j-k+1] ... p[j-1]. This is equivalent to next[j] = k: the substring before p[j] has an identical prefix and suffix of length k.
- To find next[j + 1], there are two cases:
  - If p[k] == p[j], then next[j + 1] = next[j] + 1 = k + 1;
  - If p[k] != p[j], the substring before p[j + 1] has no identical prefix and suffix of length k + 1, so a shorter one must be sought by setting k = next[k] and testing again:
    - If p[k] == p[j] (with k already updated to next[k]), then next[j + 1] = k + 1;
    - If p[k] != p[j], repeat with k = next[k] again, until the characters match or k reaches -1, in which case next[j + 1] = 0.
The code is as follows:
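    #include <stdlib.h>
    #include <string.h>

    // Builds the next array for pattern p; the caller must free() it.
    // Note: the reference parameter (int*& next) makes this C++ rather than plain C.
    void GetNext(char* p, int*& next) {
        int pLen = (int)strlen(p);
        // allocate at least one slot so next[0] is valid even for an empty pattern
        next = (int*)malloc((pLen > 0 ? pLen : 1) * sizeof(int));
        if (!next) exit(-1);
        int j = 0;
        int k = -1;
        next[0] = -1;
        while (j < pLen - 1) {
            // p[k] represents the prefix, p[j] represents the suffix
            if (k == -1 || p[j] == p[k]) {
                next[++j] = ++k;
            } else {
                k = next[k];
            }
        }
    }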
2.5 Optimizing the next array
Looking at the code derived for the next array, we find that it does not handle the case p[next[j]] == p[j].
- Let next[j] = k.
- Original handling: if p[k] == p[j], then next[j + 1] = k + 1, without checking whether the character at the new position equals the character it will be used to replace.
- Optimized handling: if an entry would give p[next[j]] == p[j], replace it with next[j] = next[next[j]] (that is, k = next[k]).
For example, computing the next array of the pattern string "abab" with the earlier method gives -1 0 0 1 (0 0 1 2 shifted right by one position, with the first value set to -1). When the pattern is matched against the text string and b mismatches c, the pattern string slides to position next[3] = 1, that is, it moves right by j - next[j] = 3 - 1 = 2 positions.
After moving 2 positions right, b again mismatches c. This was inevitable: the previous step already established that p[3] = b does not match s[3] = c, and the 2-position shift compares p[next[3]] = p[1] = b against s[3] again.
So p[next[j]] == p[j] should never occur; if it does, recurse once more by setting next[j] = next[next[j]].
With this optimization, the next array of "abab" becomes -1 0 -1 0.
In the example above, when s[3] and p[3] mismatch, s[3] stays where it is and the next pattern position to compare is p[next[3]]. Now next[3] = 0, so p[0] is compared with s[3], skipping the comparison that was certain to fail, and the match goes on to succeed.
The optimized code is as follows:
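    #include <stdlib.h>
    #include <string.h>

    // Optimized version; the caller must free() the array.
    // Note: the reference parameter (int*& next) makes this C++ rather than plain C.
    void GetNext(char* p, int*& next) {
        int pLen = (int)strlen(p);
        // allocate at least one slot so next[0] is valid even for an empty pattern
        next = (int*)malloc((pLen > 0 ? pLen : 1) * sizeof(int));
        if (!next) exit(-1);
        int j = 0;
        int k = -1;
        next[0] = -1;
        while (j < pLen - 1) {
            if (k == -1 || p[j] == p[k]) {
                if (p[++j] == p[++k]) {
                    // p[j] == p[next[j]] must not occur, so recurse one step further
                    next[j] = next[k];
                } else {
                    next[j] = k;
                }
            } else {
                k = next[k];
            }
        }
    }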
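To compare the two variants side by side on the "abab" example, here is a minimal sketch that returns the arrays as `std::vector<int>` instead of allocating with malloc. The names `plainNext` and `optimizedNext` are hypothetical and exist only for this illustration; the loops follow the same steps as the two GetNext listings.

```cpp
#include <string>
#include <vector>

// Unoptimized next array (section 2.4), returned by value.
std::vector<int> plainNext(const std::string& p) {
    int n = (int)p.size();
    std::vector<int> next(n, -1);
    int j = 0, k = -1;
    while (j < n - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++j;
            ++k;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// Optimized next array: never leaves p[next[j]] == p[j].
std::vector<int> optimizedNext(const std::string& p) {
    int n = (int)p.size();
    std::vector<int> next(n, -1);
    int j = 0, k = -1;
    while (j < n - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++j;
            ++k;
            // if the fallback character equals the current one, fall back further
            next[j] = (p[j] == p[k]) ? next[k] : k;
        } else {
            k = next[k];
        }
    }
    return next;
}
```

For "abab", `plainNext` gives -1 0 0 1 and `optimizedNext` gives -1 0 -1 0, so a mismatch at p[3] falls back to p[0] rather than to p[1], which holds the same character b.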
The complete code is as follows:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int KmpMatch(char* s, char* p, int* next);
    void GetNext(char* p, int*& next);  // reference parameter: compile as C++

    int main() {
        int* next;
        // Buffer sizes are assumed; the original values were lost from this listing.
        char strTxt[256];
        char strKey[64];

        printf("Please enter the text:\n");
        // fgets replaces gets(), which is unsafe and no longer part of the standard
        if (!fgets(strTxt, sizeof(strTxt), stdin)) return 1;
        strTxt[strcspn(strTxt, "\n")] = '\0';  // strip the trailing newline

        printf("Please enter the substring you are looking for:\n");
        if (!fgets(strKey, sizeof(strKey), stdin)) return 1;
        strKey[strcspn(strKey, "\n")] = '\0';

        GetNext(strKey, next);
        printf("next values of the substring:\n");
        for (unsigned int i = 0; i < strlen(strKey); ++i) {
            printf("%5c", strKey[i]);
        }
        printf("\n");
        for (unsigned int i = 0; i < strlen(strKey); ++i) {
            printf("%5d", next[i]);
        }
        printf("\n");

        int pos = KmpMatch(strTxt, strKey, next);
        if (pos != -1)
            printf("Match found, match position is %d\n", pos);
        else
            printf("No match found!\n");

        free(next);
        system("pause");  // Windows only
        return 0;
    }

    int KmpMatch(char* s, char* p, int* next) {
        int i = 0;
        int j = 0;
        int sLen = (int)strlen(s);
        int pLen = (int)strlen(p);
        while (i < sLen && j < pLen) {
            if (j == -1 || s[i] == p[j]) {
                i++;
                j++;
            } else {
                j = next[j];
            }
        }
        if (j == pLen)
            return i - pLen;
        else
            return -1;
    }

    void GetNext(char* p, int*& next) {
        int pLen = (int)strlen(p);
        // allocate at least one slot so next[0] is valid even for an empty pattern
        next = (int*)malloc((pLen > 0 ? pLen : 1) * sizeof(int));
        if (!next) exit(-1);
        int j = 0;
        int k = -1;
        next[0] = -1;
        while (j < pLen - 1) {
            if (k == -1 || p[j] == p[k]) {
                if (p[++j] == p[++k]) {
                    next[j] = next[k];
                } else {
                    next[j] = k;
                }
            } else {
                k = next[k];
            }
        }
    }