String matching is unavoidable in string processing. Two common string matching methods are described here.
1. Simple pattern matching algorithm (brute-force algorithm)
Index(S, T, pos): the operation that locates the position of substring T in main string S, searching from position pos.
Pattern matching: the operation of locating a substring is usually called pattern matching.
Target string: the main string S.
Pattern string: the substring T.
Successful match: if some contiguous character sequence in S is identical, character by character, to T, the match succeeds and the position in S of the first character of T is returned.
Failed match: 0 is returned.
Brute-force, abbreviated as the BF algorithm, is also known as the simple matching algorithm. Its basic idea is:
Compare the first character of the target string S = "s1s2...sn" with the first character of the pattern string T = "t1t2...tm". If they are equal, continue comparing the subsequent characters one by one;
otherwise, start again from the second character of the target string S and compare it with the first character of the pattern string T.
And so on. If, starting from the i-th character of the target string S, every character equals the corresponding character of the pattern string T, the match succeeds and the algorithm returns i; otherwise the match fails and the function returns 0.
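The steps above can be sketched in C++ as follows. This is only an illustration: the function name bf_index and its 1-based pos parameter are assumptions chosen to mirror Index(S, T, pos), with 0 meaning that no match was found.

```cpp
#include <string>

// Brute-force (BF) matching sketch.
// Assumption: positions are 1-based as in the text, and 0 means "no match".
// The name bf_index is hypothetical, chosen for this example only.
int bf_index(const std::string& s, const std::string& t, int pos)
{
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());
    if (pos < 1 || m == 0 || m > n) return 0;

    // Try every alignment of t against s, starting at 0-based index pos - 1.
    for (int start = pos - 1; start + m <= n; ++start)
    {
        int j = 0;
        // Compare character by character until a mismatch or the end of t.
        while (j < m && s[start + j] == t[j]) ++j;
        if (j == m) return start + 1;   // convert back to a 1-based position
    }
    return 0;   // every alignment failed
}
```

For instance, bf_index("abacaesabacadfabacawersdf", "abacaw", 1) returns 15, the 1-based position at which the pattern starts. After each failed alignment the comparison restarts one character later in s, which is the backtracking of the main-string pointer.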
2. Improved pattern matching algorithm (KMP algorithm)
The KMP algorithm was proposed jointly by D. E. Knuth, J. H. Morris, and V. R. Pratt. Compared with BF, it is considerably more efficient because it eliminates backtracking of the main-string pointer: it runs in O(n + m) time, whereas BF needs O(n * m) in the worst case.
When a character comparison fails during a matching pass, the main-string pointer i is not moved back. Instead, the "partial match" already obtained is used to slide the pattern as far to the right as possible, and the comparison then continues.
Define the function next[j]: when the j-th character of the pattern mismatches the corresponding character of the main string, next[j] gives the position in the pattern of the character that should be compared next with that same main-string character. (For details, see Data Structure, Yan Weimin edition.)
Definition of the next function:
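In the textbook's 1-based convention, with the pattern written as p_1 p_2 ... p_m, the definition is usually stated along the following lines. The symbols p_k and the LaTeX layout are a restatement for this article, so the exact wording should be checked against the book.

```latex
\[
\mathit{next}[j] =
\begin{cases}
0, & j = 1 \\[4pt]
\max\{\, k \mid 1 < k < j \ \text{and}\ p_1 \cdots p_{k-1} = p_{j-k+1} \cdots p_{j-1} \,\}, & \text{if this set is non-empty} \\[4pt]
1, & \text{otherwise}
\end{cases}
\]
```

For example, for the pattern "abacaw" this definition gives next = 0, 1, 1, 2, 1, 2 for j = 1, ..., 6.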
The following is an implementation:
The function that computes the next array differs slightly from the textbook description. The textbook stores the length of the string in the first element, so the actual content starts at the second element, which does not match how strings are actually stored. This article uses 0-based indexing instead, and the values in the next array change accordingly: compared with the textbook, every next value is reduced by one, and its meaning is otherwise unchanged. A next value of -1 means that no character of the pattern can be matched against the current main-string character, so the comparison moves on to the next main-string character (i + 1) and restarts from the first character of the pattern.
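To make the shift by one concrete, here is a small sketch that computes the plain next values with 0-based indexing and converts them to the textbook's 1-based values. The function names next_zero_based and next_textbook are hypothetical. Note that the get_next in this article additionally reuses next[j] when p[i] equals p[j], so its values can be smaller than the plain ones computed here.

```cpp
#include <string>
#include <vector>

// Plain (unoptimized) next values with 0-based indexing.
// next0[0] is -1; for j > 0, next0[j] is the length of the longest proper
// prefix of p[0..j-1] that is also a suffix of it.
// The function name is an assumption made for this example.
std::vector<int> next_zero_based(const std::string& p)
{
    const int n = static_cast<int>(p.size());
    std::vector<int> next0(p.size(), -1);
    int i = 0, j = -1;
    while (i < n - 1)
    {
        if (j == -1 || p[i] == p[j])
        {
            ++i; ++j;
            next0[i] = j;      // prefix of length j matches the suffix ending at i - 1
        }
        else
        {
            j = next0[j];      // fall back to a shorter matching prefix
        }
    }
    return next0;
}

// Textbook (1-based) values are the 0-based values plus one.
std::vector<int> next_textbook(const std::string& p)
{
    std::vector<int> next1 = next_zero_based(p);
    for (int& v : next1) ++v;
    return next1;
}
```

For the pattern "abacaw", next_zero_based yields -1, 0, 0, 1, 0, 1 and next_textbook yields 0, 1, 1, 2, 1, 2.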
    #include <cstdio>
    #include <string>
    #include <vector>
    using namespace std;

    // Fill next[] for pattern p using 0-based indexing; next[0] is -1.
    // next must already have p.size() elements, and p must not be empty.
    void get_next(const string& p, vector<int>& next)
    {
        int sp = p.size();
        next[0] = -1;

        int i = 0, j = -1;

        while (i < sp - 1)
        {
            if (j == -1 || p[i] == p[j])
            {
                ++i; ++j;
                // If p[i] == p[j], a mismatch at i would also mismatch at j,
                // so skip straight to next[j].
                if (p[i] != p[j])
                    next[i] = j;
                else
                    next[i] = next[j];
            }
            else
            {
                j = next[j];
            }
        }
    }

    // Print the next array on one line.
    void printNext(const vector<int>& next)
    {
        for (size_t i = 0; i < next.size(); i++)
            printf("%d ", next[i]);
        printf("\n");
    }

    // Search for pattern in s starting at 0-based index pos.
    // Returns the 0-based index of the first match, or -1 if there is none.
    int kmp_search(const string& s, const string& pattern, int pos)
    {
        int sizeP = pattern.size();
        int sizeS = s.size();

        // Guard against an empty pattern or an invalid start position.
        if (sizeP == 0 || pos < 0)
            return -1;

        vector<int> next(sizeP, 0);

        get_next(pattern, next);
        printNext(next);

        int i = pos, j = 0;

        while (i < sizeS && j < sizeP)
        {
            if (j == -1 || s[i] == pattern[j])
            {
                ++i; ++j;
            }
            else
            {
                j = next[j];    // slide the pattern; i never moves back
            }
        }

        if (j == sizeP)
            return i - sizeP;
        else
            return -1;
    }

    int main()
    {
        string s = "abacaesabacadfabacawersdf";
        string pat = "abacaw";
        int result = kmp_search(s, pat, 0);
        printf("s: %s\tt: %s\npos: %d\n", s.c_str(), pat.c_str(), result);
        return 0;
    }
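To get a rough feel for what eliminating backtracking saves, the following sketch counts character comparisons for both approaches. The function names bf_comparisons and kmp_comparisons are hypothetical, both use 0-based indexing and stop at the first match, and the KMP variant uses the plain next values without the extra skip in get_next.

```cpp
#include <string>
#include <vector>

// Count character comparisons made by brute-force matching.
// Hypothetical helper for illustration only.
long bf_comparisons(const std::string& s, const std::string& t)
{
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());
    long count = 0;
    for (int start = 0; start + m <= n; ++start)
    {
        int j = 0;
        while (j < m)
        {
            ++count;                       // one comparison of s[start + j] with t[j]
            if (s[start + j] != t[j]) break;
            ++j;
        }
        if (j == m) break;                 // first match found
    }
    return count;
}

// Count character comparisons made by KMP with plain next values.
// Hypothetical helper for illustration only.
long kmp_comparisons(const std::string& s, const std::string& t)
{
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());
    if (m == 0) return 0;

    // Build the plain 0-based next array (next[0] = -1).
    std::vector<int> next(m, -1);
    for (int i = 0, j = -1; i < m - 1; )
    {
        if (j == -1 || t[i] == t[j]) { ++i; ++j; next[i] = j; }
        else j = next[j];
    }

    long count = 0;
    int i = 0, j = 0;
    while (i < n && j < m)
    {
        if (j == -1) { ++i; ++j; continue; }   // no comparison: just advance
        ++count;                               // one comparison of s[i] with t[j]
        if (s[i] == t[j]) { ++i; ++j; }
        else j = next[j];                      // i stays where it is
    }
    return count;
}
```

With a main string of twenty 'a' characters followed by 'b' and the pattern "aaab", bf_comparisons gives 72 and kmp_comparisons gives 38. The gap widens as the pattern gets longer, because BF repeats almost the whole pattern at every alignment.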