In general, string algorithms rest on a few basic operations: assignment (strcpy), length (strlen), concatenation (strcat), comparison (strcmp) and substring extraction (substr). These five operations are relatively simple and form the minimal operation set for strings; other algorithms can be implemented on top of them. In practice, however, pattern matching (index) is such a widely used string operation that we prefer to implement it directly rather than through the other operations.
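As a small illustration of what "implemented on top of the basic operations" could look like, here is a minimal sketch of pattern matching built only from a length operation and a compare operation. The function name index_by_basic_ops is made up for this example, and std::strncmp stands in for "take a substring, then compare"; it is not the approach used in the rest of this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the article): find t in s, starting at pos,
// using only "length" (strlen) and "compare a substring" (strncmp).
int index_by_basic_ops(const char *s, const char *t, int pos)
{
    assert(s != nullptr && t != nullptr && pos >= 0);
    int sLen = static_cast<int>(std::strlen(s));
    int tLen = static_cast<int>(std::strlen(t));

    // Try every start position at which t still fits inside s.
    for (int i = pos; i + tLen <= sLen; ++i)
    {
        if (std::strncmp(s + i, t, static_cast<std::size_t>(tLen)) == 0)
        {
            return i; // first match position
        }
    }
    return -1; // no match
}
```

Every start position triggers a fresh substring comparison, which is one reason a dedicated matching routine is usually preferred.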

General match

For example, the most straightforward way to find the pattern string T in the target string S is:

1. Use i and j to point into S and T, respectively

2. Traverse: while S[i]==T[j], do i++, j++

3. If S[i]!=T[j], i goes back to the character just after the starting point of the current attempt in the target string S, and j goes back to the start of T

This process can easily be written as a program, as follows:

/**
 * Function: find the location of the pattern string in the target string
 * Parameters: s -- target string (source string)
 *             t -- pattern string (template string)
 *             pos -- position in the target string to start searching from
 * Return: on success, the first match position of t in s; on failure, -1
 * Other: 2015/01/09 by Jim Wen ver1.0
 **/
#include <cstring>

int index(const char *s, const char *t, int pos)
{
    int i = pos, j = 0;
    int sLen = strlen(s), tLen = strlen(t);

    while (i < sLen && j < tLen)
    {
        if (s[i] == t[j])
        {
            i++;
            j++;
        }
        else // match unsuccessful: backtrack and start the next attempt
        {
            i = i - j + 1; // must be computed before j is reset
            j = 0;
        }
    }

    if (j >= tLen) // search succeeded
    {
        return i - tLen;
    }
    else // search failed
    {
        return -1;
    }
}
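To get a feel for what the backtracking in step 3 costs, the sketch below repeats the same loop but also counts character comparisons. The name naive_index_counted and the comparisons out-parameter are assumptions made for this illustration; std::string is used only to keep the example short.

```cpp
#include <cassert>
#include <string>

// Hypothetical counted variant of the general match (illustration only).
// Returns the first match position of t in s at or after pos, or -1,
// and stores the number of character comparisons in *comparisons.
int naive_index_counted(const std::string &s, const std::string &t,
                        int pos, long *comparisons)
{
    assert(pos >= 0 && comparisons != nullptr);
    *comparisons = 0;
    int sLen = static_cast<int>(s.size());
    int tLen = static_cast<int>(t.size());
    int i = pos, j = 0;

    while (i < sLen && j < tLen)
    {
        ++*comparisons; // one comparison of s[i] with t[j]
        if (s[i] == t[j])
        {
            ++i;
            ++j;
        }
        else
        {
            i = i - j + 1; // back to the character after this attempt's start
            j = 0;
        }
    }
    return (j >= tLen) ? i - tLen : -1;
}
```

With a target of nine a characters followed by b and the pattern aaab, this sketch finds the match at position 6 after 28 comparisons, because almost every attempt re-reads three a characters before failing on the fourth.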

KMP Matching

The general match is easy to understand and easy to implement, but it is inefficient: when T is long, the cost of backtracking in S becomes high. The detailed reasons are not covered here; interested readers can look them up. To improve efficiency, we want to reduce the amount of backtracking.

Steps 1 and 2 proceed exactly as in the general match; the difference is step 3: when S and T mismatch, i does not backtrack, and only j moves back (in the example, back to an a). Why this works deserves close attention.

Take T=abcabcdef. If a mismatch occurs at d, the characters of S just before the current position must be abcabc. In other words, whatever S is, once T has matched up to d we know the sequence of characters of S before the mismatch point. It follows that, whatever S is, if T mismatches at d we already know which character of T should next be compared with the character at the current position i in S: the position j falls back to after a mismatch is determined by the string T alone.

We can describe this relationship as j=next[j], where next[j] is the position j moves to after a mismatch at j.

Then the entire matching process is as follows:

1. Use i and j to point into S and T, respectively

2. Traverse: while S[i]==T[j], do i++, j++

3. If S[i]!=T[j], slide T to the right by setting j=next[j] and compare S[i] with T[j] again; if they are still not equal, slide T to the right again with j=next[j] and compare S[i] with T[j], and so on

4. Once T cannot slide any further to the right (i.e. j=0 and S[i] still does not match, so j can only become -1), do i++, j++

The whole process can then be written as follows (note that steps 2 and 4 are merged):

/**
 * Function: find the location of the pattern string in the target string
 * Parameters: s -- target string (source string)
 *             t -- pattern string (template string)
 *             pos -- position in the target string to start searching from
 * Return: on success, the first match position of t in s; on failure, -1
 * Other: 2015/01/09 by Jim Wen ver1.0
 **/
#include <cstring>

void get_next(const char *t, int *next); // defined in the next section

int kmp_index(const char *s, const char *t, int pos)
{
    int i = pos, j = 0;
    int sLen = strlen(s), tLen = strlen(t);

    if (tLen == 0) // empty pattern: same result as index(), and avoids writing next[0] into an empty array
    {
        return pos;
    }

    int *next = new int[tLen];
    get_next(t, next);

    while (i < sLen && j < tLen)
    {
        if (j == -1 || s[i] == t[j]) // note the case of j==-1
        {
            i++;
            j++;
        }
        else // match unsuccessful: i stays, j falls back for the next comparison
        {
            j = next[j];
        }
    }

    delete[] next;

    if (j >= tLen) // search succeeded
    {
        return i - tLen;
    }
    else // search failed
    {
        return -1;
    }
}
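To see the effect of never moving i backwards, the following sketch repeats the KMP loop with a comparison counter. It is only an illustration under a few assumptions: the names make_next_table and kmp_index_counted are made up, std::vector replaces new/delete[] so the table is released automatically, and the table is built in the way explained in the next section.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the fallback table the same way as the article's get_next
// (the construction itself is explained in the next section).
std::vector<int> make_next_table(const std::string &t)
{
    std::vector<int> next(t.size(), -1);
    int tLen = static_cast<int>(t.size());
    int i = 0, j = -1;
    while (i < tLen - 1)
    {
        if (j == -1 || t[i] == t[j])
        {
            ++i;
            ++j;
            next[i] = (t[i] != t[j]) ? j : next[j];
        }
        else
        {
            j = next[j];
        }
    }
    return next;
}

// Hypothetical counted variant of kmp_index (illustration only).
int kmp_index_counted(const std::string &s, const std::string &t,
                      int pos, long *comparisons)
{
    assert(pos >= 0 && comparisons != nullptr);
    *comparisons = 0;
    if (t.empty())
    {
        return pos; // assumption: an empty pattern matches at pos
    }
    const std::vector<int> next = make_next_table(t);
    int sLen = static_cast<int>(s.size());
    int tLen = static_cast<int>(t.size());
    int i = pos, j = 0;

    while (i < sLen && j < tLen)
    {
        if (j == -1) // T cannot slide further: advance both (step 4)
        {
            ++i;
            ++j;
            continue;
        }
        ++*comparisons; // one comparison of s[i] with t[j]
        if (s[i] == t[j])
        {
            ++i;
            ++j;
        }
        else
        {
            j = next[j]; // i never moves backwards
        }
    }
    return (j >= tLen) ? i - tLen : -1;
}
```

With a target of nine a characters followed by b and the pattern aaab, this sketch finds the match at position 6 after 16 comparisons; i only ever moves forward, so the characters of S already matched are never re-read from the start of an attempt.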

Find the next array of KMP pattern matching

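Suppose T[j] and S[i] mismatch, and the next comparison should be between T[k] and S[i]. Then the first k characters of T must match the k characters of S just before position i:

T[0..k-1] = S[i-k..i-1]

The partial match obtained so far also gives:

T[j-k..j-1] = S[i-k..i-1]

So for T itself there must be:

T[0..k-1] = T[j-k..j-1]

As you can see, the rule for moving j is to find a sequence of characters at the start of T that is the same as the sequence ending just before T[j].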
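The rule can be checked directly from its definition. The sketch below is a deliberately slow brute-force version, written only to make the relationship concrete; the function name fallback_by_definition is made up, and it follows this article's convention of returning -1 for j=0. It looks for the largest k smaller than j such that the first k characters of T equal the k characters ending just before T[j].

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical brute-force helper (illustration only): the largest k < j
// with t[0..k-1] == t[j-k..j-1]. Requires 0 <= j <= t.size().
int fallback_by_definition(const std::string &t, int j)
{
    assert(j >= 0 && static_cast<std::size_t>(j) <= t.size());
    if (j == 0)
    {
        return -1; // nothing before t[0]: j "slides past the head"
    }
    for (int k = j - 1; k > 0; --k)
    {
        // Compare the prefix of length k with the k characters before t[j].
        if (t.compare(0, k, t, j - k, k) == 0)
        {
            return k;
        }
    }
    return 0; // no common sequence: restart from the beginning of t
}
```

For T=abcabcdef and a mismatch at d (j=6), the characters before d are abcabc, whose longest matching start and end sequences are both abc, so the function returns 3.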

Here we have found a relationship internal to T: the next array is computed from T alone.

How do you calculate the next array? Use T as both the target string and the pattern string.

The index -1 is introduced here; it means that j has slid past the head of T. The algorithm for the next array is therefore very similar to pattern matching, as follows:

1. Initially i=0, j=-1, next[i]=j, that is, next[0]=-1: when T[0] does not match S[i] in the main string, nothing in T can be matched at the current position S[i]

2. If T[i]==T[j],

then, since it is already known at this point that T[i-1]==T[j-1], T[i-2]==T[j-2], ...

we have

T[0..j] = T[i-j..i]

and therefore next[i+1]=j+1.

3. If T[i]!=T[j], note that at this point we still have

T[0..j-1] = T[i-j..i-1]

So, in order to find an element equal to T[i], slide right to the next j that satisfies the condition:

j=next[j]

4. If the slide reaches the end, then j=-1 and next[i+1]=j+1=0

5. A supplement to step 2: if T[i+1]==T[j+1], then

when S and T[i+1] mismatch, step 2 would next compare S with T[next[i+1]]=T[j+1], which is obviously a mismatch as well. In that case we must look further back, that is, next[i+1]=next[j+1]

So the actual procedure for computing next is the following (merging steps 2, 4 and 5). Note how similar it is to the pattern-matching program.

#include <cstring>

void get_next(const char *t, int *next)
{
    int i = 0, j = -1;
    int tLen = strlen(t);

    next[0] = -1;
    while (i < tLen - 1)
    {
        if (j == -1 || t[i] == t[j])
        {
            i++;
            j++;
            if (t[i] != t[j]) // steps 2 and 4
            {
                next[i] = j;
            }
            else // step 5
            {
                next[i] = next[j];
            }
        }
        else // step 3
        {
            j = next[j];
        }
    }
    return;
}
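To see what step 5 changes, the sketch below builds the table both with and without it. The function name build_next and the apply_step5 flag are assumptions made for this illustration, and std::vector is used instead of a caller-supplied array; the loop itself mirrors the procedure above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): builds the next table for t.
// With apply_step5 == false only steps 1 to 4 are used; with
// apply_step5 == true the table matches what get_next produces.
std::vector<int> build_next(const std::string &t, bool apply_step5)
{
    std::vector<int> next(t.size(), -1); // next[0] = -1 (step 1)
    int tLen = static_cast<int>(t.size());
    int i = 0, j = -1;

    while (i < tLen - 1)
    {
        if (j == -1 || t[i] == t[j])
        {
            ++i;
            ++j;
            if (apply_step5 && t[i] == t[j])
            {
                next[i] = next[j]; // step 5: t[j] would mismatch too, look further back
            }
            else
            {
                next[i] = j; // steps 2 and 4
            }
        }
        else
        {
            j = next[j]; // step 3
        }
    }
    return next;
}
```

For the pattern aaaab, steps 1 to 4 alone give -1, 0, 1, 2, 3, while adding step 5 gives -1, -1, -1, -1, 3: a mismatch on any of the leading a characters skips straight past the other a characters, which would fail in the same way. For abcabcdef the two tables are -1, 0, 0, 0, 1, 2, 3, 0, 0 and -1, 0, 0, -1, 0, 0, 3, 0, 0.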


Original article. When reprinting, please credit the source: http://blog.csdn.net/wenzhou1219

13. String-Pattern matching