I am not a computer science major, and my data structures teacher did not cover this algorithm either. I need it to solve a problem today, so I am going to learn it properly.
The simplest and most direct method for pattern matching is the BF algorithm, which I personally read as "brute force". It tries to match the pattern character by character at each position of the main string in turn. Because the main-string pointer i may have to backtrack, the worst-case time complexity is O(m * n), where n and m are the lengths of the main string and the pattern respectively.
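To make the backtracking of i concrete, here is a minimal sketch of the brute-force approach in C. The function name bf_search is my own choice for illustration, and the strings are assumed to be NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>

/* Brute-force search (hypothetical helper name): returns the index of the
   first occurrence of pat in text, or -1 if there is none. */
int bf_search(const char *text, const char *pat)
{
    assert(text != NULL && pat != NULL);
    int i = 0;  /* position in the main string */
    int j = 0;  /* position in the pattern */
    while (text[i] && pat[j]) {
        if (text[i] == pat[j]) {
            ++i;
            ++j;
        } else {
            i = i - j + 1;  /* backtrack: retry from the next start position */
            j = 0;
        }
    }
    /* The pattern was fully consumed only if a match was found. */
    return pat[j] ? -1 : i - j;
}
```

The line i = i - j + 1 is the backtracking step that KMP removes: after a mismatch, i jumps back to just past the previous starting position, so the same main-string characters may be examined many times.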
The advantage of the KMP algorithm is that it eliminates backtracking of the pointer i, so that i only ever moves in one direction. Assume the main string S (length n) and the pattern T (length m) are stored in arrays as follows:
S[0], S[1], S[2], …, S[n-1]
T[0], T[1], T[2], …, T[m-1]
The problem to solve is this: when T[0, 1, 2, ..., j-1] matches S[i, i+1, i+2, ..., i+j-1] but T[j] and S[i+j] do not match, how far must the pattern be moved to the right before a match becomes possible again? From the pattern alone we can tell that some shifts cannot succeed. For example, if the pattern T is abcabc and the mismatch occurs at the second "b" (counting from the left), then the first four characters abca have already matched, so shifting the pattern by only one or two positions cannot produce a match: a shift of one would put T[0] = a under the matched b, and a shift of two would put it under the matched c.
So how do we find the first shift that might work?
Suppose that after the shift the comparison should resume at the k-th character of the pattern T (counting from 1, that is, at T[k-1], with k-1 < j). Then one thing is certain: the first k-1 characters of the pattern, T[0, 1, 2, ..., k-2], must match the main-string characters S[i+j-k+1, i+j-k+2, ..., i+j-1] that immediately precede S[i+j], and k is the largest value that satisfies this condition.
So we get: T[0, 1, 2, ..., k-2] = S[i+j-k+1, i+j-k+2, ..., i+j-2, i+j-1]
From the partial match we already have: T[0, 1, 2, ..., k-2, k-1, ..., j-1] = S[i, i+1, i+2, ..., i+k-2, i+k-1, ..., i+j-1]
For convenience, let's draw a picture:
                                 T[0]       T[1]       … T[k-3]   T[k-2]
                                                                           |
S[0] S[1] … S[i] S[i+1] S[i+2] … S[i+j-k+1] S[i+j-k+2] … S[i+j-2] S[i+j-1] S[i+j]
                                                                           | not equal
            T[0] T[1]   T[2]   … T[j-k+1]   T[j-k+2]   … T[j-2]   T[j-1]
Combining the two, we have T[0, 1, 2, ..., k-2] = T[j-k+1, j-k+2, ..., j-2, j-1]. So the problem really is to find, within the part of the pattern matched so far, T[0, 1, ..., j-1], the longest proper prefix that is equal to the suffix of the same length.
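The prefix/suffix condition can be checked directly from its definition. The sketch below is not the KMP way of computing it, just a naive check that tries every candidate length from longest to shortest; the name longest_border is hypothetical, and len stands for the matched length j.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: length of the longest proper prefix of pat[0..len-1]
   that is also a suffix of it, found by trying every candidate length. */
int longest_border(const char *pat, int len)
{
    assert(pat != NULL && len >= 0 && (size_t)len <= strlen(pat));
    for (int b = len - 1; b > 0; --b) {
        /* Compare prefix pat[0..b-1] with suffix pat[len-b..len-1]. */
        if (memcmp(pat, pat + len - b, (size_t)b) == 0)
            return b;
    }
    return 0;  /* no non-empty proper prefix equals a suffix */
}
```

For the pattern abcabc with four characters matched (abca), the result is 1, which corresponds to k-1 in the notation above: the pattern moves right by 4 - 1 = 3 positions, consistent with the earlier observation that shifts of one or two cannot match.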
To find this k, the usual approach is to introduce an array next. next[j] records, for a mismatch between T[j] and S[i+j], the position of the character in the pattern T that should be compared with S[i+j] next.
So how do we compute it?
Thinking recursively: assume we already know next[0], next[1], next[2], …, next[j-1], and use them to compute next[j]. For details, see the code, where the array is called fail:
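#include <string.h>

#define MAXLEN 1024        /* assumption: the pattern is at most MAXLEN characters */
int fail[MAXLEN + 1];      /* assumption: a global table, as the memset call implies */

/* Returns the index of the first occurrence of pat in str, or -1 if none. */
int KMP(char *str, char *pat)
{
    int i, j, k;
    memset(fail, -1, sizeof(fail));
    /* Build the table: fail[i] is the length of the longest proper prefix
       of pat[0..i-1] that is also its suffix; fail[0] stays -1. */
    i = 0; k = -1;
    while (pat[i])
    {
        if (k == -1 || pat[i] == pat[k])
        {
            i++, k++, fail[i] = k;
        }
        else k = fail[k];
    }
    /* Scan the main string; i never moves backwards. */
    i = j = 0;
    while (str[i] && pat[j])
    {
        if (pat[j] == str[i]) ++i, ++j;
        else if (j == 0) ++i;
        else j = fail[j];
    }
    if (pat[j]) return -1;
    else return i - j;
}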
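To see what the table-building loop produces, here is a small sketch of the same loop pulled out into its own function. The name build_fail and the caller-supplied buffer are my own assumptions for illustration; the buffer must hold strlen(pat) + 1 ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of the table-building loop: writes fail[0..m] for a
   pattern of length m into a buffer supplied by the caller. */
void build_fail(const char *pat, int *fail)
{
    assert(pat != NULL && fail != NULL);
    int i = 0, k = -1;
    fail[0] = -1;  /* sentinel: nothing to fall back to at position 0 */
    while (pat[i]) {
        if (k == -1 || pat[i] == pat[k]) {
            ++i;
            ++k;
            fail[i] = k;  /* longest prefix/suffix length of pat[0..i-1] */
        } else {
            k = fail[k];  /* fall back to the next shorter candidate */
        }
    }
}
```

For the pattern abcabc this fills the table with -1, 0, 0, 0, 1, 2, 3. The entry at position 4 is 1: after a mismatch at the second "b", matching resumes at pat[1], which is the shift of three positions worked out earlier. Passing the buffer in avoids the fixed-size global array, at the cost of the caller having to size it correctly.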