My previous article, "BM algorithm details", had a serious shortcoming: it did not give an efficient algorithm for computing the good-suffix jump table of the pattern. For some reason, the paper by Robert S. Boyer and J Strother Moore does not give one either. The brute-force algorithm runs in O(N^3) time, which greatly undermines the practicality of the BM algorithm. In fact, the good-suffix jump table of a pattern string can be computed in linear time. Before introducing that algorithm, I would like to recommend an authoritative book on string processing: Algorithms on Strings, Trees and Sequences, by Dan Gusfield. The book covers almost every string-processing technique of practical value today, including, of course, the BM and KMP algorithms. The content of this article is derived from that book. The book is, however, quite difficult, and understanding it thoroughly takes real effort.
In my two articles on the KMP and BM algorithms, I mentioned a key issue: a string that contains its own prefix or suffix again at a later position. The jump tables of both the KMP algorithm and the BM algorithm are directly tied to these self-contained prefixes and suffixes. Here we need a new concept, Zi(S), where S is the pattern string. For a pattern string S[1..n], Zi(S) is the length of the longest substring S[i..j] that satisfies S[i..j] = S[1..j-i+1]; that is, Zi(S) = j-i+1 for the largest such j. This sounds mysterious, but it is simply the length of the longest substring starting at position i that is also a prefix of S. For S = aabcaabxaaz, we have
 Z5(S) = 3, (aab)c(aab)xaaz
 Z6(S) = 1, (a)abca(a)bxaaz
 Z7(S) = Z8(S) = 0; whenever S[i] != S[1], Zi(S) = 0
 Z9(S) = 2, (aa)bcaabx(aa)z
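The definition can be checked mechanically with a brute-force routine. The following is only a minimal sketch under a few assumptions: it uses 0-based C indexing (so Z5 in the text is z[4] here), it sets z[0] to 0 by convention, and the name z_naive is hypothetical rather than taken from the book.

```c
#include <assert.h>
#include <stddef.h>

/* Brute-force Z values straight from the definition (0-based indexing):
 * z[i] is the length of the longest substring starting at s[i] that is
 * also a prefix of s. z[0] is set to 0 by convention.
 * Worst case O(n^2); meant only as a reference for checking faster code. */
void z_naive(const char *s, size_t n, size_t z[])
{
    assert(s != NULL && z != NULL);
    if (n == 0)
        return;
    z[0] = 0;
    for (size_t i = 1; i < n; ++i) {
        size_t len = 0;
        /* Extend the match between s[0..] and s[i..] one character at a time. */
        while (i + len < n && s[len] == s[i + len])
            ++len;
        z[i] = len;
    }
}
```

Called on "aabcaabxaaz" with n = 11, this fills z[4], z[5] and z[8] with 3, 1 and 2, which are the Z5, Z6 and Z9 values listed above.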
From Z5(S) = 3 above we know that S[5..7] = S[1..3] and S[5..8] != S[1..4]. We call S[5..7] a Z-block of the string S. For any i with Zi(S) != 0, the corresponding Z-block starts at i and ends at i + Zi(S) - 1. Obviously, a string may contain several Z-blocks, and Z-blocks may overlap each other. We then define two more values, li and ri: ri is the largest right endpoint among all Z-blocks that start at or before position i (in particular, all Z-blocks containing S[i]), and li is the left endpoint of that Z-block. As shown in the figure, two Z-blocks contain position i there, and only the l and r values of the Z-block marked "a" are the actual values of li and ri. By definition, S[li..ri] = S[1..ri-li+1].
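To make li and ri concrete, the sketch below derives them from an already computed Z array. It rests on assumptions: indexing is 0-based, ri is taken as the furthest right endpoint among all Z-blocks that start at or before i (the convention I understand Gusfield's book to use), and the names z_block_bounds and has are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Given 0-based Z values z[0..n-1] (z[0] is ignored), fill l[i] and r[i]
 * with the inclusive bounds of the Z-block that reaches furthest right
 * among all Z-blocks starting at or before i.
 * has[i] stays 0 for as long as no Z-block has been seen yet. */
void z_block_bounds(const size_t z[], size_t n,
                    size_t l[], size_t r[], int has[])
{
    size_t cur_l = 0, cur_r = 0;
    int seen = 0;

    assert(z != NULL && l != NULL && r != NULL && has != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (i > 0 && z[i] > 0) {
            size_t end = i + z[i] - 1;      /* right end of the block at i */
            if (!seen || end > cur_r) {     /* keep only the furthest-reaching block */
                cur_l = i;
                cur_r = end;
                seen = 1;
            }
        }
        l[i] = cur_l;
        r[i] = cur_r;
        has[i] = seen;
    }
}
```

For S = aabcaabxaaz, the block starting at 0-based index 4 covers indices 4 to 6, so index 5 gets l = 4 and r = 6 even though its own Z-block covers only index 5.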
Now for the key question: given that Z1(S), ..., Zi(S), li and ri are all known, how do we compute Zi+1(S)? Let l = li, r = ri, k = i + 1 and k' = i - li + 2. In other words, k' = k - l + 1 is the position in S[1..r-l+1] that corresponds to position k in S[l..r].
1. Suppose k, Zk'(S) and the Z-block determined by l and r are related as shown in the figure. Because S[l..r] = S[1..r-l+1], any question about the interval S[l..r] can be moved into the interval S[1..r-l+1], where k' is the position corresponding to k. The known quantity to watch is Zk'(S). In this case the Z-block determined by Zk'(S) lies strictly inside S[1..r-l+1], that is, k' + Zk'(S) - 1 < r - l + 1. Then Zk(S) is simply Zk'(S), and l and r stay unchanged.
2. Suppose instead that k, Zk'(S) and the Z-block determined by l and r are related as shown in the figure. Again we move the question from S[l..r] into S[1..r-l+1]. This time the Z-block determined by Zk'(S) reaches or passes the right end r - l + 1, that is, k' + Zk'(S) - 1 >= r - l + 1. So for Zk(S) we already know that the first r-k+1 characters starting at k equal S[1..r-k+1]. Whether the characters after S[r] extend this into a longer match with the prefix can only be learned by comparing. Because we already have S[k..r] = S[k'..r-l+1] = S[1..r-k+1] (note the regions marked beta in the figure), we can skip comparing these intervals and start directly by comparing S[r+1] with S[r-k+2], continuing until a mismatch occurs. This gives the new right endpoint ri+1, and li+1 is updated to i + 1.
3. Suppose r <= k. The Z-blocks computed so far give us no help. We compare S[k], S[k+1], ... directly against S[1], S[2], ... and find the smallest q >= k such that S[q] != S[q-k+1] (taking q = n + 1 if no mismatch occurs). Then Zk(S) = q - k. If Zk(S) > 0, we also update li+1 = i + 1 and ri+1 = q - 1.
Handling these three cases in turn for k = 2, ..., n fills in every Zi(S) of S[1..n] in linear time: every successful character comparison moves r to the right, r never exceeds n, and each position costs at most one failed comparison. Suppose the pattern string is S = "aabaabcaxaabaabcy"; the corresponding Zi(S) values are as follows.

i      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S      a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
Zi(S)  0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0

When Z12(S) is to be computed, Z1(S) through Z11(S) are already known. At this point l = 10 and r = 16: the Z-block S[10..16] is the rightmost Z-block found so far, and it contains S[12]. Because S[10..16] = S[1..7] and k' = 12 - 10 + 1 = 3, Z12(S) is closely tied to Z3(S). We find Z3(S) = 0, so k' + Z3(S) - 1 = 3 + 0 - 1 = 2 < 7. This is the first case, so Z12(S) = Z3(S) = 0.
For Z10(S): when it is computed, the rightmost Z-block known is S[8..8], so l = 8 and r = 8. Because 10 > 8, this is the third case. We compare from S[10] onward against the prefix of S and find that S[10..16] is a prefix of S of length 7 (S[17] = y does not match S[8] = a), so Z10(S) = 7, and we update l = 10, r = 16.
In computing the Zi(S) values, the second case comes up relatively rarely, but it is also the part of the computation where mistakes are most easily made.
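One way to see how often each case fires is to record the case number next to every Z value. The following is a sketch under stated assumptions, not the book's code: it uses 0-based indexing, it treats a position equal to r as still inside the block (a small difference from the listing in this article), and the names z_with_cases and which are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Z values plus a record of which case produced each one (0-based).
 * which[k] is 1, 2 or 3, matching the three cases in the text;
 * which[0] is 0 because z[0] is fixed by convention. */
void z_with_cases(const char *s, int n, int z[], int which[])
{
    int l = 0, r = -1;  /* rightmost Z-block so far is s[l..r]; none yet */

    assert(s != NULL && z != NULL && which != NULL);
    if (n <= 0)
        return;
    z[0] = 0;
    which[0] = 0;
    for (int k = 1; k < n; ++k) {
        int len;
        if (k > r) {                        /* case 3: no block covers k */
            which[k] = 3;
            len = 0;
        } else if (z[k - l] < r - k + 1) {  /* case 1: copy, block unchanged */
            which[k] = 1;
            z[k] = z[k - l];
            continue;
        } else {                            /* case 2: known part, then extend */
            which[k] = 2;
            len = r - k + 1;
        }
        while (k + len < n && s[len] == s[k + len])
            ++len;
        z[k] = len;
        if (len > 0) {                      /* a new rightmost block starts at k */
            l = k;
            r = k + len - 1;
        }
    }
}
```

On the 17-character example string above, every position is handled by case 1 or case 3 and case 2 never fires. A highly repetitive string such as "aaaa" does exercise it: positions 2 and 3 (0-based) both go through case 2.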
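The following is my own implementation for computing the Z array.
void ZBlock(const char* pattern, unsigned int length, unsigned int zvalues[])
{
    unsigned int i, j, k;
    unsigned int l, r;  /* bounds of the rightmost Z-block found so far */

    if (length == 0)    /* nothing to do; avoids writing zvalues[0] */
        return;
    l = r = 0;
    zvalues[0] = 0;
    for (i = 1; i < length; ++i) {
        if (i >= r) {
            /* Case 3: compare against the prefix from scratch. */
            j = 0;
            k = i;
            zvalues[i] = 0;
            while (k < length && pattern[j] == pattern[k]) {
                ++j;
                ++k;
            }
            if (k != i) {
                l = i;
                r = k - 1;
                zvalues[i] = k - i;
            }
        } else {
            if (zvalues[i - l] >= r - i + 1) {
                /* Case 2: the first r - i + 1 characters are known to match. */
                j = r - i + 1;
                k = r + 1;
                while (k < length && pattern[j] == pattern[k]) {
                    ++j;
                    ++k;
                }
                l = i;
                r = k - 1;
                zvalues[i] = k - i;
            } else {
                /* Case 1: copy the value from the corresponding prefix position. */
                zvalues[i] = zvalues[i - l];
            }
        }
    }
}
Because C strings are indexed from 0, the indices in the code are shifted by one relative to the description above.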
In theory, the Z-block algorithm completely solves the problem of computing self-contained prefixes, and it describes the construction more clearly than the KMP algorithm's own procedure for building the next table. With the Z array of the pattern string in hand, computing both the next table of the KMP algorithm and the good-suffix table of the BM algorithm becomes efficient and intuitive.
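As an illustration of that last point, here is a minimal sketch that derives a KMP-style table from Z values in linear time. It makes several assumptions: indexing is 0-based, fail[i] is the classic failure value (the length of the longest proper prefix of p[0..i] that is also a suffix of p[0..i]), which may differ from the exact next-table convention used in my KMP article, and z_values and kmp_from_z are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Linear-time Z values, 0-based; z[0] = 0 by convention. */
static void z_values(const char *s, int n, int z[])
{
    int l = 0, r = -1;  /* rightmost Z-block so far is s[l..r]; none yet */
    if (n <= 0)
        return;
    z[0] = 0;
    for (int k = 1; k < n; ++k) {
        int len = 0;
        if (k <= r) {
            if (z[k - l] < r - k + 1) {  /* value fits inside the block: copy it */
                z[k] = z[k - l];
                continue;
            }
            len = r - k + 1;             /* known matching part, then extend */
        }
        while (k + len < n && s[len] == s[k + len])
            ++len;
        z[k] = len;
        if (len > 0) {
            l = k;
            r = k + len - 1;
        }
    }
}

/* Fill fail[0..n-1] from the Z values of p; z is caller-provided scratch. */
void kmp_from_z(const char *p, int n, int z[], int fail[])
{
    assert(p != NULL && z != NULL && fail != NULL);
    z_values(p, n, z);
    for (int i = 0; i < n; ++i)
        fail[i] = 0;
    /* A Z-block starting at j ends at j + z[j] - 1 and is a border of the
     * prefix ending there. Going through j in descending order lets the
     * smallest j, which gives the longest border, write last. */
    for (int j = n - 1; j >= 1; --j)
        if (z[j] > 0)
            fail[j + z[j] - 1] = z[j];
    /* Positions inside a Z-block inherit a border one shorter per step left. */
    for (int i = n - 2; i >= 1; --i)
        if (fail[i + 1] - 1 > fail[i])
            fail[i] = fail[i + 1] - 1;
}
```

For the pattern "abacaba" this produces 0 0 1 0 1 2 3. The BM good-suffix table can be approached the same way by running the Z computation on the reversed pattern, which is the route Gusfield's book takes; that part is not sketched here.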