For a long time, whenever I needed single-pattern string matching, I only used KMP. Later I noticed that many people on Codeforces (CF) use the Z algorithm. After studying it, I find the Z algorithm quite elegant as well. A problem in a previous blog post was also solved by string matching with the Z algorithm.
The Z algorithm is described below.
First, a word about what the Z algorithm computes.
Given a string s as input, the Z algorithm computes, for every suffix of s, the length of the longest common prefix (LCP) between that suffix and s itself. We write z[i] for the LCP of s and suffix(i) = s[i..n-1].
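To make the definition concrete, here is a naive O(n^2) sketch that computes the Z-array directly from it. The name z_naive and the use of std::string and std::vector are my own choices for illustration, not part of the original post. A direct version like this is handy as a reference when checking a faster implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Naive O(n^2) Z-array, written straight from the definition:
// z[i] = length of the longest common prefix of s and its suffix s[i..n-1].
// By convention z[0] = n (the whole string matches itself).
std::vector<int> z_naive(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    if (n == 0) return z;
    z[0] = n;
    for (int i = 1; i < n; ++i) {
        // Extend the match one character at a time until a mismatch or the end of s.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i];
    }
    return z;
}
```

For example, z_naive("aabxaab") yields {7, 1, 0, 0, 3, 1, 0}: the suffix "aab" starting at index 4 shares three characters with the beginning of the string.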
Next, the details of the Z algorithm.
Let the length of the string s be n.
The Z algorithm maintains a pair of values, left and right, abbreviated L and R. They satisfy the condition that s[L..R] is a prefix of s, that is, s[L..R] = s[0..R-L]; among all such windows found so far, R is the one that reaches furthest to the right. For i = 1, compare s[0..n-1] with s[1..n-1] by brute force; this gives the initial L and R and also z[1], the LCP of suffix(1) and s itself.
Suppose the computation has reached i-1: we have the current L and R and the values z[1] through z[i-1]. Now we need to compute z[i] and update L and R.
1. If i > R, then no substring found so far that is a prefix of s ends at i or later; otherwise R would not be less than i. In this case L and R must be recomputed from scratch: set L = R = i and compare s with suffix(i) by brute force. With R the last matched position, this gives z[i] = R - i + 1 = R - L + 1.
2. Otherwise i <= R. Let k = i - L; then we can assert z[i] >= min(z[k], R - i + 1). The reason is the meaning of L and R: s[L..R] is a prefix of s, and i sits at offset k from L, so s[i..R] = s[k..R-L].
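If z[k] < R - i + 1, then z[i] = z[k] exactly: the match of length z[k] starting at k, together with the mismatch right after it, lies entirely inside s[k..R-L], which is a copy of s[i..R], so both carry over to i. In this case L and R do not change.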
If z[k] >= R - i + 1: by the meaning of R, s[R+1] != s[R-L+1] (when R + 1 < n), because the match starting at L could not be extended past R. So the part of z[k]'s match information beyond length R - i + 1 cannot be reused for i, and z[k] only lets us assert z[i] >= R - i + 1. Therefore set L = i, keep extending R by brute force from its current position, and obtain z[i].
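The case analysis above can be traced on a concrete string. The following sketch follows the same steps with an inclusive window [L, R] and records which case produced each z[i]. The names ZCase, ZTrace and z_with_cases are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Which branch of the case analysis produced z[i] (hypothetical labels).
enum class ZCase {
    Brute,   // case 1: i > R, brute-force match from scratch
    Copy,    // case 2, z[k] < R - i + 1: copy z[k]
    Extend   // case 2, z[k] >= R - i + 1: extend R by brute force
};

struct ZTrace {
    std::vector<int> z;
    std::vector<ZCase> cases;  // cases[i] is meaningful for i >= 1
};

ZTrace z_with_cases(const std::string& s) {
    const int n = static_cast<int>(s.size());
    ZTrace t{std::vector<int>(n, 0), std::vector<ZCase>(n, ZCase::Brute)};
    if (n == 0) return t;
    t.z[0] = n;
    int L = 0, R = 0;  // invariant: s[L..R] == s[0..R-L] (trivially true at start)
    for (int i = 1; i < n; ++i) {
        if (i > R) {
            // Case 1: no window covers i; compare s with suffix(i) directly.
            L = R = i;
            while (R < n && s[R - L] == s[R]) ++R;
            t.z[i] = R - L;
            --R;  // R is now the last matched index (L - 1 if nothing matched)
            t.cases[i] = ZCase::Brute;
        } else {
            const int k = i - L;  // s[i..R] == s[k..R-L]
            if (t.z[k] < R - i + 1) {
                // Match at k ends strictly inside the window: copy it, L and R unchanged.
                t.z[i] = t.z[k];
                t.cases[i] = ZCase::Copy;
            } else {
                // s[i..R] is known to match a prefix; keep comparing past R.
                L = i;
                while (R < n && s[R - L] == s[R]) ++R;
                t.z[i] = R - L;
                --R;
                t.cases[i] = ZCase::Extend;
            }
        }
    }
    return t;
}
```

On "aaaa", index 1 goes through case 1 while indices 2 and 3 take the second sub-case of case 2; on "aabxaab", indices 5 and 6 simply copy z[k] from inside the window [4, 6].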
In the concrete implementation, case 1 and the second sub-case of case 2 can be handled the same way: both set L = i and extend R by brute force.
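For comparison, here is a commonly used compact way to write the merged form; it is a sketch, not the author's code, and the name z_function is hypothetical. It uses a half-open window [l, r), so r here corresponds to R + 1 in the text: z[i] is first initialized from the window, then a single extension loop serves every case.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Z-array with case 1 and the second sub-case of case 2 merged into one loop.
// Half-open window [l, r): s[l..r-1] == s[0..r-l-1].
std::vector<int> z_function(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    if (n == 0) return z;
    z[0] = n;
    int l = 0, r = 0;  // empty window
    for (int i = 1; i < n; ++i) {
        // Lower bound from the window: 0 if i is outside it (case 1),
        // otherwise min(z[i - l], r - i) (case 2).
        if (i < r) z[i] = std::min(r - i, z[i - l]);
        // One extension loop for all cases; when z[i - l] < r - i it stops
        // at the first comparison, because that mismatch lies inside the window.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i];
        // Move the window if this match reaches further right.
        if (i + z[i] > r) {
            l = i;
            r = i + z[i];
        }
    }
    return z;
}
```

The half-open window avoids the r-- adjustments, at the cost of the off-by-one translation between r and R.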
Here is a C++ implementation:
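#include <cstring>

const int MAXN = 1000005;  // assumed maximum length of s
int z[MAXN];               // z[i] = LCP of s and suffix(i)

// Computes z[0..n-1] for s; if n is 0, the length is taken from strlen(s).
void Z(const char *s, int n = 0) {
    n = (n == 0) ? (int)strlen(s) : n;
    z[0] = n;
    int l = 0, r = 0;  // current window: s[l..r] is a prefix of s
    for (int i = 1; i < n; i++) {
        if (i > r) {
            // Case 1: brute-force match of s against suffix(i)
            l = i, r = i;
            while (r < n && s[r - l] == s[r]) r++;
            z[i] = r - l;
            r--;  // r becomes the last matched position
        } else {
            int k = i - l;  // offset of i inside the window
            if (z[k] < r - i + 1) {
                z[i] = z[k];  // Case 2, first sub-case: copy; l and r unchanged
            } else {
                // Case 2, second sub-case: s[i..r] already matches; extend past r
                l = i;
                while (r < n && s[r - l] == s[r]) r++;
                z[i] = r - l;
                r--;
            }
        }
    }
}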
Single-pattern string matching with the Z algorithm is very simple. Let S be the text and T the pattern. Build the new string P = T + '#' + S, compute its Z-array, and scan forward from the position in P where S begins; wherever z[i] = length(T), T occurs at that position. The separator must be a character that appears in neither T nor S. You can also leave out the '#', but then the test must use >= instead of =.
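A minimal sketch of this matching procedure, assuming std::string input and a separator that occurs in neither string. The names z_array, z_find_all and z_find_all_nosep are hypothetical; the second function drops the separator and therefore uses >= as described.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Z-array with a half-open window [l, r).
std::vector<int> z_array(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    if (n == 0) return z;
    z[0] = n;
    for (int i = 1, l = 0, r = 0; i < n; ++i) {
        if (i < r) z[i] = std::min(r - i, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i];
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

// All 0-based positions in text where pattern occurs, via P = pattern + sep + text.
// Assumption: sep ('#' by default) appears in neither string.
std::vector<int> z_find_all(const std::string& text, const std::string& pattern,
                            char sep = '#') {
    std::vector<int> hits;
    const int m = static_cast<int>(pattern.size());
    if (m == 0 || m > static_cast<int>(text.size())) return hits;
    assert(pattern.find(sep) == std::string::npos && text.find(sep) == std::string::npos);
    const std::string p = pattern + sep + text;
    const std::vector<int> z = z_array(p);
    // text starts at index m + 1 in p; the separator caps every z[i] there at m.
    for (int i = m + 1; i < static_cast<int>(p.size()); ++i)
        if (z[i] == m) hits.push_back(i - (m + 1));
    return hits;
}

// Variant without a separator: P = pattern + text, so z[i] may exceed m.
std::vector<int> z_find_all_nosep(const std::string& text, const std::string& pattern) {
    std::vector<int> hits;
    const int m = static_cast<int>(pattern.size());
    if (m == 0 || m > static_cast<int>(text.size())) return hits;
    const std::string p = pattern + text;
    const std::vector<int> z = z_array(p);
    for (int i = m; i < static_cast<int>(p.size()); ++i)
        if (z[i] >= m) hits.push_back(i - m);
    return hits;
}
```

Both functions report overlapping occurrences; for instance, searching "aba" in "abababa" gives positions 0, 2 and 4.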
[Algorithm] String matching with the Z algorithm