A suffix array is basically about using the ranks of a string's suffixes to get things done, because every substring of the string can be regarded as a prefix of some suffix.
The suffix ranks can be obtained with the doubling method.
I. Meaning of the arrays (for a string s)
sa[i]: the starting position in s of the suffix ranked i
height[i]: the length of the LCP (longest common prefix) of the suffix ranked i and the suffix ranked i-1
c[]: counting array used by the radix sort; it holds the bucket counts and then their prefix sums
rank[i]: the rank of the suffix starting at s[i]; clearly rank[sa[i]]=i and sa[rank[i]]=i
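To make the definitions concrete, here is a minimal sketch that builds sa[] by sorting the suffixes directly and then derives rank[] as its inverse. The names naive_sa and inverse_rank are made up for this illustration, and the direct sort is far slower than the doubling method used in this post; it is only meant to show what the arrays contain.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper (not from the post): sorts suffix start positions by
// comparing the suffixes themselves. Slow, for illustration only.
std::vector<int> naive_sa(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);  // 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        // compare suffix starting at a with suffix starting at b
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Hypothetical helper: rank[] is the inverse permutation of sa[],
// so rank[sa[i]] == i and sa[rank[i]] == i.
std::vector<int> inverse_rank(const std::vector<int>& sa) {
    std::vector<int> rank(sa.size());
    for (int i = 0; i < (int)sa.size(); i++) rank[sa[i]] = i;
    return rank;
}
```

For s = "banana" (0-indexed), the sorted suffixes are a, ana, anana, banana, na, nana, so sa = 5 3 1 0 4 2 and rank = 3 2 5 1 4 0.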
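II. How to compute sa[]
1. Use the doubling method to build a first and a second keyword for each suffix. In the round with step k, the first keyword of suffix i is the rank of its first k characters and the second keyword is the rank of the first k characters of suffix i+k. Suffixes with a smaller first keyword come first; when the first keywords are equal, the one with the smaller second keyword comes first.
2. Optimization: if at some length the ranks of all suffixes are already distinct, you can exit immediately, because further rounds obviously cannot change the order.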
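The two-keyword idea and the early exit can be sketched as follows. Note that this sketch swaps the radix sort for std::sort to keep it short, so it runs in O(n log^2 n) rather than O(n log n); the name doubling_sa is made up for this illustration and no sentinel character is used here.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper (not from the post): doubling with std::sort
// in place of the radix sort.
std::vector<int> doubling_sa(const std::string& s) {
    int n = s.size();
    std::vector<int> sa(n), rk(n), tmp(n);
    std::iota(sa.begin(), sa.end(), 0);
    for (int i = 0; i < n; i++) rk[i] = (unsigned char)s[i];
    for (int k = 1;; k <<= 1) {
        // second keyword of suffix i: rank of suffix i+k, or -1 past the end
        auto second = [&](int i) { return i + k < n ? rk[i + k] : -1; };
        auto less = [&](int a, int b) {
            if (rk[a] != rk[b]) return rk[a] < rk[b];  // first keyword
            return second(a) < second(b);              // then second keyword
        };
        std::sort(sa.begin(), sa.end(), less);
        // re-rank: equal (first, second) pairs share the same rank
        if (n > 0) tmp[sa[0]] = 0;
        for (int i = 1; i < n; i++)
            tmp[sa[i]] = tmp[sa[i - 1]] + (less(sa[i - 1], sa[i]) ? 1 : 0);
        rk = tmp;
        // early exit: the largest rank is n-1 exactly when all ranks differ
        if (n == 0 || rk[sa[n - 1]] == n - 1 || k >= n) break;
    }
    return sa;
}
```

The radix-sort version further down does the same work per round, but sorts by the second keyword and then by the first with counting sort, which removes the extra log factor.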
III. How to compute height[]
First compute rank[i].
By definition, height[rank[i]] is the LCP of suffix(sa[rank[i]-1]) and suffix(i).
So once you know where suffix(sa[rank[i]-1]) starts, you can simply compare the two suffixes character by character.
There is a small optimization here (see http://www.cnblogs.com/LLGemini/p/4771235.html for details).
Let h[i] = height[rank[i]].
For i>1 and rank[i]>1, we must have h[i]≥h[i-1]-1. (Make sure you really understand this property!)
Proof: Let suffix(k) be the suffix ranked immediately before suffix(i-1); their longest common prefix has length h[i-1].
Then suffix(k+1) ranks before suffix(i) (this needs h[i-1]>1; if h[i-1]≤1 the inequality clearly holds), and the longest common prefix of suffix(k+1) and suffix(i) has length h[i-1]-1, because dropping the shared first character from both suffixes shortens the common prefix by exactly one.
Every suffix ranked between suffix(k+1) and suffix(i) shares at least that prefix with suffix(i), so the longest common prefix of suffix(i) and the suffix ranked immediately before it is at least h[i-1]-1.
Computing h[1], h[2], ..., h[n] in this order and using this property of h[], the time complexity drops to O(n).
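The property above can be turned into code roughly like this. It is a minimal standalone sketch: the name build_height is made up for this illustration, indices are 0-based, and explicit bounds checks are used instead of a sentinel character.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the post).
// height[r] = LCP length of suffix sa[r] and suffix sa[r-1]; height[0] = 0.
std::vector<int> build_height(const std::string& s, const std::vector<int>& sa) {
    int n = s.size();
    std::vector<int> rank(n), height(n, 0);
    for (int i = 0; i < n; i++) rank[sa[i]] = i;
    int k = 0;  // carries h[i-1] over to the next i
    for (int i = 0; i < n; i++) {
        if (rank[i] == 0) { k = 0; continue; }  // no predecessor in sorted order
        if (k > 0) k--;                         // h[i] >= h[i-1] - 1
        int j = sa[rank[i] - 1];                // suffix ranked just before suffix i
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) k++;
        height[rank[i]] = k;
    }
    return height;
}
```

Since k decreases by at most one per iteration and never exceeds n, the inner while loop performs O(n) comparisons in total. For "banana" with sa = 5 3 1 0 4 2 the result is height = 0 1 3 0 0 2.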
IV. Points to note
To build sa[], pass 4 parameters to the function. Let the original string be s, let its length be n, and let m be an upper bound on the character values (the code requires every character value to be less than m).
Get_sa(s[], sa[], n+1, m)
The reason for passing n+1 instead of n is that a "0" is appended to the end of s.
Reason: this prevents out-of-bounds array access in the following line:
for (int i=1;i<n;i++) x[sa[i]]= y[sa[i]]==y[sa[i-1]]&&y[sa[i]+k]==y[sa[i-1]+k]?p-1:p++;
If x[idx1]==x[idx2] (note idx1!=idx2), then the strings of length len starting at idx1 and at idx2 certainly do not contain the appended character at position n-1. So reading index idx1+len and index idx2+len (y[sa[i]+k] and y[sa[i-1]+k] in the line above) does not go out of bounds, and no special check is needed.
The perfect solution!
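Preparing the input for such a call might look like the sketch below. The helper names pad_with_sentinel and alphabet_bound are made up for this illustration, and it assumes s itself contains no zero bytes, so that the appended 0 is strictly smaller than every real character.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the post): copies s into an int array
// and appends the sentinel 0. Assumes s has no '\0' characters.
std::vector<int> pad_with_sentinel(const std::string& s) {
    std::vector<int> r(s.size() + 1);
    for (std::size_t i = 0; i < s.size(); i++) r[i] = (unsigned char)s[i];
    r[s.size()] = 0;  // the appended "0"
    return r;
}

// Hypothetical helper: a valid m, one more than the largest value in r,
// so that every value can index the counting array c[0..m-1].
int alphabet_bound(const std::vector<int>& r) {
    return *std::max_element(r.begin(), r.end()) + 1;
}
```

With these, the padded array has length n+1, which is the length to pass as the third parameter.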
Code:
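#include <algorithm>
using std::swap;

const int N = 100005; // array capacity; assumed value, the original does not show it
int sa[N], rk[N], h[N], c[N], r[N], wa[N], wb[N], sp[N], n, k;

// r: input values with a 0 appended; n: length including that 0; m: every value is < m
void Get_sa(int *r, int *sa, int n, int m) {
    int *x = wa, *y = wb; // both are auxiliary arrays: x holds the current ranks
    // initial radix sort by single character
    for (int i = 0; i < n; i++) c[x[i] = r[i]]++;
    for (int i = 1; i < m; i++) c[i] += c[i-1];
    for (int i = n-1; i >= 0; i--) sa[--c[x[i]]] = i;
    for (int k = 1; k <= n; k <<= 1) {
        int p = 0;
        // y: suffixes ordered by second keyword; the last k suffixes have an empty one
        for (int i = n-k; i < n; i++) y[p++] = i;
        for (int i = 0; i < n; i++) if (sa[i] >= k) y[p++] = sa[i] - k;

        // stable counting sort by first keyword
        for (int i = 0; i < m; i++) c[i] = 0;
        for (int i = 0; i < n; i++) c[x[i]]++;
        for (int i = 1; i < m; i++) c[i] += c[i-1];
        for (int i = n-1; ~i; i--) sa[--c[x[y[i]]]] = y[i];

        swap(x, y); // x and y are pointers, so they can be swapped directly
        p = 1; x[sa[0]] = 0;
        // new ranks: suffixes with equal (first, second) keywords share a rank
        for (int i = 1; i < n; i++) x[sa[i]] = y[sa[i]]==y[sa[i-1]] && y[sa[i]+k]==y[sa[i-1]+k] ? p-1 : p++;
        if (p >= n) break; // all ranks are distinct, done
        m = p; // optimization: only p distinct ranks exist, so the next bound is p
    }
}

// Assumes the global n is the original length, r[n] == 0,
// and Get_sa(r, sa, n+1, m) has already been called.
void Get_h() {
    int k = 0;
    for (int i = 0; i <= n; i++) rk[sa[i]] = i; // n+1 suffixes, including the appended 0
    for (int i = 0; i < n; i++) {
        if (k) k--;
        int j = sa[rk[i]-1]; // rk[i] >= 1 because the appended 0 has rank 0
        while (r[i+k] == r[j+k]) k++;
        h[rk[i]] = k;
    }
}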
2017-06-02 20:58:43
A little idea about the suffix array