A little idea about the suffix array

Source: Internet
Author: User

The suffix array essentially works by ranking all suffixes of a string. This is useful because every substring of the string is a prefix of some suffix.

The suffix ranking can be computed with the doubling method (prefix doubling).
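To pin down what "ranking the suffixes" means before any doubling, here is a minimal brute-force sketch (the function name naive_suffix_array is hypothetical, not from the original code). It sorts the starting positions by comparing whole suffixes, which costs O(n^2 log n) in the worst case, so it is only a reference for small inputs or for testing a faster version.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical brute-force reference: sa[i] = start of the i-th smallest suffix.
// O(n^2 log n) worst case; only meant for small inputs and for testing.
std::vector<int> naive_suffix_array(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);  // positions 0..n-1
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        // compare suffix(a) and suffix(b) lexicographically
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}
```

For s = "banana" this gives 5 3 1 0 4 2 (a, ana, anana, banana, na, nana). Since every substring is a prefix of some suffix, all occurrences of a pattern such as "an" form a contiguous block of this order (here the suffixes starting at 3 and 1).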

First, the meaning of each array (for a string s)

sa[i]: the starting position in s of the suffix ranked i-th in lexicographic order

height[i]: the LCP (longest common prefix) of the suffixes ranked i and i-1, i.e. of suffix(sa[i]) and suffix(sa[i-1])

c[]: the counting array used by the radix (counting) sort; after accumulation, c[v] holds the number of keys that are at most v (a prefix sum)

rank[i]: the rank of the suffix starting at s[i]. Clearly rank[sa[i]] = i and sa[rank[i]] = i, i.e. the two arrays are inverse permutations of each other.
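The counting array c[] and the inverse relation between sa[] and rank[] can be seen in isolation in the sketch below. The helper names are hypothetical: sort_by_first_char does one counting-sort pass that orders positions by their first character only (the same three loops that open the author's get_sa), and inverse builds rank[] from sa[] via rank[sa[i]] = i. It assumes every character code is below m; m = 256 covers all byte values.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: stable counting sort of positions by their first character.
// Assumes every (unsigned) character code is < m.
std::vector<int> sort_by_first_char(const std::string& s, int m) {
    int n = (int)s.size();
    std::vector<int> c(m, 0), sa(n);
    for (int i = 0; i < n; i++) c[(unsigned char)s[i]]++;  // count each key
    for (int i = 1; i < m; i++) c[i] += c[i - 1];          // c[v] = number of keys <= v
    for (int i = n - 1; i >= 0; i--)                       // backward scan keeps it stable
        sa[--c[(unsigned char)s[i]]] = i;
    return sa;
}

// Hypothetical helper: rank is the inverse permutation of sa.
std::vector<int> inverse(const std::vector<int>& sa) {
    std::vector<int> rank(sa.size());
    for (int i = 0; i < (int)sa.size(); i++) rank[sa[i]] = i;
    return rank;
}
```

For "banana" the pass yields 1 3 5 0 2 4: the three 'a' positions keep their left-to-right order because the backward scan makes the sort stable, which is what the doubling rounds rely on when they sort by the first key after ordering by the second.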

Second, how to compute sa[]

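1. Use doubling. In the round that compares prefixes of length 2k, the first key of suffix i is the rank of s[i..i+k-1] and the second key is the rank of s[i+k..i+2k-1]. Suffixes with a smaller first key come first; when the first keys are equal, the one with the smaller second key comes first.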

2. Optimization: once the compared prefix length is large enough that all suffixes have distinct ranks, further rounds cannot change the order, so the loop can exit immediately.
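Both steps can be sketched with a comparison sort in place of the radix sort used in the code later in this post. This is a simpler O(n log^2 n) variant of the same doubling idea, not the author's implementation, and doubling_suffix_array is a hypothetical name. An empty second key (i + k past the end) is encoded as -1, so a shorter suffix sorts before a longer one with the same first key.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: prefix doubling with std::sort on (first key, second key).
std::vector<int> doubling_suffix_array(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> sa(n), rk(n), tmp(n);
    std::iota(sa.begin(), sa.end(), 0);
    if (n == 0) return sa;
    for (int i = 0; i < n; i++) rk[i] = (unsigned char)s[i];  // ranks for length 1
    for (int k = 1; ; k <<= 1) {
        // first key: rank of s[i..i+k-1]; second key: rank of s[i+k..], or -1 if empty
        auto key = [&](int i) {
            return std::make_pair(rk[i], i + k < n ? rk[i + k] : -1);
        };
        std::sort(sa.begin(), sa.end(), [&](int a, int b) { return key(a) < key(b); });
        // re-rank: equal key pairs share a rank
        tmp[sa[0]] = 0;
        for (int i = 1; i < n; i++)
            tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rk = tmp;
        if (rk[sa[n - 1]] == n - 1) break;  // optimization 2: all ranks distinct
    }
    return sa;
}
```

The early exit is step 2 above: when the largest rank reaches n - 1 every suffix already has its own rank, so more rounds would not change sa.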

Third, how to compute height[]

First compute rank[] from sa[].

By definition, height[rank[i]] is the LCP of suffix(i) and suffix(sa[rank[i]-1]), the suffix ranked immediately before it.

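Once the starting position sa[rank[i]-1] is known, this LCP can be found by brute force, comparing the two suffixes character by character. Done this way for every i, the total cost is O(n^2) in the worst case.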
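A direct sketch of that brute force, using 0-based ranks (so height[0] is unused and set to 0) instead of the 1-based ranks of the text. The name naive_height is hypothetical, and sa is assumed to be a correct suffix array of s.

```cpp
#include <string>
#include <vector>

// Hypothetical brute force: height[i] = LCP(suffix(sa[i-1]), suffix(sa[i])), height[0] = 0.
// O(n^2) worst case, e.g. for s = "aaaa...a".
std::vector<int> naive_height(const std::string& s, const std::vector<int>& sa) {
    int n = (int)s.size();
    std::vector<int> height(n, 0);
    for (int i = 1; i < n; i++) {
        int a = sa[i - 1], b = sa[i], len = 0;
        while (a + len < n && b + len < n && s[a + len] == s[b + len]) len++;
        height[i] = len;
    }
    return height;
}
```

On a string of identical characters every comparison runs almost to the end of the string, which is where the quadratic cost comes from.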

A small optimization brings this down to linear time (see http://www.cnblogs.com/LLGemini/p/4771235.html for details).

Let h[i] = height[rank[i]], the LCP of suffix(i) and the suffix ranked immediately before it.

For i > 1 and rank[i] > 1, h[i] ≥ h[i-1] - 1 always holds. (This property is worth understanding well!)

Proof: let suffix(k) be the suffix ranked immediately before suffix(i-1); their longest common prefix has length h[i-1].

If h[i-1] ≤ 1, the inequality holds trivially because h[i] ≥ 0. Otherwise suffix(k) and suffix(i-1) share their first character, so dropping that character preserves their order: suffix(k+1) ranks before suffix(i), and the longest common prefix of suffix(k+1) and suffix(i) has length h[i-1] - 1.

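Every suffix ranked between suffix(k+1) and suffix(i) also shares these first h[i-1] - 1 characters with suffix(i), in particular the suffix ranked immediately before suffix(i). Hence h[i] ≥ h[i-1] - 1.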
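The inequality can also be checked mechanically. The sketch below (hypothetical name check_h_property, 0-based ranks) builds sa[] and rank[] by brute force and tests h[i] ≥ h[i-1] - 1 wherever both values are defined. Running it over many small strings is only a sanity check, not a substitute for the proof.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical checker: returns false if some i violates h[i] >= h[i-1] - 1,
// where h[i] = LCP(suffix(i), suffix ranked just before it), 0-based ranks.
bool check_h_property(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> sa(n), rank(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    for (int i = 0; i < n; i++) rank[sa[i]] = i;
    auto lcp = [&](int a, int b) {
        int len = 0;
        while (a + len < n && b + len < n && s[a + len] == s[b + len]) len++;
        return len;
    };
    for (int i = 1; i < n; i++) {
        // h is undefined for the smallest suffix (rank 0)
        if (rank[i] == 0 || rank[i - 1] == 0) continue;
        int hi = lcp(i, sa[rank[i] - 1]);
        int hprev = lcp(i - 1, sa[rank[i - 1] - 1]);
        if (hi < hprev - 1) return false;
    }
    return true;
}
```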

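Computing h[1], h[2], ..., h[n] in this order and starting each comparison at h[i-1] - 1 instead of 0 reduces the total time to O(n): the running length drops by at most 1 per step and never exceeds n, so it can grow at most 2n times in total.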
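A minimal standalone sketch of this linear-time computation (the method usually credited to Kasai et al.) follows, with 0-based ranks and no sentinel, so the comparison loop checks the string bounds explicitly. The name linear_height is hypothetical; the author's get_h later in the post relies on the appended 0 instead of bounds checks.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: height[r] = LCP(suffix(sa[r-1]), suffix(sa[r])), height[0] = 0,
// computed in O(n) using h[i] >= h[i-1] - 1.
std::vector<int> linear_height(const std::string& s, const std::vector<int>& sa) {
    int n = (int)s.size();
    std::vector<int> rank(n), height(n, 0);
    for (int i = 0; i < n; i++) rank[sa[i]] = i;
    int k = 0;                                   // LCP carried over from suffix i-1
    for (int i = 0; i < n; i++) {                // visit suffixes in text order
        if (rank[i] == 0) { k = 0; continue; }   // smallest suffix has no predecessor
        int j = sa[rank[i] - 1];                 // suffix ranked just before suffix(i)
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) k++;
        height[rank[i]] = k;
        if (k > 0) k--;                          // next LCP is at least k - 1
    }
    return height;
}
```

Resetting k to 0 at the rank-0 suffix is safe and happens only once, so the bound on the total number of increments stays linear.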

 

Fourth, points to note

When building sa[], four parameters are passed to the function. Let the original string be s, its length n, and m one more than the largest character value (every value must be less than m):

get_sa(s, sa, n + 1, m)

n + 1 is passed instead of n because a 0 is appended to the end of s as a sentinel, so all real characters must be at least 1.

This prevents out-of-bounds array accesses in this line:

     for (int i=1;i<n;i++) x[sa[i]]= y[sa[i]]==y[sa[i-1]]&&y[sa[i]+k]==y[sa[i-1]+k]?p-1:p++;

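Here n is the length passed to the function, so s[n-1] is the appended 0. If y[idx1] == y[idx2] with idx1 != idx2, then neither length-k substring starting at idx1 or idx2 contains s[n-1]: the 0 is unique and sits at the very end, so a substring containing it cannot equal a substring starting elsewhere. Both substrings therefore end before n - 1, so idx1 + k and idx2 + k are at most n - 1, and reading y[idx1 + k] and y[idx2 + k] cannot go out of bounds. Because && short-circuits, these reads only happen when the first keys are equal, so no special bounds check is needed.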
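Preparing the input might look like the sketch below. with_sentinel is a hypothetical helper: it shifts every byte up by 1 so that 0 never occurs in the real text, then appends the 0 sentinel. Under that assumption the values lie in [0, 256], so m = 257, and the call would be get_sa(r.data(), sa, n + 1, 257). Since the sentinel is the unique smallest value, the suffix consisting of the sentinel alone always lands at sa[0], and the real suffixes occupy sa[1..n].

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns r[0..n] with r[n] == 0 as the unique sentinel
// and every real character mapped to (byte value + 1), i.e. into [1, 256].
std::vector<int> with_sentinel(const std::string& s) {
    std::vector<int> r(s.size() + 1, 0);   // last slot stays 0: the sentinel
    for (size_t i = 0; i < s.size(); i++)
        r[i] = (unsigned char)s[i] + 1;    // shift so 0 never appears in the text
    return r;
}
```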

  

A neat solution!

Code:

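#include <algorithm>  // std::swap

const int N = 200005;  // max string length + 1 (for the sentinel); adjust as needed
int sa[N], rk[N], h[N], c[N], r[N], wa[N], wb[N];

// r[0..n-1]: the input; its last element r[n-1] must be a unique 0 (the sentinel),
// and every value must lie in [0, m). Call as get_sa(r, sa, len + 1, m).
void get_sa(int *r, int *sa, int n, int m) {
    int *x = wa, *y = wb;  // auxiliary arrays: x = current ranks, y = order by second key
    for (int i = 0; i < m; i++) c[i] = 0;  // c[] must be cleared before counting
    for (int i = 0; i < n; i++) c[x[i] = r[i]]++;
    for (int i = 1; i < m; i++) c[i] += c[i - 1];
    for (int i = n - 1; i >= 0; i--) sa[--c[x[i]]] = i;  // sort by first character
    for (int k = 1; k <= n; k <<= 1) {
        int p = 0;
        // order positions by the second key; those with an empty second half come first
        for (int i = n - k; i < n; i++) y[p++] = i;
        for (int i = 0; i < n; i++) if (sa[i] >= k) y[p++] = sa[i] - k;
        // stable counting sort of y by the first key x[]
        for (int i = 0; i < m; i++) c[i] = 0;
        for (int i = 0; i < n; i++) c[x[i]]++;
        for (int i = 1; i < m; i++) c[i] += c[i - 1];
        for (int i = n - 1; i >= 0; i--) sa[--c[x[y[i]]]] = y[i];
        std::swap(x, y);  // x, y are pointers, so they can be swapped directly
        // re-rank: y now holds the old ranks, x receives the new ones
        p = 1; x[sa[0]] = 0;
        for (int i = 1; i < n; i++)
            x[sa[i]] = y[sa[i]] == y[sa[i - 1]] && y[sa[i] + k] == y[sa[i - 1] + k] ? p - 1 : p++;
        if (p >= n) break;  // all ranks distinct: the order is final
        m = p;              // optimization: ranks now lie in [0, p), so p buckets suffice
    }
}

// len: string length WITHOUT the sentinel; r and sa as filled by get_sa(r, sa, len + 1, m).
// h[i] = LCP of suffix(sa[i - 1]) and suffix(sa[i]); sa[0] is the sentinel.
void get_h(int *r, int *sa, int len) {
    int k = 0;
    for (int i = 0; i <= len; i++) rk[sa[i]] = i;
    for (int i = 0; i < len; i++) {
        if (k) k--;                        // h[rk[i]] >= h[rk[i - 1]] - 1
        int j = sa[rk[i] - 1];             // rk[i] >= 1 because the sentinel has rank 0
        while (r[i + k] == r[j + k]) k++;  // stops at the unique sentinel 0
        h[rk[i]] = k;
    }
}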

  

2017-06-02 20:58:43
