Suffix array explained + template

Source: Internet
Author: User

Suffix array notes

sa[]: "who is ranked i-th?"

Suffix array: the suffix array sa is a one-dimensional array that holds a permutation sa[1], sa[2], ..., sa[n] of 1..n such that Suffix(sa[i]) < Suffix(sa[i+1]) for 1 ≤ i < n. In other words, the n suffixes of S are sorted in ascending order and their starting positions are stored in sa in that order.

rank[]: "what is your rank?" Rank array: rank[i] stores the rank of Suffix(i) among all suffixes sorted in ascending order.
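As a minimal sketch of these two definitions (not the template's method), the following C++ builds sa by sorting the suffix start positions directly and then derives rank as the inverse permutation. The names naive_suffix_array and rank_from_sa are illustrative assumptions, and positions are 0-based here, whereas the definition above counts from 1.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Naive construction: sort the start positions by comparing whole suffixes.
// O(n^2 log n) in the worst case; only meant to illustrate the definitions.
std::vector<int> naive_suffix_array(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);  // 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        // Compare the suffix starting at a with the suffix starting at b.
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// rank is the inverse permutation of sa: rank[sa[i]] = i.
std::vector<int> rank_from_sa(const std::vector<int>& sa) {
    std::vector<int> rank(sa.size());
    for (int i = 0; i < static_cast<int>(sa.size()); ++i) rank[sa[i]] = i;
    return rank;
}
```

For the string banana this yields sa = 5 3 1 0 4 2 and rank = 3 2 5 1 4 0 (both 0-based).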

r[]: the raw input data.

j: the length of the substrings currently being ranked. Each iteration uses the ranking of the length-j substrings to obtain the ranking of the length-2j substrings.

y[]: the result of sorting the length-2j substrings by their second keyword, stored as the starting positions of those substrings (that is, the subscripts of their first keywords).

wv[]: the first-keyword ranks of the length-2j substrings, listed in the order given by y[].

ws[]: the counting array used by the counting sort.

x[]: initially a copy of the raw data r[] (which is also the rank of each length-1 substring); afterwards it holds the ranks of the length-2j substrings.

p: the number of distinct ranks.
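To see how j, x[] and p work together, here is a hedged sketch of the doubling idea that replaces the template's two counting sorts with std::sort on (first keyword, second keyword) pairs. It is slower than the template (O(n log² n) rather than O(n log n)) and uses 0-based positions without the extra terminator. The names doubling_suffix_array, key and next are invented for this illustration.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

std::vector<int> doubling_suffix_array(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n), x(n), next(n);
    std::iota(sa.begin(), sa.end(), 0);
    // x starts as the character codes, i.e. the rank of each length-1 substring.
    for (int i = 0; i < n; ++i) x[i] = static_cast<unsigned char>(s[i]);

    for (int j = 1;; j *= 2) {
        // Key of the length-2j substring at i: (rank of first half, rank of second half).
        // A second half that starts past the end gets -1, so it sorts first.
        auto key = [&](int i) {
            return std::make_pair(x[i], i + j < n ? x[i + j] : -1);
        };
        std::sort(sa.begin(), sa.end(),
                  [&](int a, int b) { return key(a) < key(b); });

        // Re-rank: equal keys share a rank; p counts the distinct ranks.
        int p = 0;
        for (int i = 0; i < n; ++i) {
            if (i == 0 || key(sa[i - 1]) != key(sa[i])) ++p;
            next[sa[i]] = p - 1;
        }
        x.swap(next);
        if (p >= n) break;  // every rank is distinct, so sa is final
    }
    return sa;
}
```

For aabaaaab the loop stops after the j = 2 round with sa = 3 4 5 0 6 1 7 2.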

1. Sort the substrings of length 1 (the first step of the function)
    for (i = 0; i < m; i++) ws[i] = 0;
    for (i = 0; i < n; i++) ws[x[i] = r[i]]++;
    for (i = 1; i < m; i++) ws[i] += ws[i-1];
    for (i = n-1; i >= 0; i--) sa[--ws[x[i]]] = i;

① A radix sort (here a counting sort) is used; any other sorting method would also work.

② r[] stores the original input string; x[] holds the ASCII values of r[], which are convenient to sort.

③ m is an upper bound chosen to be greater than the largest ASCII value in the input; it serves as the loop boundary.

④ n here is the length of the string + 1, so the terminating 0 takes part in the sort as the smallest character. The decrement expressions that follow use this n (passing the plain string length does not seem to cause problems in this step).

⑤ The last line is hard to understand, but practice shows that it is correct: sa[i] = j means that the suffix ranked i-th starts at position j.

ws[i] is the number of occurrences of character i and of all smaller characters, so the larger the character value, the larger ws[i]. For example, in the string aaabaa the character a occurs 5 times and b occurs once, so (leaving the terminator aside) ws[a] = 5 and ws[b] = 6: the a's occupy the first 5 places and b takes sixth place.
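The running totals can be checked with a small sketch. It assumes the string is counted without the terminator (the template passes n = length + 1, which would add 1 to every entry), and cumulative_counts is a name made up for this example.

```cpp
#include <string>
#include <vector>

// Returns ws after the prefix-sum loop: ws[c] = number of characters <= c.
// Assumes every character value in s is below m.
std::vector<int> cumulative_counts(const std::string& s, int m) {
    std::vector<int> ws(m, 0);
    for (unsigned char c : s) ws[c]++;               // occurrences of each character
    for (int i = 1; i < m; ++i) ws[i] += ws[i - 1];  // running totals
    return ws;
}
```

Called as cumulative_counts("aaabaa", 128), it gives ws['a'] = 5 and ws['b'] = 6, matching the example.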

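For the string aabaaaab, the output after this step is 8 0 1 3 4 5 6 2 7, which matches the definition of sa (so far sorted by first character only). Listing each character with its 1-based place in sa, where ~ stands for the terminator:

    a a b a a a a b ~
    2 3 8 4 5 6 7 9 1

This is exactly right.

Once this is understood, the last line becomes clear.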
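The first step can be reproduced in isolation with the sketch below. It follows the four loops of the fragment but uses std::vector and a hypothetical wrapper name, sort_by_first_char; the default m = 128 assumes plain ASCII input.

```cpp
#include <string>
#include <vector>

// Counting sort of the suffixes by first character.
// n = s.size() + 1, so the terminating '\0' takes part as the smallest character.
std::vector<int> sort_by_first_char(const std::string& s, int m = 128) {
    const int n = static_cast<int>(s.size()) + 1;
    const char* r = s.c_str();  // r[n-1] is the '\0' terminator
    std::vector<int> x(n), ws(m, 0), sa(n);
    for (int i = 0; i < n; ++i) ws[x[i] = static_cast<unsigned char>(r[i])]++;
    for (int i = 1; i < m; ++i) ws[i] += ws[i - 1];
    // Walk backwards: each index takes the last free slot in its character's
    // block, which keeps equal characters in increasing index order (stable).
    for (int i = n - 1; i >= 0; --i) sa[--ws[x[i]]] = i;
    return sa;
}
```

For aabaaaab this returns 8 0 1 3 4 5 6 2 7: the terminator at index 8 first, then the six a's in index order, then the two b's.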

2. Repeat the radix sort several times

Because the ranks from the previous round may contain ties, the sort has to be repeated until no two positions share a rank.

    for (j = 1, p = 1; p < n; j *= 2, m = p)
    {
        for (p = 0, i = n-j; i < n; i++) y[p++] = i;
        for (i = 0; i < n; i++) if (sa[i] >= j) y[p++] = sa[i]-j;
        for (i = 0; i < n; i++) wv[i] = x[y[i]];
        for (i = 0; i < m; i++) ws[i] = 0;
        for (i = 0; i < n; i++) ws[wv[i]]++;
        for (i = 1; i < m; i++) ws[i] += ws[i-1];
        for (i = n-1; i >= 0; i--) sa[--ws[wv[i]]] = y[i];
        for (t = x, x = y, y = t, p = 1, x[sa[0]] = 0, i = 1; i < n; i++)
            x[sa[i]] = cmp(y, sa[i-1], sa[i], j) ? p-1 : p++;
    }

This code is more complicated than the first step of the function.

① Start with the outermost loop: j doubles on every iteration and is the length of the substrings whose ranks are being combined.

② The first line inside the loop runs j times. It handles the last j positions in advance: their second keyword lies beyond the end of the string and counts as 0, so they come first in the second-keyword order.

③ For the second line, look back at the role of sa. The first thing to understand is that this line discards some entries. Suffix sa[i] acts as the second keyword of the substring that starts at sa[i]-j, and a suffix with sa[i] < j is not the second keyword of any substring, hence the condition if (sa[i] >= j).

The statement that follows, y[p++] = sa[i]-j, subtracts j for the same reason: it records the starting position of the substring, not the position of its second keyword.

At this point the sort by second keyword is done.
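The two lines that build y[] can be sketched on their own as follows. The function name order_by_second_key is an assumption for this example; it takes the sa produced by the previous round (including the terminator position) and the current j.

```cpp
#include <cassert>
#include <vector>

// sa: positions ordered by their length-j prefix (result of the previous round).
// Returns y: start positions ordered by the second half of their length-2j substring.
std::vector<int> order_by_second_key(const std::vector<int>& sa, int j) {
    const int n = static_cast<int>(sa.size());
    assert(j >= 1 && j <= n);
    std::vector<int> y;
    y.reserve(n);
    // Positions n-j .. n-1 have no second half: it counts as 0, so they go first.
    for (int i = n - j; i < n; ++i) y.push_back(i);
    // Suffix sa[i] is the second half of the substring starting at sa[i] - j.
    for (int i = 0; i < n; ++i)
        if (sa[i] >= j) y.push_back(sa[i] - j);
    return y;
}
```

With the first-step result for aabaaaab (8 0 1 3 4 5 6 2 7) and j = 1, this gives y = 8 7 0 2 3 4 5 1 6: position 8 has no second character, position 7 is followed by the terminator, positions 0 2 3 4 5 are followed by a, and positions 1 and 6 are followed by b.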

④ Now sort by the first keyword.

Suppose the numbers to sort are 92 71 10 80 63 90, with the tens digit as the first keyword and the units digit as the second.

Then y[] = 3 4 6 2 1 5 holds the (1-based) positions of the numbers in increasing order of the second keyword.

In that order the numbers read 10 80 90 71 92 63, which is the result of the sort by second keyword.

for (i = 0; i < n; i++) wv[i] = x[y[i]]; copies the first keywords from x[] into wv[] in this order.

⑤ The rest of the radix sort is the same as for substrings of length 1.
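The decimal example can be run as a two-pass radix sort. This is a sketch under the assumption that all values lie in 0..99; stable_pass, order_by_units and radix_sort_two_digits are illustrative names, and indices are 0-based (so y comes out as 2 3 5 1 0 4 instead of 3 4 6 2 1 5).

```cpp
#include <vector>

// Stable counting sort of the indices in `order` by key[index] (keys in 0..9).
std::vector<int> stable_pass(const std::vector<int>& order,
                             const std::vector<int>& key) {
    std::vector<int> ws(10, 0), out(order.size());
    for (int idx : order) ws[key[idx]]++;
    for (int d = 1; d < 10; ++d) ws[d] += ws[d - 1];
    for (int i = static_cast<int>(order.size()) - 1; i >= 0; --i)
        out[--ws[key[order[i]]]] = order[i];
    return out;
}

// Second keyword: indices ordered by units digit (the role of y[]).
std::vector<int> order_by_units(const std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    std::vector<int> idx(n), units(n);
    for (int i = 0; i < n; ++i) { idx[i] = i; units[i] = a[i] % 10; }
    return stable_pass(idx, units);
}

// First keyword: a stable pass by tens digit over the order produced above.
std::vector<int> radix_sort_two_digits(const std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    std::vector<int> tens(n);
    for (int i = 0; i < n; ++i) tens[i] = a[i] / 10;
    const std::vector<int> sa = stable_pass(order_by_units(a), tens);
    std::vector<int> sorted;
    for (int i : sa) sorted.push_back(a[i]);
    return sorted;
}
```

Because the second pass is stable, 90 stays ahead of 92: both have tens digit 9, and 90 came first in the units order. The result is 10 63 71 80 90 92.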

Complete code (for reference)
#include <cstdio>
#include <iostream>
#include <cstring>
#define LL long long
#define ULL unsigned long long
// Note: no "using namespace std;" here, because the global array rank[]
// would otherwise clash with std::rank on C++11 and later compilers.
const int MAXN = 100010;

// The following is the doubling algorithm for the suffix array
int wa[MAXN], wb[MAXN], wv[MAXN], ws[MAXN];

// Two length-2l substrings are equal if both halves have equal ranks
int cmp(int *r, int a, int b, int l)
{
    return r[a] == r[b] && r[a+l] == r[b+l];
}

/**< Incoming parameters: str, sa, len+1, ASCII_MAX+1 */
void da(const char r[], int sa[], int n, int m)
{
    int i, j, p, *x = wa, *y = wb, *t;
    for (i = 0; i < m; i++) ws[i] = 0;
    for (i = 0; i < n; i++) ws[x[i] = r[i]]++; // use the ASCII code of the character as the subscript
    for (i = 1; i < m; i++) ws[i] += ws[i-1];
    for (i = n-1; i >= 0; i--) sa[--ws[x[i]]] = i;
    /* std::cout << "SA" << std::endl; for (int i = 0; i < n; i++) std::cout << sa[i] << " "; */
    for (j = 1, p = 1; p < n; j *= 2, m = p)
    {
        for (p = 0, i = n-j; i < n; i++) y[p++] = i;
        for (i = 0; i < n; i++) if (sa[i] >= j) y[p++] = sa[i]-j;
        for (i = 0; i < n; i++) wv[i] = x[y[i]];
        for (i = 0; i < m; i++) ws[i] = 0;
        for (i = 0; i < n; i++) ws[wv[i]]++;
        for (i = 1; i < m; i++) ws[i] += ws[i-1];
        for (i = n-1; i >= 0; i--) sa[--ws[wv[i]]] = y[i];
        for (t = x, x = y, y = t, p = 1, x[sa[0]] = 0, i = 1; i < n; i++)
            x[sa[i]] = cmp(y, sa[i-1], sa[i], j) ? p-1 : p++;
    }
    return;
}

int sa[MAXN], rank[MAXN], height[MAXN];

// Finding the height array
/**< str, sa, len */
void calheight(const char *r, int *sa, int n)
{
    int i, j, k = 0;
    for (i = 1; i <= n; i++) rank[sa[i]] = i;
    for (i = 0; i < n; height[rank[i++]] = k)
        for (k ? k-- : 0, j = sa[rank[i]-1]; r[i+k] == r[j+k]; k++);
    // Unify: shift sa and rank so that positions count from 1
    for (int i = n; i >= 1; --i) ++sa[i], rank[i] = rank[i-1];
}

char str[MAXN];

int main()
{
    while (scanf("%s", str) != EOF)
    {
        int len = strlen(str);
        // The value of m was lost in the source text; 128 (ASCII max + 1) is assumed here
        da(str, sa, len+1, 128);
        calheight(str, sa, len);
        puts("--------------all Suffix--------------");
        for (int i = 1; i <= len; ++i)
        {
            printf("%d:\t", i);
            for (int j = i-1; j < len; ++j) printf("%c", str[j]);
            puts("");
        }
        puts("");
        puts("-------------after sort---------------");
        for (int i = 1; i <= len; ++i)
        {
            printf("sa[%2d] =%2d\t", i, sa[i]);
            for (int j = sa[i]-1; j < len; ++j) printf("%c", str[j]);
            puts("");
        }
        puts("");
        puts("---------------Height-----------------");
        for (int i = 1; i <= len; ++i) printf("height[%2d]=%2d \n", i, height[i]);
        puts("");
        puts("----------------Rank------------------");
        for (int i = 1; i <= len; ++i) printf("rank[%2d] =%2d\n", i, rank[i]);
        puts("------------------END-----------------");
    }
    return 0;
}
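The template also fills a height array, which the notes above do not describe. Assuming the usual definition (height[i] is the length of the longest common prefix of the suffixes ranked i-1 and i), a brute-force version like the one below can serve as a cross-check on short strings. naive_height is a hypothetical helper; it is 0-based and fixes height[0] at 0, so its entry i-1 should correspond to the template's height[i].

```cpp
#include <string>
#include <vector>

// Brute-force height array: compare each pair of neighbouring suffixes in sa.
// sa must be a 0-based suffix array of s (without the extra terminator entry).
std::vector<int> naive_height(const std::string& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(s.size());
    std::vector<int> height(sa.size(), 0);
    for (int i = 1; i < static_cast<int>(sa.size()); ++i) {
        const int a = sa[i - 1], b = sa[i];
        int k = 0;
        // Extend the common prefix while both suffixes still have characters.
        while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
        height[i] = k;
    }
    return height;
}
```

For banana with sa = 5 3 1 0 4 2 this gives 0 1 3 0 0 2: for instance, ana and anana share the prefix ana, of length 3.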
