[Repost] Suffix array template


IOI2009 paper

Reference: http://www.cnblogs.com/staginner/archive/2012/02/02/2335600.html

int wa[MAXN], wb[MAXN], wv[MAXN], ws[MAXN];

// As the paper says, a 0 is appended to the end of the string. So if r[a]==r[b]
// (really y[a]==y[b]), neither of the two length-j strings about to be merged can
// contain that trailing 0 in its first half. The second half therefore starts at the
// position of the trailing 0 at the latest, never beyond it, and the array access
// stays in bounds.
int cmp(int *r, int a, int b, int l)
{
    return r[a] == r[b] && r[a + l] == r[b + l];
}

// Parameter n of da is the number of characters in the string, including the 0
// appended to the end (the illustration in the paper does not draw that 0).
// Parameter m of da is the range of character values, used by the radix sort. If the
// original sequence consists of letters, 128 can be used directly; if it consists of
// integers, m should be 1 greater than the largest integer.
void da(int *r, int *sa, int n, int m)
{
    int i, j, p, *x = wa, *y = wb, *t;
    // The next four lines radix-sort the single characters (strings of length 1).
    // If it is not clear why this sorts them, simulate it with pen and paper;
    // that is how I first understood it.
    for (i = 0; i < m; i++) ws[i] = 0;
    // x[] is meant to hold the rank of each suffix, but the true rank is not stored
    // here: later steps only compare values in x[], so anything that reflects the
    // relative order is enough.
    for (i = 0; i < n; i++) ws[x[i] = r[i]]++;
    for (i = 1; i < m; i++) ws[i] += ws[i - 1];
    // Looping down from n-1 keeps the sort stable: among equal strings, the one that
    // appears earlier is treated as smaller.
    for (i = n - 1; i >= 0; i--) sa[--ws[x[i]]] = i;

    // In the loop below, p is the number of distinct rank values; once p reaches n,
    // the order of all suffixes is fully determined.
    // j is the length of the strings being merged: each round joins two strings of
    // length j into one of length 2*j. Strings near the end of the text are shorter,
    // but the idea is the same.
    // m is again the range of values for the radix sort.
    for (j = 1, p = 1; p < n; j *= 2, m = p)
    {
        // The next two lines sort by the second keyword.
        // From the illustration in the paper, elements n-j to n-1 have second keyword
        // 0, so they come first when sorting by the second keyword.
        for (p = 0, i = n - j; i < n; i++) y[p++] = i;
        // For the elements whose second keyword is not 0, the order follows from the
        // sa[] of the previous round. Only strings with sa[i]>=j serve as a second
        // keyword, namely for string sa[i]-j ("string i" here and below means the
        // string starting at position i, not the i-th in dictionary order). Since
        // rank[sa[i]] increases with i, scanning sa[] in order yields the remaining
        // elements already sorted by second keyword.
        for (i = 0; i < n; i++) if (sa[i] >= j) y[p++] = sa[i] - j;
        // y[] now holds the string indices sorted by second keyword.
        // Extract the first keyword of each string (x[] holds the ranks, i.e. the
        // first keywords) into wv[] for use below.
        for (i = 0; i < n; i++) wv[i] = x[y[i]];
        // The next four lines radix-sort by the first keyword.
        for (i = 0; i < m; i++) ws[i] = 0;
        for (i = 0; i < n; i++) ws[wv[i]]++;
        for (i = 1; i < m; i++) ws[i] += ws[i - 1];
        // The loop runs down from n-1 for the same reason as before. Note that y[i]
        // is stored here, because y[i] holds the string index.
        for (i = n - 1; i >= 0; i--) sa[--ws[wv[i]]] = y[i];
        // The next two lines compute the ranks after merging. The new ranks belong in
        // x[], but computing them needs the ranks of the previous round, which are
        // what x[] currently holds. To read the old values while writing the new
        // ones, the old x[] is first moved aside. Swapping the pointers does this
        // efficiently: x[] is "copied" to y[] without copying.
        // When computing sa[], equal strings were ordered with the earlier one first.
        // For the ranks, however, equal strings must receive equal ranks; otherwise
        // p would equal n after the first pass and the loop would stop too early.
        for (t = x, x = y, y = t, p = 1, x[sa[0]] = 0, i = 1; i < n; i++)
            x[sa[i]] = cmp(y, sa[i - 1], sa[i], j) ? p - 1 : p++;
    }
    return;
}

// height[] can be computed in linear time thanks to a property of h[], where
// h[i] = height[rank[i]]: h[i] >= h[i-1] - 1. The origin of this inequality is
// analyzed in detail below.
//
// The proof in the paper left me confused at first; drawing it out finally made it
// clear. First, what is to be proved: for suffix i, let j = sa[rank[i]-1], i.e. j is
// the suffix ranked immediately before i. By definition, the longest common prefix of
// i and j is height[rank[i]]. We want a lower bound on height[rank[i]], and the claim
// is that it is at least height[rank[i-1]] - 1.
//
// Now the proof.
// Let suffix k be the suffix ranked immediately before suffix i-1 in dictionary order
// ("suffix i" refers to its starting position in the string, not its dictionary
// rank). Note that k is not necessarily i-2: k precedes i-1 in dictionary order, not
// in the original string.
// By the definition of height[], the longest common prefix of suffix k and suffix i-1
// is height[rank[i-1]]. Now consider suffix k+1 and suffix i.
//
// Case 1: the first characters of suffix k and suffix i-1 differ. Then suffix k+1 may
// rank before or after suffix i, but it does not matter: height[rank[i-1]] is 0, so
// whatever height[rank[i]] is, height[rank[i]] >= height[rank[i-1]] - 1,
// i.e. h[i] >= h[i-1] - 1.
//
// Case 2: the first characters of suffix k and suffix i-1 are the same. Suffix k+1 is
// suffix k without its first character, and suffix i is suffix i-1 without its first
// character, so suffix k+1 must rank before suffix i, otherwise we get a
// contradiction. Also, since the longest common prefix of suffix k and suffix i-1 is
// height[rank[i-1]], the longest common prefix of suffix k+1 and suffix i is
// height[rank[i-1]] - 1.
// Case 2 is not finished yet. Among all suffixes ranked before suffix i, which one is
// most similar to i (similarity meaning the length of the longest common prefix)?
// Clearly the one adjacent to it, sa[rank[i]-1]. So the longest common prefix of
// sa[rank[i]] and sa[rank[i]-1] is at least height[rank[i-1]] - 1, which gives
// height[rank[i]] >= height[rank[i-1]] - 1, i.e. h[i] >= h[i-1] - 1.
//
// With this proved, the code below is easier to read.
int rank[MAXN], height[MAXN];
void calheight(int *r, int *sa, int n)
{
    int i, j, k = 0;
    // Compute the dictionary rank of each suffix.
    for (i = 1; i <= n; i++) rank[sa[i]] = i;
    // The computed value k is assigned to height[rank[i]]. i runs from 0 to n-1, so
    // height[] is filled in the order height[rank[0]], ..., height[rank[n-1]].
    for (i = 0; i < n; height[rank[i++]] = k)
        // Before computing k, check whether it is 0. If so, leave it and compare
        // suffix i with suffix j from their first characters. If k is not 0, the
        // proof above shows the longest common prefix is at least k-1, so decrement
        // k and resume the comparison after the first k-1 characters.
        for (k ? k-- : 0, j = sa[rank[i] - 1]; r[i + k] == r[j + k]; k++);
    return;
}

// Finally, how da and calheight are called. The source program in the paper makes the
// calls shown below, so the int n of da and the int n of calheight are clearly not
// the same thing. The valid range of the height array is height[1]~height[n], where
// height[1]=0: sa[0] is the 0 we appended, so the longest common prefix of sa[1] and
// sa[0] is naturally 0.
// About the 128: m starts at 128 because ordinary strings contain only ASCII codes,
// so 128 covers every character and the radix sort works without problems.
da(r, sa, n + 1, 128);
calheight(r, sa, n);
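
To make the meaning of sa[], rank[] and height[] concrete, here is a minimal brute-force sketch that can be used to cross-check the template on small inputs. It is not the doubling algorithm from the paper: it sorts the suffixes by direct comparison. The names naive_sa, lcp, naive_height and h_property_holds are hypothetical helpers introduced only for this illustration. The sketch also assumes 0-based ranks with no appended 0, so its indices are shifted by one relative to the template, where sa[0] is the appended 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the paper): sort the suffix start positions
// by comparing the suffixes directly. O(n^2 log n), meant for checking only.
std::vector<int> naive_sa(const std::string& s) {
    std::vector<int> sa(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) sa[i] = static_cast<int>(i);
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Length of the longest common prefix of the suffixes starting at a and b.
int lcp(const std::string& s, int a, int b) {
    const int n = static_cast<int>(s.size());
    int k = 0;
    while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
    return k;
}

// height[i] = LCP of the suffixes ranked i-1 and i; height[0] is defined as 0.
std::vector<int> naive_height(const std::string& s, const std::vector<int>& sa) {
    assert(sa.size() == s.size());
    std::vector<int> height(sa.size(), 0);
    for (std::size_t i = 1; i < sa.size(); ++i)
        height[i] = lcp(s, sa[i - 1], sa[i]);
    return height;
}

// Checks the inequality h[i] >= h[i-1] - 1, where h[i] = height[rank[i]].
bool h_property_holds(const std::string& s) {
    const std::vector<int> sa = naive_sa(s);
    const std::vector<int> height = naive_height(s, sa);
    std::vector<int> rnk(sa.size());
    for (std::size_t i = 0; i < sa.size(); ++i) rnk[sa[i]] = static_cast<int>(i);
    for (std::size_t i = 1; i < sa.size(); ++i)
        if (height[rnk[i]] < height[rnk[i - 1]] - 1) return false;
    return true;
}
```

For the string "banana", naive_sa returns 5 3 1 0 4 2 (the suffixes a, ana, anana, banana, na, nana) and naive_height returns 0 1 3 0 0 2. Reading these in position order gives h = 0 3 2 1 0 0, which drops by at most 1 from one position to the next, as the inequality requires.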
