"Suffix array" "It's hard to understand."

Last Update:2015-08-28 Source: Internet

Author: User


Almost every suffix array template on the web comes from the paper "Suffix Arrays: A Powerful Tool for Processing Strings". Its O(n log n) complexity is really powerful, but for a first-timer (like me) it is hard to understand (it took me two full days). After reading a lot of material, I find the suffix array explanation in the "Programming Contest Challenge Book" easier to follow, but its complexity is O(n log^2 n). The main difference is in how the substrings are sorted in each round: the former uses a radix sort, O(n), while the latter uses quicksort, O(n log n), so the latter ends up with one extra O(log n) factor.

I will start with the approach from the "Challenge" book and write up the O(n log n) version later.

First, a clean version of the SA (suffix_array) template. The main idea is, of course, still the doubling method.

/* O(n log^2 n) */
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>

// No "using namespace std" here: the global array named rank would
// otherwise clash with std::rank on C++11 and later compilers.
const int MAXN = 10001;
int n, k;
int rank[MAXN + 1], tmp[MAXN + 1];

// Compare the pairs (rank[i], rank[i+k]) and (rank[j], rank[j+k])
bool comp_sa(int i, int j)
{
    if (rank[i] != rank[j])
        return rank[i] < rank[j];
    int ri = i + k <= n ? rank[i + k] : -1;
    int rj = j + k <= n ? rank[j + k] : -1;
    return ri < rj;
}

// Computes the suffix array of the string s into sa (s.size() + 1 entries)
void calc_sa(const std::string &s, int *sa)
{
    n = s.size();
    // Initial length is 1: rank is the character code, -1 for the empty suffix
    for (int i = 0; i <= n; i++)
    {
        sa[i] = i;
        rank[i] = i < n ? s[i] : -1;
    }

    for (k = 1; k <= n; k *= 2)
    {
        std::sort(sa, sa + n + 1, comp_sa); // two-key quicksort

        // Store the newly computed ranks in tmp, then copy them back to rank
        tmp[sa[0]] = 0;
        for (int i = 1; i <= n; i++)
        {
            tmp[sa[i]] = tmp[sa[i - 1]] + (comp_sa(sa[i - 1], sa[i]) ? 1 : 0);
        }
        for (int i = 0; i <= n; i++)
        {
            rank[i] = tmp[i];
        }
    }
}

int main()
{
    std::string s = "abracadabra";
    int *sa = new int[s.size() + 1];
    calc_sa(s, sa);
    for (std::size_t i = 0; i <= s.size(); i++)
        std::printf("sa[%d] = %d\n", (int)i, sa[i]);
    delete[] sa;
    sa = NULL;
    return 0;
}


Of course, anyone seeing this code for the first time will have no idea what it is doing. I suggest finding a few blogs with annotated code (many experts have written very detailed comments) and tracing the algorithm by hand. After a few passes it will click.

Definitions (make sure you understand these!):

Suffix array: sa[]

The array of all suffixes of a string, sorted in lexicographic order.

sa[i] = k means that the i-th smallest suffix after sorting is s[k ... n] (string indices are taken as 1 to n here to make this easier to follow).

The final state of the sa array that we are aiming for is shown in the following example (taken from Juno PPT):

Rank array: rank[]

Stores the rank of each suffix s[i ... n] in the sorted order.

rank[i] = k means that the suffix s[i ... n] is the k-th smallest among all suffixes in lexicographic order.

Figure 1 shows that sa[1] = 4 and rank[4] = 1; in other words, the sa array and the rank array are inverses of each other.
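The inverse relationship can be written down in a few lines. This is only a sketch: the names `rank_from_sa` and `is_inverse` are made up for illustration, indices are 0-based (unlike the 1-based figure), and `sa` is assumed to be a permutation of 0..size-1.

```cpp
#include <vector>

// Build rank from sa: if the i-th smallest suffix starts at sa[i],
// then the suffix starting at position sa[i] has rank i.
std::vector<int> rank_from_sa(const std::vector<int> &sa)
{
    std::vector<int> rnk(sa.size());
    for (int i = 0; i < (int)sa.size(); i++)
        rnk[sa[i]] = i;
    return rnk;
}

// Check both directions of the inverse relationship.
// Assumes both vectors are permutations of 0..size-1.
bool is_inverse(const std::vector<int> &sa, const std::vector<int> &rnk)
{
    if (sa.size() != rnk.size())
        return false;
    for (int i = 0; i < (int)sa.size(); i++)
        if (rnk[sa[i]] != i || sa[rnk[i]] != i)
            return false;
    return true;
}
```

For the string aabac with the empty suffix included at position 5, the sorted suffix positions are 5 0 1 3 2 4, and `rank_from_sa` turns that into 1 2 4 3 5 0.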

Computing the suffix array

The basic idea of the algorithm: doubling

What does doubling mean?

First compute the order of the length-1 substrings starting at each position. Use that to compute the order of the length-2 substrings, then use the length-2 result to compute the order of the length-4 substrings. Keep doubling until the length is greater than or equal to the length of the original string; at that point we have the suffix array.

To compute the order of the substrings of length 2, simply sort the pairs of two characters.

For example, take the original string aaba (the digit after each letter is its position in the original string).

a1 = a2 = a4 < b3, so a1a2 must be less than a2b3.

To get the order of the substrings of length 2k, it is enough to know the order of the substrings of length k.

For example, take the original string aabac.

a1a2 < a2b3 < a4c5 < b3a4, so a1a2b3a4 must be less than a2b3a4c5.

Let rank_k(i) denote the rank of s[i, k] (the substring of length k starting at i) among all substrings of length k.

To compute the order of the substrings of length 2k, simply sort pairs of ranks: compare the pair (rank_k(i), rank_k(i+k)) with the pair (rank_k(j), rank_k(j+k)) instead of comparing s[i, 2k] and s[j, 2k] directly. Comparing rank_k(i) with rank_k(j) is equivalent to comparing s[i, k] with s[j, k], and comparing rank_k(i+k) with rank_k(j+k) is equivalent to comparing s[i+k, k] with s[j+k, k].
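The pair comparison can be sketched as a single doubling step. This is an illustration rather than the template itself: `double_step` and its variable names are made up, positions are 0-based, and the input is assumed to hold rank_k for positions 0..n, where position n is the empty suffix.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// One doubling step: given rnk[i] = rank of s[i, k] for i = 0..n,
// return the rank of s[i, 2k] for every i.
// A second half that starts past the end counts as -1,
// i.e. smaller than any real rank.
std::vector<int> double_step(const std::vector<int> &rnk, int k)
{
    int m = rnk.size(); // m = n + 1 positions
    if (m == 0)
        return std::vector<int>();

    // key[i] = (rank_k(i), rank_k(i + k)); std::pair compares lexicographically.
    std::vector<std::pair<int, int> > key(m);
    for (int i = 0; i < m; i++)
        key[i] = std::make_pair(rnk[i], i + k < m ? rnk[i + k] : -1);

    // Sort the positions by their keys.
    std::vector<int> order(m);
    for (int i = 0; i < m; i++)
        order[i] = i;
    std::sort(order.begin(), order.end(),
              [&key](int a, int b) { return key[a] < key[b]; });

    // Equal keys share a rank; otherwise the rank goes up by one.
    std::vector<int> next(m);
    next[order[0]] = 0;
    for (int i = 1; i < m; i++)
        next[order[i]] = next[order[i - 1]] + (key[order[i - 1]] < key[order[i]] ? 1 : 0);
    return next;
}
```

For aabac, starting from the character codes 97 97 98 97 99 with -1 appended for the empty suffix, `double_step(rnk, 1)` gives 1 2 4 3 5 0, which is the order a1a2 < a2b3 < a4c5 < b3a4 from the example, with c5 last and the empty suffix first.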

As an example, take abracadabra.

Initialize: sa[i] = i;

rank[i] = s[i]. At the start we are sorting the single characters of the string, so rank can be initialized directly to each character's ASCII code. Note that at this point rank is not an actual ranking, only a relative order: s[i] > s[j] implies rank[i] > rank[j], nothing more.

There is also a small trick when working with the string: define rank[n] as -1, where position n is the empty suffix. The empty suffix then always sorts first and takes rank 0, so the real suffixes are ranked starting from 1.

k = 0: initialization, giving the order of s[i, 1]:

sa[0]:  0    rank[0]:  97
sa[1]:  1    rank[1]:  98
sa[2]:  2    rank[2]:  114
sa[3]:  3    rank[3]:  97
sa[4]:  4    rank[4]:  99
sa[5]:  5    rank[5]:  97
sa[6]:  6    rank[6]:  100
sa[7]:  7    rank[7]:  97
sa[8]:  8    rank[8]:  98
sa[9]:  9    rank[9]:  114
sa[10]: 10   rank[10]: 97
sa[11]: 11   rank[11]: -1

k = 1: gives the order of s[i, 2]:

sa[0]:  11  ""   rank[11]: 0
sa[1]:  10  a    rank[10]: 1
sa[2]:  0   ab   rank[0]:  2
sa[3]:  7   ab   rank[7]:  2
sa[4]:  3   ac   rank[3]:  3
sa[5]:  5   ad   rank[5]:  4
sa[6]:  1   br   rank[1]:  5
sa[7]:  8   br   rank[8]:  5
sa[8]:  4   ca   rank[4]:  6
sa[9]:  6   da   rank[6]:  7
sa[10]: 2   ra   rank[2]:  8
sa[11]: 9   ra   rank[9]:  8

k = 2: gives the order of s[i, 4]:

sa[0]:  11  ""     rank[11]: 0
sa[1]:  10  a      rank[10]: 1
sa[2]:  0   abra   rank[0]:  2
sa[3]:  7   abra   rank[7]:  2
sa[4]:  3   acad   rank[3]:  3
sa[5]:  5   adab   rank[5]:  4
sa[6]:  8   bra    rank[8]:  5
sa[7]:  1   brac   rank[1]:  6
sa[8]:  4   cada   rank[4]:  7
sa[9]:  6   dabr   rank[6]:  8
sa[10]: 9   ra     rank[9]:  9
sa[11]: 2   raca   rank[2]:  10

k = 4: gives the order of s[i, 8]:

sa[0]:  11  ""         rank[11]: 0
sa[1]:  10  a          rank[10]: 1
sa[2]:  7   abra       rank[7]:  2
sa[3]:  0   abracada   rank[0]:  3
sa[4]:  3   acadabra   rank[3]:  4
sa[5]:  5   adabra     rank[5]:  5
sa[6]:  8   bra        rank[8]:  6
sa[7]:  1   bracadab   rank[1]:  7
sa[8]:  4   cadabra    rank[4]:  8
sa[9]:  6   dabra      rank[6]:  9
sa[10]: 9   ra         rank[9]:  10
sa[11]: 2   racadabr   rank[2]:  11

k = 8: gives the order of s[i, n]:

sa[0]:  11  ""            rank[11]: 0
sa[1]:  10  a             rank[10]: 1
sa[2]:  7   abra          rank[7]:  2
sa[3]:  0   abracadabra   rank[0]:  3
sa[4]:  3   acadabra      rank[3]:  4
sa[5]:  5   adabra        rank[5]:  5
sa[6]:  8   bra           rank[8]:  6
sa[7]:  1   bracadabra    rank[1]:  7
sa[8]:  4   cadabra       rank[4]:  8
sa[9]:  6   dabra         rank[6]:  9
sa[10]: 9   ra            rank[9]:  10
sa[11]: 2   racadabra     rank[2]:  11

This gives the sa array.
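One way to sanity-check a result like this is to compare it with a brute-force sort of the suffixes. The sketch below is not part of the original template: `brute_force_sa` is a made-up helper that sorts the start positions 0..n by comparing the suffixes directly, which is far slower than doubling but easy to trust on short strings.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Brute-force suffix array: sort the start positions 0..n by comparing
// the suffixes themselves. Position n is the empty suffix.
// Each comparison can cost O(n), so this is only meant for checking.
std::vector<int> brute_force_sa(const std::string &s)
{
    int n = s.size();
    std::vector<int> sa(n + 1);
    for (int i = 0; i <= n; i++)
        sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        // Compare the suffix starting at a with the suffix starting at b.
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}
```

For abracadabra this returns 11 10 7 0 3 5 8 1 4 6 9 2, the same positions as the final round of the trace.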

Each round sorts the positions with a two-key quicksort, which costs O(n log n). The new rank of each suffix s[i ... n] is then computed by comparing it with the suffix just before it in sorted order: if the two compare equal, it gets the same rank; otherwise its rank is the previous rank plus 1.

// Store the newly computed ranks in tmp, then copy them back to rank
tmp[sa[0]] = 0;
for (int i = 1; i <= n; i++)
    tmp[sa[i]] = tmp[sa[i - 1]] + (comp_sa(sa[i - 1], sa[i]) ? 1 : 0);

for (int i = 0; i <= n; i++)
    rank[i] = tmp[i];
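The overall bound for this version follows from counting rounds: k takes the values 1, 2, 4, ... up to n, and each round is dominated by one sort. As a rough derivation, assuming each comparison is O(1) (comp_sa only looks at two pairs of ranks):

```latex
T(n) = \underbrace{\left(\lfloor \log_2 n \rfloor + 1\right)}_{\text{rounds } k = 1, 2, 4, \dots \le n}
       \cdot \underbrace{O(n \log n)}_{\text{sort per round}}
     = O(n \log^2 n)
```

Replacing the per-round quicksort with an O(n) radix sort removes one log factor, which is where the O(n log n) version gets its bound.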

I will write up the O(n log n) algorithm later.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
