Getting started with strings: the suffix array

Source: Internet
Author: User
Tags: arrays, sort
Learning Background

It is said that the suffix array is "a powerful tool for working with strings".
When I first studied strings I practiced KMP, the AC automaton and so on, but left the suffix array as an unfilled pit ...
However, the multi-school contests will almost certainly include a string problem, so it is time to look at this slightly mysterious thing.

Algorithm Materials

"Suffix Array: A Powerful Tool for Handling Strings" by Ro, 2009 National Training Team paper
"Introduction to Algorithmic Competitions: Training Guide"

Study Notes

First of all ... to learn this you really have to calm down and read the paper ... The paper is written very conscientiously. Orz Ro Orz
However, the code in the paper does not match my coding habits ... I suggest also referring to the code in the Training Guide where appropriate, since its comments make it easier to understand.

Construction

Start by sorting out some important concepts:

The suffix array sa[i] is the starting index of the i-th smallest suffix when all suffixes of the string are sorted in lexicographic order.
The rank array rk[i] (friendly reminder: the name rank can clash with a system identifier, hence rk) is the rank of suffix i among all suffixes sorted from smallest to largest.
The height array height[i] is the LCP (longest common prefix) of suffix(sa[i-1]) and suffix(sa[i]).
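To make the three definitions concrete, here is a minimal sketch that builds them the slow, obvious way by sorting the suffixes directly. It is not the doubling algorithm discussed later. The names NaiveSuffixArrays and build_naive are made up for this illustration, and the sketch assumes 0-based ranks (so height[0] is simply 0), which differs from the 1-based ranks of my template.

```cpp
#include <algorithm>
#include <cassert>  // handy for checking the arrays when trying the sketch out
#include <string>
#include <vector>

// Hypothetical helper type: the three arrays for one string, 0-based ranks.
struct NaiveSuffixArrays {
    std::vector<int> sa, rk, height;
};

NaiveSuffixArrays build_naive(const std::string& s) {
    int n = (int)s.size();
    NaiveSuffixArrays r;
    r.sa.resize(n);
    r.rk.resize(n);
    r.height.assign(n, 0);
    for (int i = 0; i < n; ++i) r.sa[i] = i;
    // Sort suffix start positions by comparing the suffixes themselves.
    std::sort(r.sa.begin(), r.sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    // rk is the inverse permutation of sa.
    for (int i = 0; i < n; ++i) r.rk[r.sa[i]] = i;
    // height[i] = LCP of the suffixes ranked i-1 and i, by direct comparison.
    for (int i = 1; i < n; ++i) {
        int x = r.sa[i - 1], y = r.sa[i], h = 0;
        while (x + h < n && y + h < n && s[x + h] == s[y + h]) ++h;
        r.height[i] = h;
    }
    return r;
}
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so sa = 5 3 1 0 4 2, rk = 3 2 5 1 4 0 and height = 0 1 3 0 0 2.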

In general, these are the three arrays we work with.
Pay attention to whether indices start from 0 or 1. Either works, but because of my coding habits and the specific implementation, I usually index the string from 0 and number the ranks from 1.

To construct a suffix array, the O(n lg n) doubling algorithm is generally enough (too lazy to learn DC3).
I will not go through the implementation and its optimizations in detail; here are the error-prone details instead:

First, to prevent out-of-bounds array accesses, we can append to the string a special character smaller than every character that occurs in it, so that this one-character suffix is the lexicographically smallest of all; ranks during construction can then start from 0.
During construction we generally use radix sort. Note that if the initial character range is very large, the first pass can be changed to a quick sort, but that is a bit of trouble, so discretizing the characters directly also works.
Once the suffix array is built, the rank and height arrays can be computed in O(n).
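As a rough sketch of the doubling idea, here is a simplified variant that uses std::sort in every round instead of radix sort. That makes it O(n lg^2 n) rather than O(n lg n), so it is not the same as my template. The function name build_sa_doubling is invented for this example, ranks are 0-based, and instead of a sentinel character a missing second key is treated as -1.

```cpp
#include <algorithm>
#include <cassert>  // handy for checking results when trying the sketch out
#include <string>
#include <utility>
#include <vector>

// Prefix doubling with a comparison sort per round (assumed simplification).
std::vector<int> build_sa_doubling(const std::string& s) {
    int n = (int)s.size();
    if (n == 0) return {};
    std::vector<int> sa(n), rk(n), tmp(n);
    for (int i = 0; i < n; ++i) {
        sa[i] = i;
        rk[i] = (unsigned char)s[i];  // round 0: rank by first character
    }
    for (int k = 1;; k <<= 1) {
        // Key of suffix i: (rank of its first k chars, rank of the next k chars).
        auto key = [&](int i) {
            return std::make_pair(rk[i], i + k < n ? rk[i + k] : -1);
        };
        std::sort(sa.begin(), sa.end(),
                  [&](int x, int y) { return key(x) < key(y); });
        // Re-rank: equal keys share a rank, so ranks now describe 2k characters.
        tmp[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rk = tmp;
        if (rk[sa[n - 1]] == n - 1) break;  // all ranks distinct: sorted
    }
    return sa;
}
```

Swapping the std::sort call for two counting-sort passes (second key, then first key) is what brings each round down to O(n).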

Also, strictly speaking, the height array should be indexed from 2, but it is generally no problem to treat height[1] as 0; most templates you will find do it this way.
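The O(n) computation of height from a finished suffix array can be sketched on its own as follows. It relies on the fact that height[rk[i+1]] >= height[rk[i]] - 1, so the running match length h shrinks by at most 1 per step. The function name build_height is hypothetical, and the sketch assumes 0-based ranks, so the unused first entry is height[0] = 0 rather than height[1].

```cpp
#include <cassert>
#include <string>
#include <vector>

// O(n) height array from s and its suffix array (0-based ranks assumed).
std::vector<int> build_height(const std::string& s, const std::vector<int>& sa) {
    int n = (int)s.size();
    assert((int)sa.size() == n);
    std::vector<int> rk(n), height(n, 0);
    for (int i = 0; i < n; ++i) rk[sa[i]] = i;  // inverse of sa
    // Visit suffixes in text order; h carries over between iterations.
    for (int i = 0, h = 0; i < n; ++i) {
        if (rk[i] == 0) {  // smallest suffix has no predecessor
            h = 0;
            continue;
        }
        if (h > 0) --h;
        int j = sa[rk[i] - 1];  // suffix just before suffix i in sorted order
        while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
        height[rk[i]] = h;
    }
    return height;
}
```

Since h increases at most n times in total and decreases at most once per iteration, the whole loop is linear.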

Here is a template that took me a long time to put together:

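#include <cstring>
#include <utility>
#define rep(i,a,b) for (int i = (a); i < (b); ++i)       // i = a .. b-1
#define per(i,a,b) for (int i = (b) - 1; i >= (a); --i)  // i = b-1 .. a
const int N = 100005;  // assumed bound: needs N >= length + 2 and N > 'z'
char str[N];           // input; assumed to use only characters in ('#', 'z']
struct suffix_array {
    int n, sa[N], rk[N], height[N], cnt[N];
    void cons() {
        int m = 'z' + 1, *a = rk, *b = height;  // a: current keys, b: scratch
        str[(n = strlen(str))++] = '#';         // append sentinel; n counts it
        memset(cnt, 0, m * sizeof(int));
        rep(i,0,n) ++cnt[a[i] = str[i]];        // radix sort by first character
        rep(i,1,m) cnt[i] += cnt[i-1];
        per(i,0,n) sa[--cnt[a[i]]] = i;
        for (int j = 1, p = 0; p < n; j <<= 1, m = p) {
            p = 0;                              // b: suffixes ordered by second key
            rep(i,n-j,n) b[p++] = i;            // an empty second key sorts first
            rep(i,0,n) if (sa[i] >= j) b[p++] = sa[i] - j;
            memset(cnt, 0, m * sizeof(int));
            rep(i,0,n) ++cnt[a[b[i]]];          // stable radix sort by first key
            rep(i,1,m) cnt[i] += cnt[i-1];
            per(i,0,n) sa[--cnt[a[b[i]]]] = b[i];
            std::swap(a, b);                    // b: old keys, a: new keys
            p = 1;
            a[sa[0]] = 0;
            rep(i,1,n) a[sa[i]] = b[sa[i-1]] == b[sa[i]] && b[sa[i-1]+j] == b[sa[i]+j] ? p - 1 : p++;
        }
        --n;                                    // drop the sentinel from the length
        if (a != rk) rep(i,1,n+1) rk[sa[i]] = i;
        for (int i = 0, h = 0; i < n; height[rk[i++]] = h) {  // height in O(n)
            if (h) --h;
            for (int j = sa[rk[i]-1]; str[i+h] == str[j+h]; ++h);
        }
    }
} sa;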
 

A bare template problem is available for practice.

Application

At first it may not be clear what sorting the suffixes is good for, but after reading the paper I found ... the suffix array really is powerful!!

All applications of the suffix array rest on this seemingly simple but essential sentence:

Any substring of a string can be seen as a prefix of one of its suffixes.
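One direct consequence, sketched here as an illustration: a pattern occurs in the string exactly when it is a prefix of some suffix, and since the suffixes are sorted, that suffix can be found by binary search. The helper names naive_sa and contains are made up for this example, and the suffix array is built naively (0-based) only to keep the sketch self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Naive suffix array, 0-based; stands in for any real construction.
std::vector<int> naive_sa(const std::string& s) {
    std::vector<int> sa(s.size());
    for (int i = 0; i < (int)sa.size(); ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    return sa;
}

// True if p occurs in s, using O(|p| lg n) character comparisons.
bool contains(const std::string& s, const std::vector<int>& sa,
              const std::string& p) {
    assert(sa.size() == s.size());
    // Find the first suffix whose first |p| characters are not less than p.
    auto it = std::lower_bound(
        sa.begin(), sa.end(), p, [&](int start, const std::string& pat) {
            return s.compare(start, pat.size(), pat) < 0;
        });
    // p occurs iff that suffix actually starts with p.
    return it != sa.end() && s.compare(*it, p.size(), p) == 0;
}
```

All occurrences of the pattern form one contiguous block of sa starting at the position found, which is why the same search also answers counting queries.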

Based on this idea, we can skillfully use the height array to solve all kinds of longest-substring problems.
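As one small example of this kind of problem, the longest substring that occurs at least twice (overlaps allowed) is just the maximum value in the height array, because a repeated substring is a common prefix of two suffixes and the best pair is always adjacent in sorted order. The sketch below is my own illustration with an invented function name; it builds the suffix array naively and computes each height value by direct comparison.

```cpp
#include <algorithm>
#include <cassert>  // handy for checking results when trying the sketch out
#include <numeric>
#include <string>
#include <vector>

// Longest substring occurring at least twice; empty string if there is none.
std::string longest_repeated_substring(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    // Naive suffix sort, assumed here only for brevity.
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    int best = 0, pos = 0;
    for (int i = 1; i < n; ++i) {
        int x = sa[i - 1], y = sa[i], h = 0;  // h becomes height[i]
        while (x + h < n && y + h < n && s[x + h] == s[y + h]) ++h;
        if (h > best) {
            best = h;
            pos = y;
        }
    }
    return s.substr(pos, best);
}
```

For "banana" the maximum height is 3, between the suffixes ana and anana, so the answer is "ana".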

The examples given in the paper are quite comprehensive, and they also introduce many classic techniques and routines for solving problems with suffix arrays:

RMQ

The LCP of two suffixes of the string is an RMQ problem on the height array.
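Concretely, if suffix i ranks before suffix j, their LCP is the minimum of height over the ranks rk[i]+1 .. rk[j]. A minimal sketch of this, assuming 0-based ranks and using a sparse table as the RMQ structure (any RMQ would do), could look like the following. The struct name SuffixLcp is invented for this example, and sa and height are built naively just to keep it self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct SuffixLcp {
    int n;
    std::vector<int> sa, rk;
    std::vector<std::vector<int>> mn;  // mn[k][i] = min(height[i .. i+2^k-1])

    explicit SuffixLcp(const std::string& s) : n((int)s.size()), sa(n), rk(n) {
        for (int i = 0; i < n; ++i) sa[i] = i;
        std::sort(sa.begin(), sa.end(), [&](int x, int y) {
            return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
        });
        for (int i = 0; i < n; ++i) rk[sa[i]] = i;
        std::vector<int> height(n, 0);
        for (int i = 1; i < n; ++i) {
            int x = sa[i - 1], y = sa[i], h = 0;
            while (x + h < n && y + h < n && s[x + h] == s[y + h]) ++h;
            height[i] = h;
        }
        // Sparse table: each level doubles the window length.
        mn.push_back(height);
        for (int k = 1; (1 << k) <= n; ++k) {
            mn.emplace_back(n - (1 << k) + 1);
            for (int i = 0; i + (1 << k) <= n; ++i)
                mn[k][i] = std::min(mn[k - 1][i], mn[k - 1][i + (1 << (k - 1))]);
        }
    }

    // LCP of suffix(i) and suffix(j) in O(1) after O(n lg n) preprocessing.
    int lcp(int i, int j) const {
        assert(0 <= i && i < n && 0 <= j && j < n);
        if (i == j) return n - i;  // a suffix shares its whole length with itself
        int lo = rk[i], hi = rk[j];
        if (lo > hi) std::swap(lo, hi);
        ++lo;  // query height[lo .. hi]; non-empty because the ranks differ
        int k = 0;
        while ((1 << (k + 1)) <= hi - lo + 1) ++k;
        return std::min(mn[k][lo], mn[k][hi - (1 << k) + 1]);
    }
};
```

For "banana", lcp(1, 3) compares anana with ana and returns 3, while lcp(0, 2) compares banana with nana and returns 0.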
Suffix I
