[Repost] A summary of suffix array problem solving

Source: Internet
Author: User
Tags repetition

I used to think that knowing the suffix automaton was enough, so I ignored the suffix array. I have now found that a suffix array combined with binary search is very powerful, and that it handles things the suffix automaton does not seem able to do.

Reposted here so that my teammates can browse it easily. I will also catch up on this material myself over the next few days.

(I could not find the original blogger's site, sorry.)

Suffix array problem-solving summary:

1. Count the distinct substrings of a single string. SPOJ 694, SPOJ 705.

This is a special counting problem. The key fact is that every substring of a string is a prefix of one of its suffixes. (That sentence is a bit of a tongue-twister ...) Take the suffix ranked i, whose starting position is sa[i]: it has n-sa[i] prefixes, one per length, and height[i] of them are shared with the previous suffix in sorted order, so the number of new substrings it contributes is n-sa[i]-height[i]. Iterate over all suffixes and add up these contributions to get the answer.
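The counting rule above can be sketched in a few lines of Python. This is only an illustration: `lcp_len`, `build` and `count_distinct_substrings` are names made up here, positions are 0-based, and the suffix array is built by naive sorting rather than by the doubling algorithm a contest solution would use.

```python
def lcp_len(s, i, j):
    """Length of the common prefix of s[i:] and s[j:], by direct comparison."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def build(s):
    """Naive suffix array and height array (illustration only, not contest speed)."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # height[r] = LCP of the suffixes ranked r-1 and r; height[0] = 0.
    height = [lcp_len(s, sa[r - 1], sa[r]) if r else 0 for r in range(len(s))]
    return sa, height


def count_distinct_substrings(s):
    sa, height = build(s)
    n = len(s)
    # Each suffix has n - sa[r] prefixes; height[r] of them were already counted.
    return sum(n - sa[r] - height[r] for r in range(n))


print(count_distinct_substrings("abab"))  # a, b, ab, ba, aba, bab, abab
```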

Code and problem statement: http://hi.baidu.com/fhnstephen/blog/item/68f919f849748668024f56fb.html

2. The longest common prefix of two suffixes (written LCP(x,y)).

This is one of the most basic properties of the height array; see Luo's paper for details. The longest common prefix of suffix i and suffix j is the minimum of the height values lying between their positions in the sa array. That description may be a little messy, so formally: let x=rank[i], y=rank[j], x<y; then LCP(i,j)=min(height[x+1],height[x+2],...,height[y]). When the two suffixes are the same one, the LCP is simply its length, n-sa[x]. To answer such queries, use the sparse table (ST) algorithm for RMQ (it is the only one I know; the approach that converts the problem to lowest common ancestor also works).
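A minimal sketch of the LCP query with an ST (sparse table) over the height array follows. All function names are invented for this example, positions are 0-based, and the suffix array is built naively; only the query structure is the point.

```python
def lcp_len(s, i, j):
    """Length of the common prefix of s[i:] and s[j:], by direct comparison."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def build(s):
    """Naive suffix array and height array (illustration only)."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    height = [lcp_len(s, sa[r - 1], sa[r]) if r else 0 for r in range(len(s))]
    return sa, height


def build_sparse_table(height):
    """table[j][i] = min(height[i : i + 2**j])."""
    table = [height[:]]
    j = 1
    while (1 << j) <= len(height):
        prev, half = table[-1], 1 << (j - 1)
        table.append([min(prev[i], prev[i + half])
                      for i in range(len(height) - (1 << j) + 1)])
        j += 1
    return table


def range_min(table, lo, hi):
    """Minimum of height[lo..hi], both ends inclusive, lo <= hi."""
    j = (hi - lo + 1).bit_length() - 1
    return min(table[j][lo], table[j][hi - (1 << j) + 1])


def make_lcp(s):
    """Return a function lcp(i, j) for the suffixes starting at i and j."""
    sa, height = build(s)
    rank = [0] * len(s)
    for r, p in enumerate(sa):
        rank[p] = r
    table = build_sparse_table(height)

    def lcp(i, j):
        if i == j:
            return len(s) - i  # a suffix compared with itself
        x, y = sorted((rank[i], rank[j]))
        return range_min(table, x + 1, y)

    return lcp


lcp = make_lcp("banana")
print(lcp(1, 3))  # "anana" vs "ana"
```

After the O(n log n) table is built, each query costs two lookups, which is what makes the later enumeration-style problems affordable.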

3. Longest repeated substring (occurrences may overlap)

Observe that any repeated substring is a common prefix of two suffixes: a common prefix of two suffixes occurs in both of them at different starting positions, so it occurs at least twice (possibly overlapping). The longest common prefix of any two suffixes is the minimum over a range of height values, so the largest possible value is the maximum entry of the height array (that is, the LCP of some pair of adjacent suffixes sa[i], sa[i+1]). So just compute the height array and output its maximum.
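As a small sketch of "take the maximum of height", assuming a naive suffix array construction and helper names invented for this example:

```python
def lcp_len(s, i, j):
    """Length of the common prefix of s[i:] and s[j:], by direct comparison."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def build(s):
    """Naive suffix array and height array (illustration only)."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    height = [lcp_len(s, sa[r - 1], sa[r]) if r else 0 for r in range(len(s))]
    return sa, height


def longest_repeated_substring(s):
    """Longest substring occurring at least twice; occurrences may overlap."""
    if not s:
        return ""
    sa, height = build(s)
    r = max(range(len(s)), key=lambda idx: height[idx])  # rank with largest height
    return s[sa[r]:sa[r] + height[r]]


print(longest_repeated_substring("banana"))  # ana
```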

Problem and code: http://hi.baidu.com/fhnstephen/blog/item/4ed09dffdec0a78eb801a0ba.html

4. Longest repeated substring without overlap. PKU1743

The only difference from problem 3 is whether occurrences may overlap. With the no-overlap restriction a direct solution is hard, so we binary-search the answer, which turns the problem into a decision problem: given a candidate length k, how do we decide whether some substring of length k repeats without overlapping?

First, split the suffixes, in sa order, into groups according to the height array, so that within each group every height value between consecutive suffixes is at least k. After this grouping, if a group contains more than one suffix, its suffixes share a common prefix whose length is the minimum of the height values between them, and by construction that minimum is at least k. Therefore only a group with more than one suffix can contain a substring of length at least k that meets the problem's constraint. As long as the problem's constraint holds within some group, a valid substring of length at least k exists.

For this problem the constraint is no overlap, and the test is: within a group, the largest starting position minus the smallest starting position is >= k. If this holds, the common prefix of those two suffixes occurs twice with starting positions at least k apart, so the two occurrences do not overlap.
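The grouping and the overlap test can be sketched as below for a plain string. This is not a full PKU1743 solution (that problem has extra input preprocessing); the function names are made up, and the suffix array is built naively.

```python
def lcp_len(s, i, j):
    """Length of the common prefix of s[i:] and s[j:], by direct comparison."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def build(s):
    """Naive suffix array and height array (illustration only)."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    height = [lcp_len(s, sa[r - 1], sa[r]) if r else 0 for r in range(len(s))]
    return sa, height


def has_nonoverlapping_repeat(sa, height, k):
    """Decision step: does some group (all heights >= k) span positions >= k apart?"""
    lo = hi = sa[0]
    for r in range(1, len(sa)):
        if height[r] >= k:          # same group as the previous suffix
            lo, hi = min(lo, sa[r]), max(hi, sa[r])
            if hi - lo >= k:
                return True
        else:                       # a new group starts here
            lo = hi = sa[r]
    return False


def longest_nonoverlapping_repeat(s):
    """Length of the longest substring occurring twice without overlap."""
    if len(s) < 2:
        return 0
    sa, height = build(s)
    lo, hi = 0, len(s) // 2         # two disjoint copies need 2 * k <= n
    while lo < hi:                  # binary search on the answer
        mid = (lo + hi + 1) // 2
        if has_nonoverlapping_repeat(sa, height, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


print(longest_nonoverlapping_repeat("banana"))  # "an" at positions 1 and 3
```

The binary search is valid because a non-overlapping repeat of length k always contains one of length k-1.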

A deep understanding of this height-grouping method and of the overlap test is pivotal for the later problems.

Problem and solution: http://hi.baidu.com/fhnstephen/blog/item/85a25b208263794293580759.html

5. Longest substring that appears at least k times (occurrences may overlap). PKU3261

When solving problems with a suffix array, a request for the "longest" something generally calls for binary search on the answer plus grouping by height values, apart from special cases such as problem 3. Here the constraint is appearing k times, so just check whether some group contains at least k suffixes. With the analysis of the earlier problems as a basis, this should not be hard to understand. Take care to understand why the test is "at least k times" rather than "exactly k times". If it is unclear, work through a concrete example.
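A sketch of the same binary search with the "group of at least k suffixes" test, for a plain string and k >= 2. The names are invented for the example and the construction is naive.

```python
def lcp_len(s, i, j):
    """Length of the common prefix of s[i:] and s[j:], by direct comparison."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def build(s):
    """Naive suffix array and height array (illustration only)."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    height = [lcp_len(s, sa[r - 1], sa[r]) if r else 0 for r in range(len(s))]
    return sa, height


def has_group_of_size(height, length, k):
    """Is there a run of k consecutive suffixes whose heights are all >= length?"""
    run = 1
    for r in range(1, len(height)):
        run = run + 1 if height[r] >= length else 1
        if run >= k:
            return True
    return False


def longest_substring_k_times(s, k):
    """Length of the longest substring occurring at least k times (assumes k >= 2)."""
    sa, height = build(s)
    lo, hi = 0, len(s)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_group_of_size(height, mid, k):
            lo = mid
        else:
            hi = mid - 1
    return lo


print(longest_substring_k_times("banana", 2))  # "ana"
print(longest_substring_k_times("banana", 3))  # "a"
```

A group may hold more than k suffixes, which is why the test is "at least k" and not "exactly k": a substring occurring k+1 times also occurs k times.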

Problem and solution: http://hi.baidu.com/fhnstephen/blog/item/be7d15133ccbe7f0c2ce79bb.html

6. Longest palindromic substring. URAL1297

There is no direct way to solve this problem, but enumeration works. Specifically, enumerate the centre position i of the palindrome, handling odd and even lengths as two separate cases. What remains is to find the maximum length of a palindrome centred at i. With a suffix array, this becomes finding the longest common prefix of the suffix starting at i and the prefix ending at i read backwards. My wording here is a little clumsy, but it should not get in the way of understanding.

To find this longest common prefix quickly, reverse the original string and append it to the original. In suffix array problems, whenever you concatenate two (or N) strings, separate them with a character, other than character 0, that cannot appear in the original strings. This leaves the properties of the suffix array intact. The problem then reduces to the longest common prefix of two suffixes. The details are left for you to think through ... (I am being lazy ... forgive me, I have written so much already. It has taken an hour, T_T.)
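One possible way to fill in the details is sketched below. It assumes the separator `#` does not occur in the input, uses 0-based positions, and answers each LCP query by taking the minimum over a slice of height (a sparse table would replace that in a real solution). All names are hypothetical.

```python
def lcp_len(s, i, j):
    """Length of the common prefix of s[i:] and s[j:], by direct comparison."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def make_lcp(s):
    """Return lcp(i, j) backed by a naively built suffix array and height array."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    height = [lcp_len(s, sa[r - 1], sa[r]) if r else 0 for r in range(len(s))]
    rank = [0] * len(s)
    for r, p in enumerate(sa):
        rank[p] = r

    def lcp(i, j):
        if i == j:
            return len(s) - i
        x, y = sorted((rank[i], rank[j]))
        return min(height[x + 1:y + 1])  # a sparse table would make this O(1)

    return lcp


def longest_palindrome(s, sep="#"):
    """Longest palindromic substring; assumes sep does not occur in s."""
    n = len(s)
    if n == 0:
        return ""
    t = s + sep + s[::-1]
    lcp = make_lcp(t)
    best_start, best_len = 0, 1
    for i in range(n):
        # Odd length, centre at i: position i of s is mirrored at 2n - i in t.
        half = lcp(i, 2 * n - i)
        if 2 * half - 1 > best_len:
            best_start, best_len = i - half + 1, 2 * half - 1
        # Even length, centre between i-1 and i: i-1 is mirrored at 2n - i + 1.
        if i > 0:
            half = lcp(i, 2 * n - i + 1)
            if 2 * half > best_len:
                best_start, best_len = i - half, 2 * half
    return s[best_start:best_start + best_len]


print(longest_palindrome("banana"))  # anana
```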

Problem and solution: http://hi.baidu.com/fhnstephen/blog/item/68342f1d5f9e3cf81ad576ef.html

7. Find which string, copied several times, produces a given string. PKU2406

See PKU2406 for the exact problem statement. This problem can be solved with KMP, which does better than the suffix array here.

It is also hard to attack directly with a suffix array (mainly because, even with binary search on the answer, the resulting decision problem is hard to solve), but enumerating the length k of the template string (the string being copied) turns it into a decision problem that a suffix array can handle. First check whether k divides n, then just check whether LCP(1,k+1) (LCP(0,k) when actually writing the program in C) equals n-k.

Why does this work? It makes full use of the properties of suffixes. When LCP(1,k+1)=n-k, suffix k+1 is a prefix of suffix 1, the entire string (because suffix k+1 has length n-k). Then the first k characters of suffix 1 equal the first k characters of suffix k+1. Characters k+1..2k of suffix 1 are exactly the first k characters of suffix k+1, so the first k characters of suffix 1 equal its characters k+1..2k. By the same prefix relation, those equal characters k+1..2k of suffix k+1, which are characters 2k+1..3k of the whole string. And so on: as long as LCP(1,k+1)=n-k, copying s[1..k] n/k times yields the entire string. Finding the minimum k gives the maximum n/k.
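The check LCP(0,k) == n-k can be sketched as follows, with 0-based positions, a non-empty input, and an `lcp` helper that is assumed here (built naively, with a slice minimum standing in for the RMQ structure).

```python
def lcp_len(s, i, j):
    """Length of the common prefix of s[i:] and s[j:], by direct comparison."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def make_lcp(s):
    """Return lcp(i, j) backed by a naively built suffix array and height array."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    height = [lcp_len(s, sa[r - 1], sa[r]) if r else 0 for r in range(len(s))]
    rank = [0] * len(s)
    for r, p in enumerate(sa):
        rank[p] = r

    def lcp(i, j):
        if i == j:
            return len(s) - i
        x, y = sorted((rank[i], rank[j]))
        return min(height[x + 1:y + 1])  # a sparse table would make this O(1)

    return lcp


def smallest_period(s):
    """Smallest k dividing n such that s is s[:k] repeated n/k times (s non-empty)."""
    n = len(s)
    lcp = make_lcp(s)
    for k in range(1, n):
        # Suffix k must match the whole string on all of its n - k characters.
        if n % k == 0 and lcp(0, k) == n - k:
            return k
    return n


def max_power(s):
    """Largest number of copies of one template string that make up s."""
    return len(s) // smallest_period(s)


print(max_power("abababab"))  # 4 copies of "ab"
```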

Problem and solution: http://hi.baidu.com/fhnstephen/blog/item/5d79f2efe1c3623127979124.html

8. Longest common substring of two strings. PKU2774, URAL1517

First distinguish the "longest common substring" from the "longest common subsequence": a substring is contiguous, while a subsequence need not be.

For problems involving two strings, the usual approach is to concatenate them and build the height array. The longest common substring problem then becomes a longest common prefix problem on suffixes. However, not every LCP value can serve as an answer: only when the two suffixes belong to different strings does their LCP count. As in problem 3, the answer must be some height value, because an LCP value is the minimum over a range of height values, and when the range has length 1 the LCP value equals a single height value. So just scan the suffixes in sa order and take the maximum height value whose two adjacent suffixes belong to different strings. How to tell which string a suffix belongs to is not explained here; it is left for you to think about ...
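A sketch of the scan, with one simple way to tell the two strings apart: compare sa[r] with the length of the first string. It assumes the separator `#` occurs in neither input; the names and the naive construction are for illustration only.

```python
def lcp_len(s, i, j):
    """Length of the common prefix of s[i:] and s[j:], by direct comparison."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def build(s):
    """Naive suffix array and height array (illustration only)."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    height = [lcp_len(s, sa[r - 1], sa[r]) if r else 0 for r in range(len(s))]
    return sa, height


def longest_common_substring(a, b, sep="#"):
    """Longest common substring of a and b; assumes sep occurs in neither."""
    t = a + sep + b
    sa, height = build(t)
    best = ""
    for r in range(1, len(t)):
        # A suffix belongs to a exactly when it starts before the separator.
        from_different = (sa[r - 1] < len(a)) != (sa[r] < len(a))
        if from_different and height[r] > len(best):
            best = t[sa[r]:sa[r] + height[r]]
    return best


print(longest_common_substring("banana", "ananas"))  # anana
```

The suffix that starts at the separator is classified with the second string, but its height values are 0 because the separator is unique, so it never affects the answer.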

Problems and solutions:

http://hi.baidu.com/fhnstephen/blog/item/8666a400cd949d7b3812bb44.html

http://hi.baidu.com/fhnstephen/blog/item/b5c7585600cadfc8b645aebe.html

9. The consecutively repeated substring with the most repetitions. SPOJ 687, PKU3693

This one is harder, mainly because Luo's paper is somewhat vague on the key point. See PKU3693 for the exact problem statement.

This is another problem that is hard to solve by binary search on the answer (because what is asked for is the number of repetitions), so we fall back on plain enumeration. First enumerate the length k of the repeating unit, then use the suffix array to find the maximum number of times a unit of length k can repeat. Note that if a unit is repeated at least 2 times (a single occurrence is not discussed, since that always exists), the repetition must cover two adjacent positions among s[0], s[k], s[2*k], ... So we can enumerate a number i and determine how many times the string of length k starting at position i*k repeats. The test is similar to the one in 8: LCP(i*k,(i+1)*k)/k+1. However, this alone misses the cases where the repeating substring does not start exactly at position i*k. How should we handle that? Look at the following example:

aabababc

With k=2 and i=1 we are at position 2, where the repeating unit is ba (positions are counted from 0). LCP(2,4)=3, so ba repeats 2 times. In fact, the string ab starting at position 1 repeats 3 times. Note that here LCP(2,4)=3, and 3 is not a multiple of 2. This means that after the last full copy of the unit there is an incomplete copy, of which only a part is repeated (here, one extra b). If that is still unclear, more concretely:

suffix(2) = bababc

suffix(4) = babc

LCP = bab

Notice that ba repeats two times and then an extra b appears, but the a that would complete another copy does not. It is then natural to shift the enumerated position back by one (toward the start of the string), so that this last b pairs with a preceding character to form another copy of the unit; if that preceding character is exactly a, the answer grows by 1. So compute a=LCP(i*k,(i+1)*k)%k, move back k-a positions, and measure the repetition length again in the same way. Let b=i*k-(k-a); it is enough to check LCP(b,b+k)>=k. In fact, when LCP(b,b+k)>=k, LCP(b,b+k) is necessarily greater than or equal to the LCP value obtained before, and the answer grows by only 1. If this is still unclear, work through the example carefully.
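The enumeration with the shift-back step can be sketched like this. It returns only the maximum repetition count (PKU3693 additionally asks for the lexicographically smallest such substring, which is not handled), uses 0-based positions, and relies on an assumed `lcp` helper built naively.

```python
def lcp_len(s, i, j):
    """Length of the common prefix of s[i:] and s[j:], by direct comparison."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def make_lcp(s):
    """Return lcp(i, j) backed by a naively built suffix array and height array."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    height = [lcp_len(s, sa[r - 1], sa[r]) if r else 0 for r in range(len(s))]
    rank = [0] * len(s)
    for r, p in enumerate(sa):
        rank[p] = r

    def lcp(i, j):
        if i == j:
            return len(s) - i
        x, y = sorted((rank[i], rank[j]))
        return min(height[x + 1:y + 1])  # a sparse table would make this O(1)

    return lcp


def max_repetitions(s):
    """Largest r such that some substring of s is a unit repeated r times in a row."""
    n = len(s)
    lcp = make_lcp(s)
    best = 1 if n else 0
    for k in range(1, n // 2 + 1):          # length of the repeating unit
        for pos in range(0, n - k, k):      # pos = i*k, with pos + k < n
            length = lcp(pos, pos + k)
            reps = length // k + 1
            a = length % k
            b = pos - (k - a)               # shift back to absorb the partial copy
            if a and b >= 0 and lcp(b, b + k) >= k:
                reps += 1
            best = max(best, reps)
    return best


print(max_repetitions("aabababc"))  # "ab" repeated 3 times from position 1
```

For "aabababc" with k=2 and pos=2, the first query gives length 3 and 2 repetitions; shifting back to b=1 gives LCP(1,3)=4 >= 2, so the count becomes 3, matching the example above.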

Problem and solution: http://hi.baidu.com/fhnstephen/blog/item/870da9ee3651404379f0555f.html

10. Common substring of multiple strings. PKU3294

First concatenate the strings, then build the height array, and then what?

Right: binary-search the answer and test feasibility. The feasibility condition is straightforward: some group of suffixes contains suffixes from at least as many different strings as the problem requires. That is, if the problem requires the substring to appear in at least K strings, there must be a group whose suffixes come from at least K different strings.
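A sketch of the feasibility test for several strings follows. To get a distinct separator per string without reserving characters, this example maps the text to a list of integers, which is a choice made here and not something the original solution prescribes. It assumes k >= 2 and returns only the length.

```python
def lcp_len(s, i, j):
    """Length of the common prefix of s[i:] and s[j:], by direct comparison."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def build(s):
    """Naive suffix array and height array (illustration only)."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    height = [lcp_len(s, sa[r - 1], sa[r]) if r else 0 for r in range(len(s))]
    return sa, height


def longest_in_k_strings(strings, k):
    """Length of the longest substring found in at least k of the strings (k >= 2)."""
    m = len(strings)
    seq, owner = [], []
    for idx, w in enumerate(strings):
        seq.extend(ord(c) + m for c in w)   # real characters, shifted above separators
        owner.extend([idx] * len(w))
        seq.append(idx)                     # unique separator after string idx
        owner.append(-1)
    sa, height = build(seq)

    def feasible(length):
        seen = set()                        # strings represented in the current group
        for r in range(len(seq)):
            if height[r] < length:          # a new group starts at rank r
                seen = set()
            if owner[sa[r]] >= 0:
                seen.add(owner[sa[r]])
            if len(seen) >= k:
                return True
        return False

    lo, hi = 0, max((len(w) for w in strings), default=0)
    while lo < hi:                          # binary search on the answer
        mid = (lo + hi + 1) // 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


print(longest_in_k_strings(["abcdefg", "bcdefgh", "cdefghi"], 2))  # bcdefg / cdefgh
```

Because every separator is unique, no common prefix can run across a string boundary, so a group with two or more suffixes only ever contains genuine substrings.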

Problem and solution: http://hi.baidu.com/fhnstephen/blog/item/49c3b7dec79ec5e377c638f1.html

11. Longest substring that appears in every string, either as-is or reversed. PKU1226

http://hi.baidu.com/fhnstephen/blog/item/7fead5020a16d2da267fb5c0.html

12. Longest substring that appears at least two times without overlapping in every string. SPOJ 220

http://hi.baidu.com/fhnstephen/blog/item/1dffe1dda1c98754cdbf1a35.html

These two problems are discussed together because they are similar and their methods have all appeared in earlier problems. Multiple strings: concatenate them. Reversal allowed: reverse each string, concatenate it with the original, and treat the reversed copy and the original as the same string. Longest: binary-search the answer, then group by height. Appears in every string (including reversed copies): the test is the same as in problem 10 with k=n. No overlap: see problem 4, except that the test must be applied to each string separately.

13. The number of repeated (common) substrings of two strings. PKU3415

Personally I find this problem quite hard. See PKU3415 for the exact problem statement.

The solution is here: http://hi.baidu.com/fhnstephen/blog/item/bf06d001de30fc034afb51c1.html

I recommend working out the method yourself, following the hints in my solution. As this is the last problem in these notes, I will not say much more. If you really cannot figure it out, contact me directly at QQ 403231899; tell me you are also learning programming and I will add you.

14. Final summary

Solving problems with suffix arrays follows certain regular patterns, which come from the properties of suffixes. They are summarised as follows:

1. Problems with N strings (N>1)

Method: concatenate them, separating them with distinct characters, other than character 0, that do not appear in the original strings.

2. Longest common substring with no extra conditions (a repeated substring is a common prefix of suffixes)

Method: the maximum of height. "No extra conditions" means no restriction is placed on the substrings. Among common substring problems, only the longest common substring of two strings can be read off the maximum of height directly.

3. Longest substring under special conditions

Method: binary-search the answer, group by the height array, and settle the decision problem with the condition at hand. The common substring problem for three or more strings also needs binary search. Let the length being tested be len; the special conditions are:

3.1. Appears in K strings

Condition: the suffixes come from at least K different strings. ("Within one group of suffixes" is implied here and below.)

3.2. No overlap

Condition: among the suffixes belonging to the same string, the largest starting position minus the smallest is greater than or equal to len.

3.3. Appears k times, overlap allowed

Condition: the number of suffixes belonging to the same string is greater than or equal to k. If every string must satisfy this, test each string separately.

4. Special counting

Method: based on the properties of suffixes and the requirements of the problem, think it through yourself and see whether a suffix array can do it. In general, problems about "substrings" can be solved with a suffix array.

5. Repetition problems

Keep one thing in mind: LCP(i,i+k) tells you, for the string of length k starting at i, how far the text that follows continues to repeat it. From there, analyse each problem on its own terms to obtain the algorithm.

    • Single-string problems
      • 1 Repeated substrings
        • 1.1 Longest repeated substring, overlap allowed
        • 1.2 Longest repeated substring, no overlap poj1743
        • 1.3 Longest substring repeated k times, overlap allowed
      • 2 Substring counting
        • 2.1 Number of distinct substrings spoj694 spoj705
      • 3 Cyclic substrings
        • 3.1 Minimum period
        • 3.2 Consecutively repeated substring with the most repetitions spoj687 poj3693
    • Two-string problems
      • 1 Common substrings
        • 1.1 Longest common substring poj2774 ural1517
      • 2 Substring counting
        • 2.1 Common substrings of a specific length
    • Multi-string problems
      • 1 Common substrings
        • 1.1 Longest substring that appears in K strings poj3294
        • 1.2 Longest substring that appears k times in each string spoj220
        • 1.3 Longest substring that appears in each string, as-is or reversed poj1226
