Manacher algorithm for the longest palindrome substring problem

Source: Internet
Author: User

Manacher algorithm (Http://www.jianshu.com/p/799bc53d4e3d)

For a long string, a time complexity of O(n^2) is unacceptable. Can we do better?
Let's take a look at the flaws in Solution 2.

1) Because a palindrome can have odd or even length, its symmetry axis falls either on a character or between two characters, so Solution 2 has to handle the two cases separately.
2) Many substrings are visited repeatedly, which makes the running time poor.

Defect 2) can be seen in this small example:

char:    a b a b a
i   :    0 1 2 3 4

When i==1 and when i==2, the substring aba on the left is traversed once each time.
If we can fix these problems in Solution 2, we have a good chance of improving the efficiency of the algorithm. Manacher is an algorithm that addresses exactly these problems.
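For reference, here is a minimal sketch of what Solution 2 might look like, assuming it is the usual expand-around-center approach (the earlier solution is not shown in this article, so the function name and structure are illustrative only). It makes both flaws visible: two separate calls per position for the odd and even cases, and repeated comparisons of the same characters.

```python
def longest_palindrome_expand(s):
    """Length of the longest palindrome substring, O(n^2) in the worst case.

    Assumed stand-in for "Solution 2": expand around every possible axis.
    """
    def expand(left, right):
        # Grow outward while the two ends match and stay inside the string.
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        # The loop overshoots by one on each side.
        return right - left - 1

    best = 0
    for i in range(len(s)):
        best = max(best, expand(i, i))      # odd length: axis on character i
        best = max(best, expand(i, i + 1))  # even length: axis between i and i+1
    return best
```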
(1) Solve the problem of the symmetry axis position caused by the parity of the length.
The Manacher algorithm first preprocesses the string by inserting the same symbol into every gap between characters, and also before the first and after the last character. The symbol must not appear in the original string. After this step, every palindrome of the original string corresponds to a palindrome of odd length, so the symmetry axis always falls on a character. For example, inserting #:

aba  --->  #a#b#a#
abba --->  #a#b#b#a#

Because the inserted symbol is always the same and does not occur in the original string, the palindrome property of every substring is unaffected: a string that was a palindrome is still a palindrome after insertion, and a string that was not a palindrome still is not.
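The preprocessing step can be sketched as a small helper. The name insert_separators and the sep parameter are made up for this illustration; the only thing taken from the text is the rule itself.

```python
def insert_separators(s, sep="#"):
    """Put sep between all characters and at both ends of s."""
    if sep in s:
        # The article requires a symbol that does not occur in the input.
        raise ValueError("separator must not appear in the original string")
    return sep + sep.join(s) + sep


# A non-empty string of length n becomes a string of length 2n + 1 (always odd),
# and it is a palindrome after the step exactly when it was one before.
for word in ["aba", "abba", "ab"]:
    t = insert_separators(word)
    print(word, "--->", t, len(t), (word == word[::-1]) == (t == t[::-1]))
```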
(2) Solve the problem of repeated access.
The distance from the leftmost or rightmost character of a palindrome to its symmetry axis is called the palindrome radius. Manacher defines a palindrome radius array RL, where RL[i] is the radius of the longest palindrome whose symmetry axis is the i-th character. Since we usually process strings from left to right, RL[i] is defined here as the distance from character i to the rightmost character of that palindrome, counting character i itself. For the two strings with separators above, the RL arrays are:

char:    # a # b # a #
RL  :    1 2 1 4 1 2 1
RL-1:    0 1 0 3 0 1 0
i   :    0 1 2 3 4 5 6

char:    # a # b # b # a #
RL  :    1 2 1 2 5 2 1 2 1
RL-1:    0 1 0 1 4 1 0 1 0
i   :    0 1 2 3 4 5 6 7 8

The tables also list RL[i]-1. Observe that RL[i]-1 is exactly the length, in the original string without separators, of the longest palindrome whose symmetry axis is at position i. So once we have the RL array, the length of the longest palindrome substring is the maximum of RL[i]-1.
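To make the definition concrete, the RL array can be computed directly from the definition with a brute-force loop. This is only a reference sketch (the helper name radius_array_naive is not from the article) and it is still quadratic; it reproduces the two tables above.

```python
def radius_array_naive(t):
    """RL computed straight from the definition, one full expansion per axis."""
    rl = []
    for i in range(len(t)):
        r = 1  # the character at i alone is a palindrome of radius 1
        while i - r >= 0 and i + r < len(t) and t[i - r] == t[i + r]:
            r += 1
        rl.append(r)
    return rl


for t in ["#a#b#a#", "#a#b#b#a#"]:
    rl = radius_array_naive(t)
    print(t, rl, "longest palindrome length:", max(rl) - 1)
```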
So the question becomes how to compute the RL array efficiently. The basic idea is to use the symmetry of palindromes to extend palindromes.
We introduce an auxiliary variable MaxRight, the rightmost position reached by any of the palindrome substrings visited so far. We also record pos, the symmetry axis of the palindrome that reaches MaxRight. Note that pos is never to the right of MaxRight.

We visit the string from left to right to compute RL. Suppose the current position is i, so RL[i] is what we need to find. Then i must be to the right of pos (obviously). What matters more is whether i is to the left or to the right of MaxRight, so we discuss the two cases separately.

1) When i is to the left of MaxRight

In this case we know that the string between the two ends of the palindrome centered at pos (the two red blocks in the original figure, at positions 2*pos-MaxRight and MaxRight, inclusive) is a palindrome, and the palindrome centered at i overlaps it. Let j be the position symmetric to i about pos, that is, j = 2*pos-i. RL[j] has already been computed. By the symmetry of palindromes, the palindrome centered at i and the palindrome centered at j are partly identical. There are two sub-cases.
a. The palindrome centered at j is short, so it lies inside the palindrome centered at pos:

Then RL[i] is at least RL[j], and part of the palindrome centered at i is already known, so we can set RL[i]=RL[j]. The palindrome centered at i may actually be longer, so we keep trying to extend it to the left and right of i until the characters on the two sides differ or a boundary is reached.
b. The palindrome centered at j is long, so it extends beyond the left end of the palindrome centered at pos:

Here we can only be sure that the part between the two blue lines in the original figure (the part that does not go beyond MaxRight) is a palindrome, so we start from that length and try to extend to the left and right of i until the characters on the two sides differ or a boundary is reached.
In either case, afterwards try to update MaxRight and pos, since a larger MaxRight may have been reached.
The procedure is:

step 1: RL[i] <--- min(RL[2*pos-i], MaxRight-i)
step 2: Extend the palindrome centered at i until the characters on the left and right differ, or a boundary is reached.
step 3: Update MaxRight and pos.
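The starting value in step 1 is only a lower bound on RL[i], which is why step 2 is still needed. As a sanity check, the sketch below computes the true radii by brute force and confirms that the bound never overshoots. The function name mirror_bound_holds is made up for this illustration, and the brute-force part is only a reference, not part of Manacher itself.

```python
def mirror_bound_holds(t):
    """True if RL[i] >= min(RL[2*pos-i], MaxRight-i) whenever i < MaxRight."""
    n = len(t)
    # Reference radii, computed the slow way.
    rl = []
    for i in range(n):
        r = 1
        while i - r >= 0 and i + r < n and t[i - r] == t[i + r]:
            r += 1
        rl.append(r)

    max_right, pos = 0, 0
    for i in range(n):
        if i < max_right:
            j = 2 * pos - i  # position symmetric to i about pos
            if rl[i] < min(rl[j], max_right - i):
                return False
        # Same update rule as step 3.
        if i + rl[i] - 1 > max_right:
            max_right, pos = i + rl[i] - 1, i
    return True


print(mirror_bound_holds("#a#b#a#b#a#"))
```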
2) When i is to the right of MaxRight

In this case, no part of the palindrome centered at i has been visited yet, so we can only start from i and try to extend to the left and right one character at a time, stopping when the characters on the two sides differ or a boundary of the string is reached. Then update MaxRight and pos.
(3) Algorithm implementation

def Manacher(s):
    # Preprocessing: insert '#' between characters and at both ends
    s = '#' + '#'.join(s) + '#'

    RL = [0] * len(s)
    MaxRight = 0
    pos = 0
    MaxLen = 0
    for i in range(len(s)):
        if i < MaxRight:
            RL[i] = min(RL[2 * pos - i], MaxRight - i)
        else:
            RL[i] = 1
        # Try to extend, taking care of the boundaries
        while i - RL[i] >= 0 and i + RL[i] < len(s) and s[i - RL[i]] == s[i + RL[i]]:
            RL[i] += 1
        # Update MaxRight and pos
        if RL[i] + i - 1 > MaxRight:
            MaxRight = RL[i] + i - 1
            pos = i
        # Update the length of the longest palindrome
        MaxLen = max(MaxLen, RL[i])
    return MaxLen - 1
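The function above returns only the length. If the substring itself is wanted, one possible variant (a sketch, not part of the original article; the name longest_palindrome_substring and the best_center variable are assumptions) remembers the best axis and maps it back to the original string. The leftmost position of the palindrome in the processed string is i-RL[i]+1, which is always a separator, so halving it gives the start index in the original string.

```python
def longest_palindrome_substring(s):
    """Return one longest palindrome substring of s (the leftmost one found)."""
    t = "#" + "#".join(s) + "#"
    rl = [0] * len(t)
    max_right = pos = 0
    best_center = 0
    for i in range(len(t)):
        # Start from the mirror bound when i is covered, otherwise from 1.
        rl[i] = min(rl[2 * pos - i], max_right - i) if i < max_right else 1
        while i - rl[i] >= 0 and i + rl[i] < len(t) and t[i - rl[i]] == t[i + rl[i]]:
            rl[i] += 1
        if i + rl[i] - 1 > max_right:
            max_right, pos = i + rl[i] - 1, i
        if rl[i] > rl[best_center]:
            best_center = i
    # Map the axis in the processed string back to the original string.
    start = (best_center - rl[best_center] + 1) // 2
    length = rl[best_center] - 1
    return s[start:start + length]


print(longest_palindrome_substring("babad"))
```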

(4) Analysis of complexity
Space complexity: inserting the separators forms a new string that takes linear space, and the RL array also takes linear space, so the space complexity is linear.
Time complexity: although the code contains two nested loops, an amortized analysis shows that the running time of Manacher is linear. The inner loop only does new work on the part of the string that has not been matched yet: every successful comparison beyond the region already covered moves MaxRight to the right, and MaxRight never moves back. Each character is therefore matched only a constant number of times over the whole run, and the time complexity is O(n).
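The amortized claim can be checked empirically by counting character comparisons in the inner loop. The sketch below is an instrumented copy of the algorithm (the name count_comparisons is made up for this illustration); on inputs that are worst cases for the quadratic approach, such as a long run of the same letter, the count should stay within a small constant factor of the processed string's length.

```python
def count_comparisons(s):
    """Run Manacher on s and return (character comparisons, processed length)."""
    t = "#" + "#".join(s) + "#"
    rl = [0] * len(t)
    max_right = pos = 0
    comparisons = 0
    for i in range(len(t)):
        rl[i] = min(rl[2 * pos - i], max_right - i) if i < max_right else 1
        while i - rl[i] >= 0 and i + rl[i] < len(t):
            comparisons += 1  # one character comparison per iteration
            if t[i - rl[i]] != t[i + rl[i]]:
                break
            rl[i] += 1
        if i + rl[i] - 1 > max_right:
            max_right, pos = i + rl[i] - 1, i
    return comparisons, len(t)


for sample in ["a" * 1000, "ab" * 500, "abcde" * 200]:
    used, n = count_comparisons(sample)
    print(n, used, round(used / n, 2))
```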

Used to play
Links: Http://www.jianshu.com/p/799bc53d4e3d
Source: Pinterest
Copyright belongs to the author. Commercial reprint please contact the author for authorization, non-commercial reprint please specify the source.
