0, a few words off topic first
The most classic topic in this chapter is the KMP pattern-matching algorithm. To consolidate my own knowledge, I went through this chapter with Zy, mainly explaining the KMP algorithm. It took about an hour, and when I finished, Zy said very excitedly: "It feels so magical." I was very touched. I feel I finally let someone who had never experienced the charm of algorithms feel it, and that from a few lines of code she could see "how incredibly brilliant human intelligence is."
I'm sorry.
I'm waiting for you to return to the king someday.
First, the idea behind KMP
1. To understand KMP, first understand the most naive brute-force algorithm
N words are omitted here ... if you do not know it, go and read up on it yourself.
2. KMP is an optimization of the brute-force algorithm
The complexity of the brute-force algorithm is O(nm), where n is the length of the main string and m is the length of the pattern string: in the worst case, the whole pattern string is traversed for every position of the main string. The disadvantage is that the main string data is traversed many times over and the pattern string is traversed countless times, even though that data is already known.
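The brute-force algorithm itself is omitted above, so here is a minimal sketch of one common way to write it, for comparison only. The function name `brute_force_search` and its parameters are illustrative assumptions, not something from the original notes.

```python
def brute_force_search(text, pattern):
    """Return the first index where pattern occurs in text, or -1 if absent."""
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):       # try every alignment in the main string
        j = 0
        while j < m and text[start + j] == pattern[j]:
            j += 1                       # compare character by character
        if j == m:
            return start                 # the whole pattern matched here
        # On a mismatch, everything learned from the j matched characters
        # is thrown away and the next alignment starts from scratch.
    return -1
```

The inner loop can run up to m times for each of the roughly n alignments, which is where the O(nm) worst case comes from.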
The optimization of KMP is to traverse the pattern string once in advance and record the data that will be needed, so that repeated traversal is avoided later.
The features of the KMP algorithm are: the pattern string shifts quickly after a mismatch, and the main string pointer never moves back!
Second, the main code implementation
Omitted......
Note that for this main code, the next table does not need to be understood first. It is enough to treat it as known and to understand each entry as the position the pattern pointer goes back to.
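Since the main code is omitted above, the following is only a sketch of what such a main loop could look like, with the next table passed in as a known input. The names `kmp_search` and `next_table` are assumptions for illustration, and the table is assumed to follow the convention next[0] = -1.

```python
def kmp_search(text, pattern, next_table):
    """Return the first index of pattern in text, or -1.

    next_table is treated as already known: next_table[j] is the position
    the pattern pointer j goes back to after a mismatch (next_table[0] == -1).
    """
    i, j = 0, 0                          # i: main string pointer, j: pattern pointer
    n, m = len(text), len(pattern)
    while i < n and j < m:
        if j < 0 or text[i] == pattern[j]:
            i += 1                       # i only ever moves forward
            j += 1
        else:
            j = next_table[j]            # only the pattern pointer goes back
    return i - j if j == m else -1
```

For example, with the pattern "abcabd" and the hand-computed table [-1, 0, 0, 0, 1, 2], searching "abcabcabd" hits a mismatch at i = 5, sets j back to 2 without moving i, and then finds the match starting at index 3.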
Third, next[] table construction (key)
1. next[j] is the position to go back to, that is: within pattern string [0..j-1] (or [0,j)), the maximum length of a prefix of the pattern string that matches a suffix of the part of the main string already matched. At the moment of a mismatch, pattern string [0..j-1] matches main string [i-j,i-1], so this is the same as the longest prefix of pattern string [0..j-1] that is also a suffix of it (excluding the whole of it).
Note: it is pattern string [0..j-1], not [0..j]!!!
2. When no prefix of the pattern string matches a suffix, next[j]=0, that is, the length is 0 and matching restarts from the beginning of the pattern string.
3. next[0]=-1;
j means that main string [i-j,i) matches pattern string [0,j), so next[0]=-1 is equivalent to introducing a non-existent sentinel at position -1, which simplifies and unifies the code. The special case is converted into the normal case for processing. It can also be thought of as a flag: when the first element does not match, j<0 marks this, and it is then handled accordingly.
4. Constructing the next[] table. That is, the pattern string is matched against itself: for each position, the substring before that position is matched against the pattern string's own prefix.
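To make the definition in points 1 to 3 concrete, here is a deliberately slow sketch that computes the table straight from the definition by trying every prefix length. It is not the KMP construction itself, and the name `next_by_definition` is an assumption for illustration.

```python
def next_by_definition(pattern):
    """Compute next[] directly from its definition (slow, for checking only)."""
    m = len(pattern)
    table = [-1] * m                     # next[0] = -1 is the sentinel value
    for j in range(1, m):
        matched = pattern[:j]            # pattern[0..j-1], NOT pattern[0..j]
        best = 0                         # no matching prefix/suffix -> 0
        for length in range(1, j):       # the whole substring is excluded
            if matched[:length] == matched[j - length:]:
                best = length            # longer matches overwrite shorter ones
        table[j] = best
    return table
```

For the pattern "abcabd" this gives [-1, 0, 0, 0, 1, 2]: before position 5 the substring is "abcab", whose longest prefix that is also a suffix is "ab", so next[5] = 2.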
Analysis:
I have seen explanations online that use mathematical induction to explain the construction steps. (Mathematical induction has three steps: ① establish the initial condition; ② assume the case n=k holds; ③ derive the case n=k+1 from the case n=k; if the case n=k+1 holds, the statement holds for all natural numbers.) In fact, I feel this is not entirely accurate. It is more accurate to call it recurrence: given a known initial condition, any later case n+1 can be derived from the case n.
Applied to this problem, the steps are slightly different. The initial condition of the recurrence is actually hard to define, so we may as well leave it for later. We can first assume that f(n) is known, and then consider how to derive the case n+1 from the case n.
Steps:
(The pattern string is now treated as the main string P, with pointer j; the same pattern string is also used as the secondary pattern string C, with pointer t.)
Assume next[j] is known. Note that this means the longest prefix for [0..j-1] is known, not for [0..j].
1) If P[j]==C[t] /* C[next[j]] */, advance both pointers and update the next table: t++; j++; next[j]=t; // match: advance and update the table
2) If P[j]!=C[t], the secondary pattern string pointer goes back (t=next[t]) and the next table is not updated (go back until the characters are equal or t has gone back past the start, and only then update) // mismatch: go back, no update
3) Repeat steps 1) and 2) until the last entry of the next table has been filled.
Problem:
The steps above leave one problem open, which is the initial condition. Step 2) also raises a simple question: when exactly should going back stop? In fact, this is the same problem, namely how to set the initial condition.
First, it is clear that when the current sequence has no matching prefix, the next table should be set to zero at that position. So how do we determine that there is no matching suffix? This is still the question of how to set the initial condition. To be precise, it is the question of how to set next[0]. So how should it be set? In fact, this was already explained once for the KMP main program.
next[j] stores, for pattern string [0..j-1] (or [0,j)), the maximum length of a pattern string prefix that matches a main string suffix. For j=0 the range [0,-1] is empty, so there is obviously nothing to discuss, and in fact next[0] does not store such a length at all. The question for next[0] is what to do when the pointer has to go back from position 0.
We may as well look at next[1] first. By the definition above, next[1] concerns pattern string [0..0] (that is, the first element of the pattern string) treated as the main string. Some people would say the longest prefix here must be 1. But if it is set that way, then when a character of P is not equal to C[1], the pointer t goes back to next[t], that is next[1], which is equivalent to no change at all. So the whole substring cannot count as its own prefix, and next[1] is 0.
Now think about what should happen when a character of P is not equal to C[0]. It should be j++, and then the new P[j] is compared with C[0] again. To handle this special case in the same way as the general step 1), we only need to move the pointer t (which points to 0 at this moment) one step back, and then let j and t advance together before comparing. Moving t one step back means assuming a sentinel at position -1; after the advance, t is back at 0. In code this is next[0]=-1. In this sense next[0] does not represent the longest prefix length, but it still means "the position to go back to", so semantically next[0] is consistent with the other values of the next table. As for next[2], it can be derived from next[1] with the steps above ... (laughing and crying)
OK, so the problem is basically solved. It seems that the most important part of the KMP algorithm is the construction of the next table.
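Putting the steps and the initial condition together, the construction might be sketched as follows. This is one possible way to write it under the next[0] = -1 convention described above; the name `build_next` is an assumption, and P and C from the steps are simply the same `pattern` accessed through the two pointers j and t.

```python
def build_next(pattern):
    """Build the next table by matching the pattern against itself."""
    m = len(pattern)
    if m == 0:
        return []
    nxt = [0] * m
    nxt[0] = -1                          # initial condition: sentinel at -1
    j, t = 0, -1                         # j: "main string" pointer, t: prefix pointer
    while j < m - 1:
        if t < 0 or pattern[j] == pattern[t]:
            j += 1                       # step 1): match (or sentinel hit),
            t += 1                       # advance both pointers
            nxt[j] = t                   # and update the table
        else:
            t = nxt[t]                   # step 2): mismatch, t goes back, no update
    return nxt
```

For "abcabd" the loop produces [-1, 0, 0, 0, 1, 2]. The `t < 0` test is where the sentinel pays off: after t goes back past position 0, the same advance step brings t back to 0 and writes next[j] = 0 without any separate special case.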
KMP algorithm (mainly explaining the construction of next table)