String Matching on a Suffix Automaton (SAM)
====
We build the SAM from the relatively short pattern string P.
Take P = "abcabcacab". For each i, we record the length of the longest suffix of T[1..i] that the SAM accepts, that is, that occurs as a substring of P:
T: b a b c a c a b c
1 1 2 3 1 1 2 3 4 5 6 7 1 2 4 5 6 7 6 7 8 9 10 4
If this longest length is |P|, then a suffix of T[1..i] matches P, that is, P occurs in T ending at position i.
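The matching procedure described above can be sketched as follows. This is a minimal sketch, not the author's implementation: the names build_sam and match_lengths are hypothetical, states are stored as indices into parallel lists rather than as pointers, and transitions live in dictionaries instead of fixed-size arrays.

```python
def build_sam(p):
    """Build a suffix automaton of p; states are indices into parallel lists."""
    trans = [{}]    # trans[v][c] = target state of the transition on character c
    link = [-1]     # suffix link of each state (-1 for the initial state 0)
    length = [0]    # length of the longest string that reaches each state
    last = 0
    for ch in p:
        cur = len(trans)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)
        v = last
        # Add the new transition along the suffix-link chain.
        while v != -1 and ch not in trans[v]:
            trans[v][ch] = cur
            v = link[v]
        if v == -1:
            link[cur] = 0
        else:
            q = trans[v][ch]
            if length[v] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone keeps the shorter strings.
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[v] + 1)
                while v != -1 and trans[v].get(ch) == q:
                    trans[v][ch] = clone
                    v = link[v]
                link[q] = clone
                link[cur] = clone
        last = cur
    return trans, link, length


def match_lengths(p, t):
    """For each i, the length of the longest suffix of t[:i+1] that occurs in p."""
    trans, link, length = build_sam(p)
    state, cur_len, out = 0, 0, []
    for ch in t:
        # Drop to shorter suffixes until one can be extended by ch.
        while state != 0 and ch not in trans[state]:
            state = link[state]
            cur_len = length[state]
        if ch in trans[state]:
            state = trans[state][ch]
            cur_len += 1
        out.append(cur_len)
    return out
```

For the pattern above and an assumed text "babcbabcabca", match_lengths returns [1, 1, 2, 3, 1, 1, 2, 3, 4, 5, 6, 7]. A position where the value equals len(p) marks the end of an occurrence of P.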
Memory usage
Several trans (transition) pointers may point to the same node, so freeing the automaton recursively as if it were a tree can cause a double free.
For this reason, we use a memory pool for now.
If the alphabet is extended to include digits and spaces, each node needs 37 transition pointers.
KMP Algorithm
====
Given a pattern string P and a text string T,
assume that at shift s the first q characters match, that is, P[1..q] = T[s+1..s+q], but the match fails at P[q+1].
strstr() moves the pointer to s + 2 and starts matching again from P[1].
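The restart-from-P[1] behavior can be sketched as below. This is only an illustration of the naive strategy under 0-based indexing, not the actual source of any C library's strstr(); the name naive_find_all is hypothetical.

```python
def naive_find_all(p, t):
    """Return every 0-based shift s at which p occurs in t, trying all shifts."""
    m, n = len(p), len(t)
    hits = []
    for s in range(n - m + 1):
        q = 0
        # Count how many leading characters of p match at shift s.
        while q < m and p[q] == t[s + q]:
            q += 1
        if q == m:
            hits.append(s)
        # On a mismatch the q matched characters are discarded,
        # and the next shift s + 1 starts again from p[0].
    return hits
```

In the worst case this does on the order of |P| * |T| character comparisons, which is the cost the prefix function is meant to avoid.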
Knuth, Morris, and Pratt wanted to move the pointer farther than that.
Assume that P[1..k] = T[s+q+1-k..s+q]. Then we can resume comparing from P[k+1]. We want k to be as large as possible: the larger k is, the smaller the shift increment q - k, so we do not skip any exact match position.
Combining the two equations shows that P[1..k] is a suffix of P[1..q]. The problem becomes:
For each q, find the longest proper prefix of P[1..q] (its length is k) that is also a suffix of P[1..q].
We define the prefix function pi(q) := k.
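The definition can be checked directly with a brute-force sketch. It is deliberately slow and only mirrors the definition; the name prefix_function_naive is hypothetical, and the returned list is 0-based, so pi(q) in the text corresponds to pi[q - 1] here.

```python
def prefix_function_naive(p):
    """pi[q-1] = length of the longest proper prefix of p[:q] that is also its suffix."""
    pi = []
    for q in range(1, len(p) + 1):
        k = q - 1  # a proper prefix has length at most q - 1
        # Shrink k until the prefix of length k equals the suffix of length k.
        while k > 0 and p[:k] != p[q - k:q]:
            k -= 1
        pi.append(k)
    return pi
```

For P = "abcabcacab" this gives [0, 0, 0, 1, 2, 3, 4, 0, 1, 2].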
How to calculate pi(q)?
====
Thinking recursively, assume that we have already computed pi(q) = k.
If P[k+1] = P[q+1], then clearly pi(q+1) = k + 1;
Otherwise, recall that pi(q) = k is the length of the longest proper prefix that matches the end of P[1..q]. We fall back to the next shorter such prefix by setting k = pi(k) and again check whether P[k+1] equals P[q+1]; if not, we keep looping with k = pi(k).
Initial condition: pi(1) = 0, because the longest proper prefix is the empty string.
Currently P[1..k] matches T[q-k+1..q], but the match fails at T[q+1],
Applying the prefix function, matching should be defined from position s + 1 - k + q - k =
Computing pi(q+1) is the process of finding the longest prefix P[1..j] that matches P[q+2-j..q+1]:
set k = pi(k) repeatedly until P[k+1] = P[q+1] or k = 0.
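The recurrence above can be sketched as a linear-time loop. This is a minimal sketch with 0-based indexing, so p[k] plays the role of P[k+1] and pi[k - 1] plays the role of pi(k); the name prefix_function is hypothetical.

```python
def prefix_function(p):
    """Compute the prefix function of p in O(len(p)) time."""
    m = len(p)
    pi = [0] * m          # pi[0] = 0: the only proper prefix is the empty string
    k = 0                 # k is the prefix function value of the previous position
    for q in range(1, m):
        # Fall back to shorter borders until one can be extended by p[q].
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi
```

The loop is linear because k grows by at most one per iteration of the outer loop, and every fallback step strictly decreases it.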
How to perform linear-time string matching?
====
Following the suffix automaton approach, we turn pi and P into an automaton and run T through it once.
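One way to sketch this is shown below: the state is the number q of pattern characters currently matched, and pi supplies the fallback transitions. This is a minimal sketch with 0-based indexing, not a definitive implementation; the name kmp_find_all is hypothetical, and returning an empty list for an empty pattern is an assumption made here for simplicity.

```python
def kmp_find_all(p, t):
    """Return every 0-based start index at which p occurs in t."""
    m = len(p)
    if m == 0:
        return []  # assumption: an empty pattern reports no matches

    # Prefix function of the pattern.
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k

    # Run the text through the automaton; q is the current state.
    hits = []
    q = 0
    for i, ch in enumerate(t):
        while q > 0 and p[q] != ch:
            q = pi[q - 1]      # fallback transition
        if p[q] == ch:
            q += 1             # forward transition
        if q == m:
            hits.append(i - m + 1)
            q = pi[q - 1]      # keep going to find overlapping matches
    return hits
```

The text pointer never moves backward, so the whole search takes O(|P| + |T|) time and needs only one integer per pattern character.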
Prefix function exercises:
1. How many times does P occur in T? Hint: look at pi(PT).
2. (ab)^3 = ababab. How do we find the maximum repetition factor r = 3?
3. How do we determine in linear time whether one string is a cyclic shift of another, such as arc and car? (I don't know how to do this yet)
KMP uses less memory than the SAM: