String Matching: From Suffix Automaton to KMP

Source: Internet
Author: User
String Matching with a Suffix Automaton (SAM)
====
We build a suffix automaton (SAM) from the pattern P, which is usually the shorter of the two strings.
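Take P = "abcabcacab". We feed the text T through the SAM one character at a time, following suffix links on a mismatch, and record for each i the length of the longest suffix of T[1..i] that can be read from the SAM's start state, that is, the longest suffix of T[1..i] that is a substring of P.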
[Table: the characters of a text T and the match length after each one; the values did not survive extraction.]
If this length equals |P|, then the suffix of T[1..i] of length |P| is P itself, so P occurs in T ending at position i.
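The procedure above can be sketched in Python as follows. This is a minimal sketch using the standard online SAM construction; the names SuffixAutomaton, match_lengths and occurrence_ends are our own, positions are reported 1-indexed to match the text, and the pattern is assumed to be non-empty.

```python
class SuffixAutomaton:
    """Suffix automaton of a pattern; states are indices into parallel lists."""

    def __init__(self, pattern: str) -> None:
        self.next: list[dict[str, int]] = [{}]  # state 0 is the start state
        self.link: list[int] = [-1]             # suffix links
        self.length: list[int] = [0]            # length of the longest string in each state
        last = 0
        for ch in pattern:
            cur = self._new_state(self.length[last] + 1)
            p = last
            while p != -1 and ch not in self.next[p]:
                self.next[p][ch] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.next[p][ch]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split q: the clone keeps q's transitions but a shorter length.
                    clone = self._new_state(self.length[p] + 1)
                    self.next[clone] = dict(self.next[q])
                    self.link[clone] = self.link[q]
                    while p != -1 and self.next[p].get(ch) == q:
                        self.next[p][ch] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            last = cur

    def _new_state(self, length: int) -> int:
        self.next.append({})
        self.link.append(-1)
        self.length.append(length)
        return len(self.length) - 1


def match_lengths(sam: SuffixAutomaton, text: str) -> list[int]:
    """For each i, the length of the longest suffix of T[1..i] that is a substring of P."""
    state, matched, out = 0, 0, []
    for ch in text:
        # Mismatch: fall back to shorter suffixes through suffix links.
        while state != 0 and ch not in sam.next[state]:
            state = sam.link[state]
            matched = sam.length[state]
        if ch in sam.next[state]:
            state = sam.next[state][ch]
            matched += 1
        else:
            matched = 0  # back at the start state and ch does not occur in P
        out.append(matched)
    return out


def occurrence_ends(pattern: str, text: str) -> list[int]:
    """1-indexed positions i where the match length reaches |P|, i.e. P ends at T[i]."""
    lengths = match_lengths(SuffixAutomaton(pattern), text)
    return [i for i, n in enumerate(lengths, start=1) if n == len(pattern)]
```

For example, `occurrence_ends("abcabcacab", "babcabcacabx")` returns `[11]`. The match length never exceeds |P|, because every state of the SAM represents substrings of P.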


Memory usage
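Several transition (trans) pointers may point to the same node, so freeing the automaton recursively as if it were a tree can cause a double free.
For this reason, we allocate the nodes from a memory pool for now.
If the alphabet is extended to include digits and the space character, each node needs 37 transition pointers (26 letters + 10 digits + 1 space).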
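The memory-pool idea can be sketched as below. The double-free problem itself belongs to C/C++ with manual memory management; this Python sketch only illustrates the layout: every node is an index into flat arrays, each node owns 37 transition slots, and the whole pool is released at once. The class NodePool, its methods and the alphabet ordering are hypothetical names chosen for illustration.

```python
import string

# Assumed alphabet: lowercase letters, digits and the space character.
ALPHABET = string.ascii_lowercase + string.digits + " "
SIGMA = len(ALPHABET)  # 26 + 10 + 1 = 37 transition slots per node
SLOT = {ch: i for i, ch in enumerate(ALPHABET)}
NO_NODE = -1           # marks "no transition"


class NodePool:
    """Hypothetical pool: nodes are indices into flat arrays, so a node reached
    through several transitions is just the same index stored several times."""

    def __init__(self) -> None:
        self.trans: list[int] = []   # node v owns trans[v*SIGMA : (v+1)*SIGMA]
        self.link: list[int] = []
        self.length: list[int] = []

    def alloc(self, length: int) -> int:
        self.trans.extend([NO_NODE] * SIGMA)
        self.link.append(NO_NODE)
        self.length.append(length)
        return len(self.length) - 1

    def get_trans(self, node: int, ch: str) -> int:
        return self.trans[node * SIGMA + SLOT[ch]]

    def set_trans(self, node: int, ch: str, target: int) -> None:
        self.trans[node * SIGMA + SLOT[ch]] = target

    def release(self) -> None:
        # Drop every node in one step instead of walking the automaton,
        # so no node can be freed twice.
        self.trans.clear()
        self.link.clear()
        self.length.clear()
```

In C or C++ the same design would be one array of fixed-size node structs, freed with a single call.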


KMP Algorithm
====
Given a pattern string P and a text string T,
suppose that at shift s the first q characters match, that is, P[1..q] = T[s+1..s+q], but P[q+1] does not match T[s+q+1].
A naive strstr() shifts P by only one position: it moves the text pointer back to T[s+2] and starts matching again from P[1].
Knuth, Morris and Pratt wanted to move the pointer farther ahead at that point.
Suppose P[1..k] = T[s+q+1-k..s+q]. Then we can continue by comparing from P[k+1]. We want k to be as large as possible: the larger k is, the smaller the shift increment q - k, so we never skip a position where an exact match could start.
Combining P[1..k] = T[s+q+1-k..s+q] with P[1..q] = T[s+1..s+q] shows that P[1..k] is a suffix of P[1..q]. The problem becomes:
For each q, find the longest proper prefix of P[1..q] that is also a suffix of P[1..q]; call its length k.
We define the prefix function pi(q) := k.
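To make the definition concrete, here is a brute-force Python sketch that computes pi directly from the definition. It takes cubic time and is meant only for checking faster versions; the name prefix_function_naive is our own, and the returned list is 0-indexed, so pi[q - 1] holds pi(q).

```python
def prefix_function_naive(p: str) -> list[int]:
    """pi[q - 1] = pi(q), computed straight from the definition."""
    m = len(p)
    pi = [0] * m
    for q in range(1, m + 1):           # prefix P[1..q]
        for k in range(q - 1, 0, -1):   # proper prefixes, longest first
            if p[:k] == p[q - k:q]:     # P[1..k] is also a suffix of P[1..q]
                pi[q - 1] = k
                break                   # if none matches, pi(q) stays 0
    return pi
```

For P = "abcabcacab" it returns `[0, 0, 0, 1, 2, 3, 4, 0, 1, 2]`.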


How to compute pi(q)?
====
We compute pi incrementally. Suppose we already know pi(q) = k.
If P[k+1] = P[q+1], then clearly pi(q+1) = k+1.
Otherwise, recall that pi(q) = k is the length of the longest proper prefix of P[1..q] that ends at position q. Any shorter prefix ending at position q is also a prefix and a suffix of P[1..k], so the next candidate has length pi(k). We therefore set k = pi(k) and again hope that P[k+1] = P[q+1], repeating until P[k+1] = P[q+1] (then pi(q+1) = k+1) or until k = 0 and P[1] != P[q+1] (then pi(q+1) = 0).
Initial condition: pi(1) = 0, because the only proper prefix of a single character is the empty string.
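The recurrence above can be sketched in Python like this. The function name prefix_function is our own; the list is 0-indexed, so pi[q - 1] holds pi(q), and p[k] corresponds to P[k+1] in the text's 1-indexed notation.

```python
def prefix_function(p: str) -> list[int]:
    """pi[q - 1] = pi(q): length of the longest proper prefix of P[1..q] that is also its suffix."""
    pi = [0] * len(p)        # pi(1) = 0
    k = 0                    # k = pi(q) for the prefix handled so far
    for q in range(1, len(p)):
        # P[k+1] != P[q+1]: fall back to the next shorter candidate, k = pi(k).
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:     # P[k+1] == P[q+1], so pi(q+1) = k + 1
            k += 1
        pi[q] = k
    return pi
```

For P = "abcabcacab" it returns `[0, 0, 0, 1, 2, 3, 4, 0, 1, 2]`. Each pass through the while loop strictly decreases k, and k grows by at most 1 per character, so the total work is O(|P|).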


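This is the same mechanism as in text matching. There, P[1..k] currently matches T[q-k+1..q] but fails at T[q+1], and the prefix function tells us to continue with k = pi(k), comparing P[k+1] with T[q+1].
Computing pi is that process with P acting as its own text: we look for the longest prefix P[1..j] that matches the part of P ending at position q+1,
setting k = pi(k) until P[k+1] = P[q+1] (or k = 0).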


How do we perform string matching in linear time?
====
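Following the same approach as with the suffix automaton, we combine P and pi into an automaton and run T through it once. The state is the number of characters of P matched so far, a mismatch moves the state from k to pi(k), and every time the state reaches |P| an occurrence of P ends at the current position of T.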
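A minimal Python sketch of this matcher, assuming a non-empty pattern. Here the automaton is kept implicit: transitions are computed on the fly by following pi instead of being stored in a table. The names prefix_function and kmp_search are our own, and positions are reported 1-indexed.

```python
def prefix_function(p: str) -> list[int]:
    """pi[q - 1] = pi(q) for the pattern p."""
    pi = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_search(p: str, t: str) -> list[int]:
    """1-indexed positions i of T at which an occurrence of P ends."""
    if not p:
        raise ValueError("pattern must be non-empty")
    pi = prefix_function(p)
    q = 0                    # automaton state: characters of P matched so far
    ends = []
    for i, ch in enumerate(t, start=1):
        while q > 0 and p[q] != ch:
            q = pi[q - 1]    # mismatch: move from state q to state pi(q)
        if p[q] == ch:
            q += 1
        if q == len(p):
            ends.append(i)
            q = pi[q - 1]    # continue so that overlapping matches are found
    return ends
```

For example, `kmp_search("aa", "aaaa")` returns `[2, 3, 4]`. The text pointer never moves backwards, and the same amortized argument as for pi bounds the total work by O(|P| + |T|).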


Exercises on the prefix function:
1. How many times does P occur in T? Hint: examine pi for the concatenated string PT.
2. Since (ab)^3 = ababab, the repetition factor of ababab is r = 3. How do we find the largest repetition factor r of a string?
3. How can we decide in linear time whether one string is a cyclic shift of another, such as arc and car? (I don't know how to do this yet.)
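One possible reading of exercises 1 and 2, sketched in Python; exercise 3 is left open. For exercise 1 we follow the hint: P ends at position q of the string PT exactly when |P| is one of the border lengths pi(q), pi(pi(q)), ..., and only positions q >= 2|P| lie entirely inside T. For exercise 2 we use the standard fact that the largest r with P = y^r is |P| / (|P| - pi(|P|)) when that division is exact, and 1 otherwise. The function names are our own, and both functions assume a non-empty P.

```python
def prefix_function(p: str) -> list[int]:
    """pi[q - 1] = pi(q) for the string p."""
    pi = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


def count_occurrences(p: str, t: str) -> int:
    """Exercise 1: number of occurrences of P in T, read off pi of PT."""
    m = len(p)
    pi = prefix_function(p + t)
    count = 0
    for q in range(2 * m, m + len(t) + 1):  # 1-indexed end positions inside T
        k = pi[q - 1]
        while k > m:                        # walk the border chain pi(q), pi(pi(q)), ...
            k = pi[k - 1]
        if k == m:                          # |P| is a border: P ends at position q
            count += 1
    return count


def repetition_factor(p: str) -> int:
    """Exercise 2: the largest r such that P = y^r for some string y."""
    m = len(p)
    period = m - prefix_function(p)[-1]     # smallest period of P
    return m // period if m % period == 0 else 1
```

The chain walk in count_occurrences is simple rather than strictly linear; inserting a separator character between P and T that occurs in neither string would cap every pi value at |P| and avoid it.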



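Compared with the SAM, KMP saves memory: it stores only the array pi, one integer per character of P, whereas the SAM has up to about 2|P| states, each with its own transition pointers.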




