A correctness proof of the KMP algorithm and a small optimization

Source: Internet
Author: User

Is it a bit cheeky to post my write-up as is?

Never mind, as long as you enjoy reading it.

KMP algorithm

For the pattern string $p$, build its prefix function $n$, where $n[q]$ is the length of the longest proper suffix of $p[1..q]$ that is also a prefix of $p$ (equivalently, the end position of that prefix). During matching, if $p[i]$ and $s[j]$ mismatch, set $i=n[i-1]+1$ (when $i=1$ there is nothing to fall back to, so only $j$ advances); otherwise set $i=i+1$, $j=j+1$.

Now consider how to construct $n$. Suppose $n[1..i-1]$ has already been computed, and let $k=n[i-1]$. If $p[k+1]=p[i]$, then $n[i]=k+1$; otherwise set $k=n[k]$. Repeat this process until $n[i]$ is found (if $k$ reaches $0$ and $p[1]\neq p[i]$, then $n[i]=0$).
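
The construction above can be sketched in Python as follows. This is a minimal sketch, not the author's code: it uses 0-based indexing (so the text's $n[q]$ corresponds to `n[q - 1]` here), and the name `prefix_function` is chosen only for illustration.

```python
def prefix_function(p: str) -> list[int]:
    """n[i] = length of the longest proper suffix of p[0..i]
    that is also a prefix of p (0-based version of the text's n)."""
    n = [0] * len(p)
    k = 0  # border length of the prefix processed so far
    for i in range(1, len(p)):
        # Fall back through shorter borders until one can be extended.
        while k > 0 and p[k] != p[i]:
            k = n[k - 1]
        if p[k] == p[i]:
            k += 1
        n[i] = k
    return n


print(prefix_function("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```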

This algorithm constructs the prefix function $n$ in $\Theta(|p|)$ time and completes the matching in $\Theta(|s|)$ time, for a total time complexity of $\Theta(|s|+|p|)$.
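
Putting the two phases together, a complete matcher might look like the sketch below. It is again 0-based, and it assumes that every occurrence should be reported and that an empty pattern yields no matches; neither choice comes from the text. The names `kmp_search` and `prefix_function` are illustrative.

```python
def prefix_function(p: str) -> list[int]:
    n = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[k] != p[i]:
            k = n[k - 1]
        if p[k] == p[i]:
            k += 1
        n[i] = k
    return n


def kmp_search(s: str, p: str) -> list[int]:
    """Return the 0-based start index of every occurrence of p in s."""
    if not p:
        return []  # assumption: an empty pattern has no matches
    n = prefix_function(p)
    hits = []
    i = 0  # number of pattern characters currently matched
    for j, ch in enumerate(s):
        # On a mismatch, fall back to the next shorter border.
        while i > 0 and p[i] != ch:
            i = n[i - 1]
        if p[i] == ch:
            i += 1
        if i == len(p):
            hits.append(j - len(p) + 1)
            i = n[i - 1]  # keep going to find overlapping matches
    return hits


print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```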

Correctness proof of the KMP algorithm

Prove the correctness of the matching process first:

During matching, suppose $p[1..q]$ matches $s[s+1..s+q]$ (the $s$ inside the brackets denotes the current shift of the pattern against the text $s$), while $p[q+1]$ and $s[s+q+1]$ mismatch. From the definition of $n$ it follows immediately that $p[1..n[q]]$ matches $s[s+q-n[q]+1..s+q]$, and that $p[1..t]$ does not match $s[s+q-t+1..s+q]$ for any $t$ with $n[q]<t<q$. No shift in between can therefore produce a match, so it suffices to continue by comparing $p[n[q]+1]$ with $s[s+q+1]$, which establishes the correctness of the matching process.

Next, we prove the correctness of the prefix function $n$ calculation:

Let $n^*[q]=\{n[q],n^{(2)}[q],\dots,n^{(t)}[q]\}$, where $n^{(1)}[q]=n[q]$, $n^{(i)}[q]=n[n^{(i-1)}[q]]$, and the iteration stops at $n^{(t)}[q]=0$. Then $n^*[q]$ is exactly the set of lengths of all proper suffixes of $p[1..q]$ that are also prefixes of $p$ (that is, the end positions of all matching prefixes, with $0$ for the empty one). Moreover, $n[q]-1\in n^*[q-1]$ whenever $n[q]>0$. So it is enough to enumerate the elements $k$ of $n^*[q-1]$ from largest to smallest and test $p[k+1]=p[q]$; the first $k$ that passes gives $n[q]=k+1$.
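
The claim about $n^*[q]$ can be checked on small inputs. The sketch below (0-based code, with $q$ still meaning a prefix length as in the text) builds the chain by iterating the prefix function and compares it with a brute-force enumeration of the border lengths. The helper names `border_chain` and `borders_brute_force` are hypothetical, and the chain is taken to include the final $0$.

```python
def prefix_function(p: str) -> list[int]:
    n = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[k] != p[i]:
            k = n[k - 1]
        if p[k] == p[i]:
            k += 1
        n[i] = k
    return n


def border_chain(p: str, q: int) -> list[int]:
    """n[q], n[n[q]], ... down to 0, for the prefix of length q >= 1."""
    n = prefix_function(p)
    chain = [n[q - 1]]
    while chain[-1] > 0:
        chain.append(n[chain[-1] - 1])
    return chain


def borders_brute_force(p: str, q: int) -> list[int]:
    """All t < q such that p[:t] is a suffix of p[:q], largest first."""
    return [t for t in range(q - 1, -1, -1) if p[:t] == p[q - t:q]]


print(border_chain("ababababca", 8))         # [6, 4, 2, 0]
print(borders_brute_force("ababababca", 8))  # [6, 4, 2, 0]
```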

Proof of the time complexity of the KMP algorithm

During matching: $i$ and $j$ increase at most $|s|$ times, and each execution of $i=n[i-1]+1$ decreases $i$ by at least 1, so that statement executes at most $|s|$ times. The matching time is therefore $\Theta(|s|)$.

When constructing the prefix function $n$: consider how $k$ changes. Each execution of $k=n[k]$ decreases $k$ by at least 1, and $k$ increases together with $i$ at most $|p|$ times, so that statement executes at most $|p|$ times. The construction time is therefore $\Theta(|p|)$.

So the total time complexity is $\Theta(|s|+|p|)$.
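
The counting argument for the matching phase can be observed directly by instrumenting the loop. The sketch below is an illustration under assumptions: it is 0-based, requires a non-empty pattern, and counts the reset after a full match as a fallback as well, since it also decreases the match length. The name `count_match_steps` is hypothetical. The construction phase can be instrumented in the same way.

```python
def prefix_function(p: str) -> list[int]:
    n = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[k] != p[i]:
            k = n[k - 1]
        if p[k] == p[i]:
            k += 1
        n[i] = k
    return n


def count_match_steps(s: str, p: str) -> tuple[int, int]:
    """Return (advances, fallbacks) for one KMP scan of s with non-empty p."""
    n = prefix_function(p)
    i = advances = fallbacks = 0
    for ch in s:
        while i > 0 and p[i] != ch:
            i = n[i - 1]      # strictly decreases i
            fallbacks += 1
        if p[i] == ch:
            i += 1            # the only place where i grows
            advances += 1
        if i == len(p):
            i = n[i - 1]      # reset after a full match, also a decrease
            fallbacks += 1
    return advances, fallbacks


advances, fallbacks = count_match_steps("aaab", "ab")
print(advances, fallbacks)  # 4 3
```

Because the match length grows by one per advance and never drops below zero, the number of fallbacks cannot exceed the number of advances, which in turn is at most $|s|$.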

Optimization of the KMP algorithm

To reduce the number of comparisons that are certain to mismatch, we propose the following improvement:

When constructing the array $n'$: in the case $p[k+1]=p[i]$ (so that $n[i]=k+1$), set $n'[i]=k+1$ if $p[i+1]\neq p[k+2]$, and $n'[i]=n'[k+1]$ otherwise.
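
A possible implementation of the optimized array is sketched below. It keeps the border length only when the characters following the border and the current prefix differ, which is what the correctness argument requires; when they are equal it reuses the already optimized value for the shorter prefix. The code is 0-based, computes the ordinary prefix function first for clarity, and leaves the last entry unchanged because no character follows the full pattern. These choices and the name `optimized_prefix_function` are assumptions made for illustration.

```python
def prefix_function(p: str) -> list[int]:
    n = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[k] != p[i]:
            k = n[k - 1]
        if p[k] == p[i]:
            k += 1
        n[i] = k
    return n


def optimized_prefix_function(p: str) -> list[int]:
    """Fallback table that skips borders whose next character is p[i + 1]."""
    n = prefix_function(p)
    opt = n[:]  # last entry stays n[-1]: nothing follows the full pattern
    for i in range(len(p) - 1):
        b = n[i]  # longest border of p[0..i]
        if b > 0 and p[i + 1] == p[b]:
            # Falling back to b would repeat the same failed comparison,
            # so reuse the optimized value of the shorter prefix.
            opt[i] = opt[b - 1]
    return opt


print(optimized_prefix_function("aaaab"))       # [0, 0, 0, 3, 0]
print(optimized_prefix_function("abcabcabcd"))  # [0, 0, 0, 0, 0, 0, 0, 0, 6, 0]
```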

Correctness proof of the optimization

During matching, if $p[q+1]$ and $s[s+q+1]$ mismatch (the $s$ inside the brackets is the current shift) and $p[q+1]=p[n^{(t)}[q]+1]$, then $p[n^{(t)}[q]+1]$ must also mismatch $s[s+q+1]$. In particular, if $p[n[q]+1]=p[q+1]$, the next comparison is certain to fail and need not be made.

With this optimization, the function can be characterized as $n'[q]=\max\{k\in n^*[q] : p[q+1]\neq p[k+1]\}$ (or $0$ if no such $k$ exists), so $n'[q]$ still covers every prefix that can possibly match while reducing the number of mismatches.

The effect of the optimization on the space and time complexity of the algorithm

Since this optimization only changes how the array $n$ is constructed, it has no effect on space complexity.

The optimization has no effect on the worst-case time complexity either; the proof is the same as for the original KMP algorithm.

Because the algorithm avoids falling back to positions where $p[n[q]+1]=p[q+1]$, the optimization is most effective for pattern strings with many repeated substrings (such as AAAAB or ABCABCABCD).
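
To get a feel for the effect, the sketch below counts character comparisons when the same matcher is driven by the ordinary table and by the optimized one. Everything here is illustrative: the code is 0-based, the function names are hypothetical, the pattern must be non-empty, and the test text is an arbitrary small example rather than a benchmark.

```python
def prefix_function(p: str) -> list[int]:
    n = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[k] != p[i]:
            k = n[k - 1]
        if p[k] == p[i]:
            k += 1
        n[i] = k
    return n


def optimized_prefix_function(p: str) -> list[int]:
    n = prefix_function(p)
    opt = n[:]
    for i in range(len(p) - 1):
        b = n[i]
        if b > 0 and p[i + 1] == p[b]:
            opt[i] = opt[b - 1]
    return opt


def count_comparisons(s: str, p: str, table: list[int]) -> tuple[list[int], int]:
    """Scan s with the given fallback table; return (match starts, comparisons)."""
    i = comparisons = 0
    hits = []
    for j, ch in enumerate(s):
        while True:
            comparisons += 1
            if p[i] == ch:
                i += 1
                break
            if i == 0:
                break          # nothing left to fall back to
            i = table[i - 1]
        if i == len(p):
            hits.append(j - len(p) + 1)
            i = table[i - 1]
    return hits, comparisons


text, pattern = "aaacaaaab", "aaaab"
plain = count_comparisons(text, pattern, prefix_function(pattern))
fast = count_comparisons(text, pattern, optimized_prefix_function(pattern))
print(plain)  # ([4], 12)
print(fast)   # ([4], 10)
```

Both runs report the same match positions; the optimized table only skips comparisons whose outcome is already known.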
