Is it a bit lazy to just post my assignment as-is?
Never mind, as long as you enjoy reading it.
KMP algorithm
For the pattern string $p$, build its prefix function $n$, where $n[q]$ is the length of the longest proper suffix of $p[1..q]$ that is also a prefix of $p$ (equivalently, the end position of that prefix). During matching, compare $p[i]$ with $s[j]$: if they match, set $i=i+1,j=j+1$; if they mismatch and $i>1$, set $i=n[i-1]+1$ and compare again; if they mismatch and $i=1$, there is no shorter prefix to fall back to, so set $j=j+1$.
Now consider how to construct $n$. Set $n[1]=0$ and suppose $n[1..i-1]$ has already been computed. Let $k=n[i-1]$. If $p[k+1]=p[i]$, then $n[i]=k+1$; otherwise set $k=n[k]$ and test again. Repeat this process until $n[i]$ is found; if $k$ reaches $0$ and $p[1]\neq p[i]$, then $n[i]=0$.
This algorithm constructs the prefix function $n$ in $\Theta(|p|)$ time and completes the matching in $\Theta(|s|)$ time, for a total time complexity of $\Theta(|s|+|p|)$.
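The two procedures above can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation: the names `prefix_function` and `kmp_search` are chosen here for illustration, Python strings are 0-based (so `p[k]` plays the role of $p[k+1]$), and `n[0]` is an unused padding slot so that `n[q]` keeps the meaning used in the text.

```python
def prefix_function(p):
    """n[q] = length of the longest proper suffix of p[:q] that is a prefix of p."""
    m = len(p)
    n = [0] * (m + 1)  # n[0] is padding, n[1] = 0 by definition
    k = 0
    for i in range(2, m + 1):
        # p[k] is the text's p[k+1]; p[i - 1] is the text's p[i]
        while k > 0 and p[k] != p[i - 1]:
            k = n[k]  # try the next shorter matching prefix
        if p[k] == p[i - 1]:
            k += 1
        n[i] = k
    return n


def kmp_search(s, p):
    """Return the 0-based start positions of all occurrences of p in s."""
    if not p:
        return []
    n = prefix_function(p)
    hits = []
    q = 0  # number of pattern characters matched so far (the text's i - 1)
    for j, ch in enumerate(s):
        while q > 0 and p[q] != ch:
            q = n[q]  # the text's i = n[i-1] + 1
        if p[q] == ch:
            q += 1
        if q == len(p):
            hits.append(j - q + 1)
            q = n[q]  # keep going to find overlapping occurrences
    return hits
```

For example, `kmp_search("abababca", "abab")` reports the two overlapping occurrences at positions 0 and 2.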
Correctness proof of the KMP algorithm
First we prove the correctness of the matching process:
Suppose that during matching $p[1..q]$ matches $s[s+1..s+q]$ (inside the brackets, $s$ denotes the current shift of the pattern against the text), while $p[q+1]$ and $s[s+q+1]$ mismatch. By the definition of $n$, $p[1..n[q]]$ matches $s[s+q-n[q]+1..s+q]$, and $p[1..t]$ does not match $s[s+q-t+1..s+q]$ for any $n[q]<t<q$. Hence no occurrence of the pattern is skipped, and the matching can resume by comparing $p[n[q]+1]$ with $s[s+q+1]$, which establishes the correctness of the matching process.
Next, we prove the correctness of the prefix function $n$ calculation:
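Let $n^*[q]= \{n[q],n^{(2)}[q],\dots,n^{(t)}[q]\}$, where $n^{(1)}[q]=n[q]$ and $n^{(t)}[q]=n[n^{(t-1)}[q]]$, iterating until the value $0$ is reached. Then $n^*[q]$ contains exactly the lengths of all proper suffixes of $p[1..q]$ that are also prefixes of $p$ (that is, the end positions of all matching prefixes). Moreover, if $n[q]>0$ then $n[q]-1\in n^*[q-1]$, because dropping the last character of a matching suffix of $p[1..q]$ leaves a matching suffix of $p[1..q-1]$. So it suffices to enumerate the elements $k$ of $n^*[q-1]$ from largest to smallest and take $n[q]=k+1$ for the first $k$ with $p[k+1]=p[q]$, or $n[q]=0$ if there is none.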
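The claim about $n^*[q]$ can be checked by brute force on small patterns. The sketch below is only an illustration: the helper names `borders`, `n_by_definition`, `chain` and `check` are made up here, $n$ is computed directly from its definition rather than by the KMP construction, and the empty suffix (length $0$) is counted as a member of $n^*[q]$.

```python
def borders(p, q):
    """Lengths k < q with p[1..k] a suffix of p[1..q], largest first (0 included)."""
    return [k for k in range(q - 1, -1, -1) if p[:k] == p[q - k:q]]


def n_by_definition(p):
    """n[q] taken straight from the definition; n[0] is unused padding."""
    return [0] + [borders(p, q)[0] for q in range(1, len(p) + 1)]


def chain(n, q):
    """n^*[q]: n[q], n[n[q]], ... down to and including 0."""
    out = [n[q]]
    while out[-1] > 0:
        out.append(n[out[-1]])
    return out


def check(p):
    n = n_by_definition(p)
    for q in range(1, len(p) + 1):
        # iterating n lists every matching suffix length, in decreasing order
        assert chain(n, q) == borders(p, q)
        # a non-empty matching suffix of p[1..q] extends one of p[1..q-1]
        if n[q] > 0:
            assert n[q] - 1 in chain(n, q - 1)
    return True
```

Calling `check("ababaca")` runs both assertions for every prefix of the pattern.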
Proof of time complexity of KMP algorithm
When matching: $i$ and $j$ are increased at most $|s|$ times, since every increase of $i$ comes with an increase of $j$ and $j$ never decreases. Each execution of $i=n[i-1]+1$ decreases $i$ by at least 1, and $i$ never drops below 1, so this statement is executed at most $|s|$ times. The time complexity is therefore $\Theta(|s|)$.
When constructing the prefix function $n$: consider how $k$ changes. Each execution of $k=n[k]$ decreases $k$ by at least 1, while $k$ is increased (by 1, together with $i$) at most $|p|$ times, so the statement $k=n[k]$ is executed at most $|p|$ times. The time complexity is therefore $\Theta(|p|)$.
So the total time complexity is $\Theta(|s|+|p|)$.
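The counting argument for the construction can be written out as a short inequality. The symbols $a$ and $d$ are introduced here only for illustration: $a$ is the number of times $k$ is increased by 1 (at most once for each $i=2,\dots,|p|$) and $d$ is the number of times $k=n[k]$ is executed. Since $k$ starts at $0$, never becomes negative, and drops by at least 1 on each fall-back,

$$
0 \le k_{\text{final}} \le a - d \quad\Longrightarrow\quad d \le a \le |p|-1,
$$

so the construction performs at most $a+d\le 2(|p|-1)$ such steps. The same argument applied to $i$, which starts at $1$ and never drops below $1$, bounds the number of fall-backs during matching by $|s|$.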
Optimization of KMP algorithm
We would like to avoid comparisons that are certain to mismatch, so we propose the following improvement:
When constructing the array $n'$: once a $k$ with $p[k+1]=p[i]$ has been found (so that $n[i]=k+1$), set $n'[i]=k+1$ if $p[i+1]\neq p[k+2]$, and $n'[i]=n'[k+1]$ otherwise.
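A possible implementation of this rule is sketched below, reading it as "keep $k+1$ only when the characters that follow differ, otherwise reuse $n'[k+1]$". Several details are assumptions made for this sketch and are not stated in the text: the function name `optimized_prefix_function`, building the plain $n$ first and deriving $n'$ from it in a second pass, setting $n'[i]=0$ when $n[i]=0$, and leaving the last entry equal to $n[|p|]$ because there is no $p[|p|+1]$ to compare.

```python
def optimized_prefix_function(p):
    """Return (n, n_opt), where n_opt is the text's n'. Index 0 is padding."""
    m = len(p)
    n = [0] * (m + 1)
    k = 0
    for i in range(2, m + 1):
        while k > 0 and p[k] != p[i - 1]:
            k = n[k]
        if p[k] == p[i - 1]:
            k += 1
        n[i] = k

    n_opt = [0] * (m + 1)
    for i in range(1, m + 1):
        c = n[i]  # the text's k + 1
        # p[c] is the text's p[k+2]; p[i] is the text's p[i+1] (0-based strings)
        if i < m and c > 0 and p[c] == p[i]:
            n_opt[i] = n_opt[c]  # falling back to c would mismatch again
        else:
            n_opt[i] = c
    return n, n_opt
```

For the pattern `"aaaab"` this gives `n = [0, 0, 1, 2, 3, 0]` and `n_opt = [0, 0, 0, 0, 3, 0]`: after a mismatch on the third or fourth `a`, the optimized table jumps straight back to the start instead of retrying each shorter run of `a`.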
Correctness proof of the optimization
During matching, if $p[q+1]$ and $s[s+q+1]$ mismatch and $p[q+1]=p[n^{(t)}[q]+1]$, then $p[n^{(t)}[q]+1]$ must also mismatch $s[s+q+1]$. In particular, if $p[n[q]+1]=p[q+1]$, the comparison made after falling back to $n[q]$ is certain to fail and need not be made.
Under this optimization the function satisfies $n'[q]=\max\{n^{(t)}[q]\in n^*[q] : p[q+1]\neq p[n^{(t)}[q]+1]\}$, so $n'[q]$ still enumerates every prefix that can possibly match, while skipping comparisons that are certain to mismatch.
The effect of the optimization on the space and time complexity of the algorithm
Since this optimization only changes how the array $n$ is constructed, it has no effect on space complexity.
It has no effect on the worst-case time complexity either; the proof is the same as the one for KMP above.
Because the algorithm avoids fall-backs with $p[n[q]+1]=p[q+1]$, the optimization pays off most for pattern strings with many repeated substrings (such as AAAAB or ABCABCABCD).
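To get a feel for the effect, the sketch below counts character comparisons with the plain and the optimized table on one small input. It is an illustration under assumptions, not a benchmark: the function names `build_tables` and `count_comparisons` and the test text are made up here, the optimized table keeps only fall-back targets whose next character differs, and its last entry is left equal to the plain value so that matching can continue after a full match.

```python
def build_tables(p):
    """Return (n, n_opt): the plain prefix function and the optimized variant."""
    m = len(p)
    n = [0] * (m + 1)
    k = 0
    for i in range(2, m + 1):
        while k > 0 and p[k] != p[i - 1]:
            k = n[k]
        if p[k] == p[i - 1]:
            k += 1
        n[i] = k
    n_opt = [0] * (m + 1)
    for i in range(1, m + 1):
        c = n[i]
        skip = i < m and c > 0 and p[c] == p[i]
        n_opt[i] = n_opt[c] if skip else c
    return n, n_opt


def count_comparisons(s, p, table):
    """Match p against s with the given table; return (hits, comparisons)."""
    m = len(p)
    hits, comparisons, q = [], 0, 0
    for j, ch in enumerate(s):
        while True:
            comparisons += 1
            if p[q] == ch:
                q += 1
                break
            if q == 0:
                break  # nothing left to fall back to, move on in the text
            q = table[q]
        if q == m:
            hits.append(j - m + 1)
            q = table[m]
    return hits, comparisons


n, n_opt = build_tables("aaaab")
plain_hits, plain_count = count_comparisons("aaaacaaaab", "aaaab", n)
opt_hits, opt_count = count_comparisons("aaaacaaaab", "aaaab", n_opt)
```

Both runs find the same occurrence, and the optimized table needs fewer comparisons because it skips the repeated retries against `a` after the mismatch on `c`.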
The correctness proof of KMP algorithm and a small optimization