About the KMP Algorithm

Source: Internet
Author: User

The main procedure of the KMP algorithm is fairly easy to understand and its correctness is fairly easy to prove; the proof given in Introduction to Algorithms is also complete. The correctness proof for the prefix function computation, however, is too obscure in Introduction to Algorithms, mainly because of its heavy use of mathematical notation: the notation ends up being harder than the proof itself. After thinking about it for a while, I would like to give a natural-language proof for the prefix function that is easier to understand.

In fact, the prefix function of the KMP algorithm is only a few lines long, and explaining it should not take so many words and symbols. The proof in Introduction to Algorithms is somewhat confusing. For convenience, here are the symbols the proof needs:

Pi[1..n]: the prefix array, which is what the algorithm computes

P: the pattern, i.e. the string to be matched

P(k): the prefix of P of length k

The pseudocode of the KMP prefix function is as follows:

Pi[1] = 0
For I = 2 To length(P)
    temp = Pi[I-1]
    While temp > 0 And P[temp] != P[I-1]
        temp = Pi[temp]
    EndWhile
    Pi[I] = temp + 1
EndFor
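The pseudocode can be transcribed almost line by line into Python. The following is a minimal sketch, not a reference implementation: the function name compute_pi and the unused slot at index 0 (kept so that the 1-based indices of the pseudocode carry over unchanged) are assumptions made for this illustration.

```python
def compute_pi(p: str) -> list[int]:
    """Return Pi for pattern p, using 1-based indices as in the pseudocode.

    Index 0 of the returned list is unused padding (an assumption of this
    sketch), so pi[1] .. pi[len(p)] line up with Pi[1] .. Pi[length(P)].
    """
    n = len(p)
    pi = [0] * (n + 1)              # pi[1] = 0 is the initial condition
    for i in range(2, n + 1):
        temp = pi[i - 1]
        # Python strings are 0-based: p[temp - 1] is P[temp], p[i - 2] is P[I-1].
        while temp > 0 and p[temp - 1] != p[i - 2]:
            temp = pi[temp]
        pi[i] = temp + 1
    return pi


print(compute_pi("abcabd")[1:])
```

Note that, in this version of the pseudocode, each stored value for I >= 2 is one more than the length of the longest proper suffix of P(I-1) that is also a prefix of P(I-1).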

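First, we prove that, given correct values of Pi[1] through Pi[I-1], the algorithm computes Pi[I] correctly.

Pi[I] is defined by the longest proper suffix of P(I-1) that is also a prefix of P(I-1). (In the pseudocode above, the stored value is the length of that suffix plus 1, i.e. the position of the character that follows the matching prefix.)

Suppose we know every suffix of P(I-2) that is also a prefix of P(I-2) (condition 1). Then, for each of these suffixes, we only need to check whether the character that follows the corresponding prefix in P is equal to P[I-1] (condition 2). The longest suffix that meets both conditions, extended by that one character, determines Pi[I]; this is the "temp + 1" in the pseudocode.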
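As an illustration of the definition, the value of Pi[I] can also be computed directly, without reusing earlier entries. The sketch below is deliberately naive; the name pi_by_definition is an assumption of this example, and the "+ 1" follows the convention of the pseudocode above, where the stored value is one more than the length of the longest proper suffix of P(I-1) that is also a prefix.

```python
def pi_by_definition(p: str, i: int) -> int:
    """Compute Pi[I] (1-based I) straight from the definition, by brute force."""
    if i == 1:
        return 0                    # initial condition Pi[1] = 0
    head = p[: i - 1]               # P(I-1), the prefix of length I-1
    # k ranges over lengths of proper suffixes (k < len(head)); k = 0, the
    # empty suffix, always qualifies, so max() never sees an empty sequence.
    longest = max(k for k in range(len(head)) if head.endswith(head[:k]))
    return longest + 1


print([pi_by_definition("abcabd", i) for i in range(1, 7)])
```

This takes quadratic or worse time per entry, so it is only useful as a specification to compare the real algorithm against.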

Next we prove a claim about the loop

    While temp > 0
        temp = Pi[temp]

The claim is that the sequence this loop traverses consists of exactly the suffixes of P(I-2) that meet condition 1 (being also a prefix of P(I-2)), in order of decreasing length.

Every traversed string is such a suffix. The longest suffix of P(I-2) that meets condition 1 is the one recorded by Pi[I-1], the initial value of temp. The next one, recorded by Pi[Pi[I-1]], is a suffix of the previous one and therefore also a suffix of P(I-2), because the suffix relation is transitive; the same holds for being a prefix. So every string obtained by the traversal is a suffix of P(I-2) that meets condition 1, and the lengths decrease.

No such suffix is skipped. Suppose the traversed suffixes pf1, pf2, pf3, ..., pfn (listed here in ascending order of length for readability) do not include all of them, that is, some suffix pf' meeting condition 1 can be inserted between pfi and pfi+1. Since pf' and pfi+1 are both suffixes and prefixes of P(I-2) and pf' is the shorter one, pf' is both a suffix and a prefix of pfi+1. Then the longest qualifying suffix of pfi+1 is not pfi but pf' (or something longer), because length(pf') is greater than length(pfi). This contradicts the definition of the Pi function, which made the loop step from pfi+1 to pfi. (If pf' were longer than pfn, the same contradiction arises with the starting value Pi[I-1].) So the suffixes pf1, pf2, pf3, ..., pfn obtained by the loop are all the suffixes that meet condition 1.

Therefore, as long as we traverse this sequence from the longest suffix to the shortest (which is what the pseudocode does) and stop at the first one whose following character is equal to P[I-1] (this is the second condition of the while statement in the pseudocode), a single run of the loop is enough to find the longest suffix of P(I-1) that is also a prefix of P(I-1).

Because the initial condition Pi[1] = 0 is correct and each Pi[I] is computed correctly from the earlier values, the entire algorithm is correct by induction.
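The claim about the traversal can be checked experimentally. The sketch below follows the chain temp = Pi[temp] starting from Pi[I-1] and compares the suffix lengths it visits with a brute-force list of all proper suffixes of P(I-2) that are also prefixes. The helper names compute_pi, chain_lengths and borders_by_definition are assumptions made for this illustration; lengths are recovered as temp - 1 because the pseudocode stores length plus 1, and the empty suffix (length 0) is counted as the last element.

```python
def compute_pi(p: str) -> list[int]:
    """1-based Pi array as in the pseudocode; index 0 is unused padding."""
    pi = [0] * (len(p) + 1)
    for i in range(2, len(p) + 1):
        temp = pi[i - 1]
        while temp > 0 and p[temp - 1] != p[i - 2]:
            temp = pi[temp]
        pi[i] = temp + 1
    return pi


def chain_lengths(p: str, i: int) -> list[int]:
    """Suffix lengths visited by 'While temp > 0: temp = Pi[temp]' for index I."""
    pi = compute_pi(p)
    temp = pi[i - 1]
    lengths = []
    while temp > 0:
        lengths.append(temp - 1)    # stored value is length + 1
        temp = pi[temp]
    return lengths


def borders_by_definition(s: str) -> list[int]:
    """Lengths of all proper suffixes of s that are also prefixes, longest first."""
    return [k for k in range(len(s) - 1, -1, -1) if s.endswith(s[:k])]


p = "aabaabaab"
for i in range(2, len(p) + 1):
    assert chain_lengths(p, i) == borders_by_definition(p[: i - 2])
print(chain_lengths(p, 9))
```

A check like this is not a proof, but it makes the two halves of the argument visible: nothing outside the list is visited, and nothing in the list is skipped.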

 

References:

[1] Introduction to Algorithms, Section 32.4

[2] Jeffrey J. McConnell, Analysis of Algorithms: An Active Learning Approach, Section 5.1.2

In addition, regarding the efficiency of KMP compared with the ordinary brute-force method, [2] states that KMP is actually only a little better than brute force, while [1] also shows that the expected complexity of the brute-force method is linear. It seems that the brute-force method is tolerable.
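To try the comparison on concrete inputs, the following sketch counts character comparisons for a brute-force search and for a KMP search driven by the Pi array from the pseudocode. It is an illustration under assumptions, not a benchmark: the function names, the 0-based return value and the way comparisons are counted are choices made for this example.

```python
def compute_pi(p: str) -> list[int]:
    """1-based Pi array as in the pseudocode; index 0 is unused padding."""
    pi = [0] * (len(p) + 1)
    for i in range(2, len(p) + 1):
        temp = pi[i - 1]
        while temp > 0 and p[temp - 1] != p[i - 2]:
            temp = pi[temp]
        pi[i] = temp + 1
    return pi


def brute_find(text: str, p: str) -> tuple[int, int]:
    """Return (0-based index of first match or -1, number of comparisons)."""
    comparisons = 0
    for start in range(len(text) - len(p) + 1):
        k = 0
        while k < len(p):
            comparisons += 1
            if text[start + k] != p[k]:
                break
            k += 1
        if k == len(p):
            return start, comparisons
    return -1, comparisons


def kmp_find(text: str, p: str) -> tuple[int, int]:
    """Same contract as brute_find, but falls back through Pi on a mismatch."""
    pi = compute_pi(p)
    i = j = 1                       # 1-based positions in text and pattern
    comparisons = 0
    while i <= len(text) and j <= len(p):
        if j == 0:                  # no prefix left to reuse: advance the text
            i += 1
            j = 1
            continue
        comparisons += 1
        if text[i - 1] == p[j - 1]:
            i += 1
            j += 1
        else:
            j = pi[j]
    if j > len(p):
        return i - len(p) - 1, comparisons
    return -1, comparisons


text, pattern = "a" * 20 + "b", "aaab"
print(brute_find(text, pattern), kmp_find(text, pattern))
```

On a highly repetitive input like this one, brute force re-reads characters that KMP does not. On text with few repeated prefixes the two counts tend to be close, which is consistent with the remark from [2].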
