What is the KMP algorithm? KMP algorithm Derivation

Source: Internet
Author: User

I spent about three days studying the KMP algorithm and working through its reasoning, and this is my summary. I hope it helps anyone who reads it!

1. What is the KMP algorithm?

KMP is one of the methods for finding a pattern string pattern in a main string str.

When a character of the pattern string fails to match a character of the main string, the KMP algorithm uses the maximum block symmetry of the part of the pattern string matched so far to move the pattern string as far to the right as possible.

Three concepts are involved: the mismatch, the subsets of the pattern string that have already been matched, and block symmetry.

Mismatch and its implied information

While the characters of the pattern string are compared with those of the main string, equal characters are a match and different characters are a mismatch.

The implied information is that everything before the mismatch matched.

Suppose we look for the pattern string p[0,6] in the main string s[0,100], starting at subscript 0, and the first mismatch is at subscript 5 (written as a p[0,5] mismatch). Then

p[5] != s[5], and s[0,4] = p[0,4]

so p[0,4] is entirely matched!
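The implied information can be checked mechanically. Below is a minimal Python sketch; the helper name first_mismatch and both sample strings are assumptions made for illustration and do not come from the original example.

```python
def first_mismatch(s, p, start=0):
    """Return the index j of the first pattern character that differs from
    the main string when p is aligned at s[start].
    Return len(p) if the whole pattern matches."""
    for j in range(len(p)):
        if start + j >= len(s) or s[start + j] != p[j]:
            return j
    return len(p)


s = "abcdaXabcdabc"  # hypothetical main string
p = "abcdabc"        # hypothetical pattern string
j = first_mismatch(s, p)
# Everything before the mismatch position is known to match.
assert s[:j] == p[:j]
print(j)  # 5
```

The mismatch at p[5] therefore also tells us that s[0,4] = p[0,4], without looking at the main string again.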

Subsets of the pattern string that have been matched

Continuing the example above, the pattern string is p[0,6] and p[0,4] is entirely matched, so the matched subsets of the pattern string are

pcs = {p[0,4], p[0,3], p[0,2], p[0,1], p[0]}
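As a small illustration, the matched subsets can be listed from the mismatch position. The function name matched_prefixes and the sample pattern are hypothetical.

```python
def matched_prefixes(p, j):
    """List the non-empty prefixes of the matched part p[0..j-1],
    longest first, where j is the mismatch position."""
    return [p[:length] for length in range(j, 0, -1)]


# Hypothetical pattern with a mismatch at p[5], so p[0,4] is matched.
print(matched_prefixes("abcdabc", 5))
# ['abcda', 'abcd', 'abc', 'ab', 'a']
```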

2. Block symmetry

What is block symmetry?

Block symmetry means that a prefix and a suffix of a string coincide.

Example: a b c d a b c

Prefixes: all leading substrings that do not include the last letter,

such as: a, ab, abc, abcd, abcda, abcdab

Suffixes: all trailing substrings that do not include the first letter,

such as: bcdabc, cdabc, dabc, abc, bc, c

Here the prefix abc and the suffix abc coincide.

You can think of the two coinciding parts as symmetric about the block between them (the green block in the original figure), hence the name block symmetry.
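The definition can be turned into a brute-force check. This is only a sketch for understanding, not how KMP computes the value; the function name block_symmetry is an assumption.

```python
def block_symmetry(t):
    """Length of the longest prefix of t (excluding t itself)
    that is also a suffix of t; 0 if there is none."""
    for k in range(len(t) - 1, 0, -1):
        if t[:k] == t[-k:]:
            return k
    return 0


print(block_symmetry("abcdabc"))  # 3, the prefix abc equals the suffix abc
```

Trying the longest candidate first means the first hit is the maximum block symmetry.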

Block symmetry comes in several forms, such as:

Hey, they are all in one horizontal row, so why has one of them floated up?

The one that floated up is explained in the subsection on using maximum block symmetry.

What are the characteristics of block symmetry?

Feature: a string with block symmetry contains at least two coinciding parts.

Analysis: the symmetry is only how it looks; the coincidence is what matters, and the coinciding parts are a prefix and a suffix.

How is block symmetry used?

Suppose that, while the pattern string is being matched against the main string str, the mismatch occurs at the character L, that is, a p[0,7] mismatch. What would you do?

Analysis

First, the pattern string p[0,6] and the main string s[0,6] match exactly.

Second, the string p[0,6] has block symmetry!

Because p[0,6] has block symmetry, I can move the prefix abc to the position of the suffix abc and then compare d with the main string. That is how block symmetry is used, right?

In summary: when p[7] mismatches, look at the longest matched prefix before the mismatched character, p[0,6], and check whether it has block symmetry. If it does, move the pattern string to the right so that the coinciding prefix on the left lands on the coinciding suffix on the right, then continue comparing the pattern string with the main string!
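The move can be sketched in a few lines of Python. The pattern abcdabcl is a reconstruction from the description (abcdabc followed by the mismatching character), and the main string and the names block_symmetry and shift_after_mismatch are assumptions for illustration.

```python
def block_symmetry(t):
    """Longest prefix of t (excluding t itself) that is also a suffix."""
    for k in range(len(t) - 1, 0, -1):
        if t[:k] == t[-k:]:
            return k
    return 0


def shift_after_mismatch(p, start, j):
    """Pattern aligned at s[start] mismatches at p[j].
    Return (new_start, k): the new alignment and the pattern index
    from which comparison resumes."""
    k = block_symmetry(p[:j])      # overlap length of the matched part
    return start + (j - k), k      # slide the prefix onto the suffix


s = "abcdabcdabcl"   # hypothetical main string
p = "abcdabcl"       # assumed pattern: mismatch at p[7] for alignment 0
new_start, k = shift_after_mismatch(p, 0, 7)
# The first k characters already match at the new alignment.
assert s[new_start:new_start + k] == p[:k]
print(new_start, k)  # 4 3
```

After the move, comparison resumes at p[3] (the d) against s[7]; the main string position never moves backwards.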

What does using the maximum block symmetry mean?

The subsection on what the KMP algorithm is said that, on a mismatch between the pattern string and the main string, KMP uses the maximum block symmetry of the matched part of the pattern string to move the pattern string as far right as possible. What does using the maximum block symmetry mean here?

It means that the process can recurse!

Change the d of the previous example to k, for example:

The KMP algorithm precomputes the block symmetry of every prefix of the pattern string, and uses it when the character right after that prefix mismatches.

In this example p[0,6] has block symmetry, so when L mismatches (a p[0,7] mismatch),

the block symmetry of p[0,6] is used, namely that p[0,2] and p[4,6] are symmetric about the character p[3].

If the comparison fails again, check whether the coinciding part p[0,2] itself has block symmetry;

if it does, use it, and so on recursively until no block symmetry is left.

The meaning of the block-symmetry length in programming

What is the 3 in the first move? It is the length of the block-symmetric overlap, and also the pattern position where the next comparison starts!

What is the 1 in the second move? Again the length of the block-symmetric overlap, and also the pattern position where the next comparison starts!
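The recursion can be made visible by listing the successive overlap lengths. The string abakaba is a hypothetical choice made so that the lengths come out as 3 and then 1, like the two moves described above; the function names are assumptions as well.

```python
def block_symmetry(t):
    """Longest prefix of t (excluding t itself) that is also a suffix."""
    for k in range(len(t) - 1, 0, -1):
        if t[:k] == t[-k:]:
            return k
    return 0


def overlap_chain(t):
    """Successive block-symmetry lengths: of t, then of the coinciding
    prefix, and so on until no block symmetry is left."""
    chain = []
    k = block_symmetry(t)
    while k > 0:
        chain.append(k)
        k = block_symmetry(t[:k])
    return chain


print(overlap_chain("abakaba"))  # [3, 1]
```

Each number in the chain is both an overlap length and the pattern index to try next.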

3. Deriving the next array: computing block symmetry

Block symmetry on its own is meaningless; it is only useful in combination with a mismatch!

So we should compute the block symmetry for a mismatch at every position of the pattern string, and store the results in an array called next[]!

How is it calculated?

The next array holds the block symmetry that applies at each mismatch position.

When the 1st character mismatches, there is nothing before it, so there is no prefix or suffix and no block symmetry; this is recorded as next[0] = -1.

When the 2nd character mismatches, the matched part has only 1 character, which has no prefix or suffix and no block symmetry, so next[1] = 0.

In general, if k is the largest value with k < j such that p[0] p[1] ... p[k-1] = p[j-k] p[j-k+1] ... p[j-1], then next[j] = k.

What does next[j] = k mean?

It means that the part before p[j] has a block symmetry of length k: two coinciding parts, each of length k.
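Read literally, this definition already gives a way to fill in the array, if a slow one. A minimal sketch, assuming the convention next[0] = -1 used above; the function names and the sample pattern are hypothetical.

```python
def block_symmetry(t):
    """Longest prefix of t (excluding t itself) that is also a suffix."""
    for k in range(len(t) - 1, 0, -1):
        if t[:k] == t[-k:]:
            return k
    return 0


def next_by_definition(p):
    """next[0] = -1; next[j] = block symmetry of the part before p[j]."""
    if not p:
        return []
    return [-1] + [block_symmetry(p[:j]) for j in range(1, len(p))]


print(next_by_definition("abcdabcl"))
# [-1, 0, 0, 0, 0, 1, 2, 3]
```

This brute-force version is handy for checking a faster implementation.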

To summarize, the premises are:

Condition 1. next[0] does not exist (recorded as -1), and next[1] = 0.

Condition 2. For a subscript value k, if p[0] p[1] ... p[k-1] = p[j-k] p[j-k+1] ... p[j-1], then next[j] = k.

The next[] array is filled in starting from subscript 0, so if we can work out next[j+1] from what is already known, the whole array can be computed, right?

Let us derive next[j+1].

Known:

p[0] p[1] ... p[k-1] = p[j-k] p[j-k+1] ... p[j-1]  =>  next[j] = k

If p[k] matches p[j],

then p[0] p[1] ... p[k-1] p[k] = p[j-k] p[j-k+1] ... p[j-1] p[j]  =>  next[j+1] = k+1

There were already two coinciding parts of length k. Since p[k] and p[j] match, each part grows by one character, so next[j+1] = k+1.

next[j] = k also says that when p[j] mismatches, p[k] is compared with the main string next. So the practical meaning of next[j] is: when p[j] mismatches, which pattern character to compare with the main string next!

Condition 3. The value next[j] is the pattern position to compare next when p[j] mismatches!
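Condition 3 is exactly how the matching loop uses the array. Below is a minimal sketch that takes the next array as a parameter; the name kmp_find, the sample strings and the hard-coded array are assumptions for illustration.

```python
def kmp_find(s, p, nxt):
    """Return the first index where p occurs in s, or -1.
    nxt is the next array of p, with nxt[0] == -1."""
    i = j = 0
    while i < len(s) and j < len(p):
        if j == -1 or s[i] == p[j]:
            i += 1
            j += 1
        else:
            j = nxt[j]  # s index i stays put; only the pattern moves
    return i - j if j == len(p) else -1


# Hypothetical example; the array is the next array of "abcdabcl".
print(kmp_find("abcdabcdabcl", "abcdabcl", [-1, 0, 0, 0, 0, 1, 2, 3]))  # 4
```

When j becomes -1, even p[0] failed against s[i], so both indices advance and matching restarts at the next main-string character.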

If p[k] and p[j] do not match, next[j+1] = ?

The actual meaning of next[j+1]: when p[j+1] mismatches (a p[0,j+1] mismatch), it is the length of the block-symmetric overlap of p[0,j], and also the subscript of the pattern character to compare with the main string next.

For details, see the earlier subsection on the meaning of the block-symmetry length in programming.

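Which character to compare next?

Let a1 = p[0] p[1] ... p[k-1] and a2 = p[j-k] p[j-k+1] ... p[j-1], with a1 == a2.

When p[k] and p[j] do not match, a1 and a2 cannot simply be extended by one character (the green cross in the original figure).

Since a2 is the part right before p[j], the next step is to check whether a2 has block symmetry.

If a2 has block symmetry then so does a1, because a1 == a2 (the green box in the original figure).

So we look at the block symmetry of a1 = p[0,k-1], which is next[k].

Let x1 and x2 be the coinciding prefix and suffix of a1;

let x3 and x4 be the coinciding prefix and suffix of a2.

Then x1 == x4, so x1 can be moved to the position of x4, which is the largest overlap still possible.

The character after x1 is p[next[k]], so set k = next[k] and compare p[k] with p[j] again. If they match, next[j+1] = k+1; otherwise keep falling back in the same way, and if k reaches -1, next[j+1] = 0.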
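A worked case may help here. After falling back to next[k], the character p[k] still has to be compared with p[j] before next[j+1] is known. The pattern abacababc and the function name next_value_after are hypothetical, chosen so that one fallback step is needed.

```python
def next_value_after(p, nxt, j):
    """Compute next[j+1] from the already known values next[0..j]."""
    k = nxt[j]
    while k != -1 and p[k] != p[j]:
        k = nxt[k]   # fall back to the block symmetry of p[0,k-1]
    return k + 1     # k == -1 gives 0: no block symmetry at all


p = "abacababc"
known = [-1, 0, 0, 1, 0, 1, 2, 3]   # next[0..7] for this pattern
# j = 7, k = 3: p[3] = 'c' differs from p[7] = 'b', so fall back to
# k = next[3] = 1; p[1] = 'b' equals p[7], so next[8] = 1 + 1.
print(next_value_after(p, known, 7))  # 2
```

Note that the result is 2, not next[3] = 1: the fallback gives the new k, and the matching character then adds one.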

Summary

if (k == -1 || p[k] == p[j]) next[j+1] = k+1

else k = next[k], then compare p[k] with p[j] again
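Put together, the two cases give the usual loop for filling in next[]. This is a sketch under the convention next[0] = -1; the function names are assumptions, and the brute-force helper is included only to check the result.

```python
def build_next(p):
    """Fill next[] with the recurrence: extend on a match, else fall back."""
    nxt = [-1] * len(p)
    j, k = 0, -1          # k is the current candidate for next[j]
    while j < len(p) - 1:
        if k == -1 or p[k] == p[j]:
            j += 1
            k += 1
            nxt[j] = k    # next[j+1] = k+1
        else:
            k = nxt[k]    # fall back and compare again
    return nxt


def block_symmetry(t):
    """Brute-force check: longest proper prefix of t that is also a suffix."""
    for k in range(len(t) - 1, 0, -1):
        if t[:k] == t[-k:]:
            return k
    return 0


p = "abcdabcl"  # hypothetical pattern
nxt = build_next(p)
assert nxt == [-1] + [block_symmetry(p[:j]) for j in range(1, len(p))]
print(nxt)  # [-1, 0, 0, 0, 0, 1, 2, 3]
```

Since k only grows by one per step of j and every fallback shrinks it, the loop runs in time linear in the pattern length.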

