A detailed explanation of next and nextval in the KMP algorithm

I. Definition

The KMP algorithm is an improved string-matching algorithm. It was discovered by D. E. Knuth, J. H. Morris, and V. R. Pratt, hence the name Knuth-Morris-Pratt algorithm (KMP algorithm for short).

II. Schematic principle

Parts of the following are borrowed from http://www.cnblogs.com/c-cloud/p/3224788.html
This algorithm is not easy to understand. There are many explanations on the Internet, but most of them are hard to follow; it was not until I read the article above that I really understood it. Below, I try to explain the KMP algorithm in my own words, as plainly as I can.

1. First, the first character of the string "BBC ABCDAB ABCDABCDABDE" is compared with the first character of the search term "ABCDABD". Because B does not match A, the search term is shifted one position to the right.

2. B again does not match A, so the search term is shifted right once more.

3. This continues until a character in the string matches the first character of the search term.

4. Then the next character of the string is compared with the next character of the search term; they match as well.

5. This continues until a character in the string differs from the corresponding character of the search term. When the space fails to match D, we already know that the six characters just compared are "ABCDAB". The idea of the KMP algorithm is to use this known information: instead of moving the search position in the string back over characters that have already been compared, only the search term is shifted further to the right, which improves efficiency.

6. Because the space does not match C, the search term continues to shift right.

7. The comparison proceeds character by character until C is found not to match D. The search term is then shifted right again.

8. The comparison proceeds character by character until the last character of the search term also matches. A complete match has been found and the search is finished.

III. Detailed explanation

Label (j)           1 2 3 4 5 6 7
Pattern string (P)  A B C D A B D

1. The core idea of the KMP algorithm

The core idea: on a mismatch, fall back to where there is "symmetry". Note that "symmetry" here does not mean a palindrome such as "ABCCBA". It means that the head and the tail of the string repeat: "ABCABC" has "ABC" at both its head and its tail, and "ABCCCCCAB" has "AB" at both its head and its tail.

2. Reading the diagrams

From the diagrams in Part II we can conclude that when a position of the pattern string fails to match, the pattern always falls back according to the head-tail repetition in the part of the pattern before that position. For example, in 2.5 (Part II, step 5) the letter D (label 7) fails to match. The string before that D is "ABCDAB", whose head and tail share the repeated "AB" (2 characters), so we fall back to the C (label 3) that follows the head "AB". Likewise, in 2.6 the letter C (label 3) fails to match. The string before that C is "AB", which has no head-tail repetition (0 characters), so we fall back to the head of the pattern (label 1).
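The "head-tail repetition" used in these examples can be checked mechanically. The following is a minimal Python sketch and not part of the borrowed article; the function name border_length is a hypothetical helper introduced only for illustration. It tries every proper prefix, longest first, and returns the length of the longest one that is also a suffix.

```python
def border_length(s):
    """Length of the longest proper prefix of s that is also a suffix of s."""
    # A proper prefix is strictly shorter than s, so start at len(s) - 1.
    for k in range(len(s) - 1, 0, -1):
        if s[:k] == s[-k:]:
            return k
    return 0  # no head-tail repetition at all


print(border_length("ABCDAB"))     # 2, the repeated "AB"
print(border_length("AB"))         # 0, no repetition
print(border_length("ABCABC"))     # 3, the repeated "ABC"
print(border_length("ABCCCCCAB"))  # 2, the repeated "AB"
```

This brute-force check takes quadratic time in the worst case, so it is only meant to make the definition concrete, not to be used inside a real search.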

So next(j) is the label of the pattern character to fall back to when position j of the pattern string fails to match. As the examples above show, with 2 repeated characters we fall back to position 3, and with 0 repeated characters we fall back to position 1. Clearly this is a +1 relationship: next(j) = (number of repeated characters) + 1.

3. Computing next

Whatever the first and second characters are, next(1) = 0 and next(2) = 1; these two values are fixed.

When j=3, P(j)=C; C is preceded by "AB", which has no repetition (0 characters), so next(3) = 0+1 = 1;
When j=4, P(j)=D; D is preceded by "ABC", which has no repetition (0 characters), so next(4) = 0+1 = 1;
When j=5, P(j)=A; A is preceded by "ABCD", which has no repetition (0 characters), so next(5) = 0+1 = 1;
When j=6, P(j)=B; B is preceded by "ABCDA", which has the repeated "A" (1 character), so next(6) = 1+1 = 2;
When j=7, P(j)=D; D is preceded by "ABCDAB", which has the repeated "AB" (2 characters), so next(7) = 2+1 = 3.

4. nextval: an optimization of next

Look at the 5th character "A". When it fails to match, the next row sends us back to label 1, which is also the letter A. Comparing against that A is futile, because it is already known not to match, so we would immediately fall back again to next(1) = 0 of the label-1 letter A. To take this shortcut directly, we add a nextval row: if P(next(j)) equals P(j), then nextval(j) = nextval(next(j)); otherwise nextval(j) = next(j).

For j = 2 to 4, P(next(j)) = A differs from P(j), so nextval(j) = next(j) = 1, and nextval(1) stays 0. For the remaining positions:

j=5: P(5)=A, next(5)=1, P(1)=A=P(5), so nextval(5) = nextval(1) = 0;
j=6: P(6)=B, next(6)=2, P(2)=B=P(6), so nextval(6) = nextval(2) = 1;
j=7: P(7)=D, next(7)=3, P(3)=C!=P(7), so nextval(7) = next(7) = 3.

Results:

Label (j)           1 2 3 4 5 6 7
Pattern string (P)  A B C D A B D
next                0 1 1 1 1 2 3
nextval             0 1 1 1 0 1 3
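The two rows of this table can be reproduced with a short program. What follows is a minimal Python sketch under stated assumptions, not code from the borrowed article: labels are 1-based as in the table, so index 0 of each list is unused, and the function names build_next and build_nextval are hypothetical. build_next uses the usual textbook loop that reuses earlier next values instead of re-scanning the pattern.

```python
def build_next(p):
    """Return next[0..m] for pattern p; next[j] is valid for j = 1..m."""
    m = len(p)
    nxt = [0] * (m + 1)   # nxt[1] = 0 is fixed; index 0 is unused
    j, k = 1, 0           # k is the candidate fallback label for position j + 1
    while j < m:
        if k == 0 or p[j - 1] == p[k - 1]:
            j += 1
            k += 1
            nxt[j] = k    # repeated length (k - 1) plus 1
        else:
            k = nxt[k]    # shorter repetition: fall back inside the pattern
    return nxt


def build_nextval(p):
    """Return nextval[0..m], skipping fallbacks that land on the same letter."""
    nxt = build_next(p)
    nextval = nxt[:]
    for j in range(2, len(p) + 1):
        k = nxt[j]        # k >= 1 whenever j >= 2
        if p[j - 1] == p[k - 1]:
            # Same letter would mismatch again, so reuse its own fallback.
            nextval[j] = nextval[k]
    return nextval


pattern = "ABCDABD"
print(build_next(pattern)[1:])     # [0, 1, 1, 1, 1, 2, 3]
print(build_nextval(pattern)[1:])  # [0, 1, 1, 1, 0, 1, 3]
```

Because nextval is filled in increasing order of j and next(j) is always smaller than j, nextval[k] is already final when it is copied.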

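To tie the nextval row back to the walkthrough in Part II, here is a hedged Python sketch of the search itself. It is an illustration only: the function name kmp_search is hypothetical, positions are 1-based to match the labels, and the NEXTVAL list is copied by hand from the Results row for "ABCDABD", with an unused entry at index 0.

```python
# nextval row for "ABCDABD" from the Results table; index 0 is unused.
NEXTVAL = [0, 0, 1, 1, 1, 0, 1, 3]


def kmp_search(text, pattern, nextval):
    """Return the 1-based start of the first match, or 0 if there is none."""
    n, m = len(text), len(pattern)
    i, j = 1, 1  # i scans the text and never moves back; j is the pattern label
    while i <= n and j <= m:
        if j == 0 or text[i - 1] == pattern[j - 1]:
            # Match, or the pattern fell off its head: advance both.
            i += 1
            j += 1
        else:
            # Mismatch at label j: only the pattern falls back.
            j = nextval[j]
    return i - m if j > m else 0


print(kmp_search("BBC ABCDAB ABCDABCDABDE", "ABCDABD", NEXTVAL))  # 16
```

The result 16 is the 1-based position of the second "ABCDAB..." run in the string, which is where the walkthrough in Part II ends.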
That completes the detailed explanation. I hope you find it useful.
