KMP Algorithm Summary

Last Update: 2018-12-05 Source: Internet

Author: User



I. Introduction to KMP Algorithms

KMP is a string-matching algorithm designed by Knuth, Morris, and Pratt.

The KMP algorithm is mainly used for pattern matching, that is, string matching. For example, given A = "abc" and B = "b", the question is whether B is a substring of A. The naive algorithm takes O(m * n) time in the worst case, which is too slow for long strings.

KMP answers the same question in linear time, O(m + n), where n is the length of A and m is the length of B.

II. Naive Pattern Matching Algorithm

Pattern matching means: given two strings A and B, check whether B is a substring of A. In Java, the indexOf() method of the String class implements this function.
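As a quick illustration of the library call mentioned above, the following sketch uses the standard `String.indexOf(String)` method. The class name `IndexOfDemo` and the helper `find` are made up for this example.

```java
class IndexOfDemo {
    // Returns the 0-based position of the first occurrence of b in a, or -1 if b does not occur.
    static int find(String a, String b) {
        return a.indexOf(b);
    }

    public static void main(String[] args) {
        System.out.println(find("ababc", "abc")); // 2
        System.out.println(find("ababc", "ac"));  // -1
    }
}
```

Note that the library reports 0-based positions, while the walk-through in this article counts characters from 1.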

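Idea of the naive pattern matching algorithm:

For example, take two strings A = "ababc" and B = "abc".

Step 1: Initialize i = 1, j = 1 (indices are 1-based here; A[0] and B[0] can be left empty or used to store the string length);

Step 2: Loop over A and B. If A[i] = B[j], then i++, j++; otherwise reset j to 1 and move i back to i - j + 2 (computed with the old j), which is the character after the start of the current attempt;

Step 3: When the loop exits, determine whether B is a substring of A: it is if j has moved past the end of B;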

Steps:

1. i = 1, j = 1;

2. Because A[1] = B[1], i++, j++ (now i = 2, j = 2);

3. Because A[2] = B[2], i++, j++ (now i = 3, j = 3);

4. Because A[3] != B[3], i is moved back to i - j + 2 = 2 and j is reset to 1;

5. Because A[2] != B[1], i is moved to i - j + 2 = 3 and j stays at 1;

6. Because A[3] = B[1], i++, j++ (now i = 4, j = 2);

7. Because A[4] = B[2], i++, j++ (now i = 5, j = 3);

8. Because A[5] = B[3], i++, j++ (now i = 6, j = 4);

9. Because i and j run past the ends of A and B at the same time, B is a substring of A;

From the above steps, we can see that:

The naive pattern matching algorithm is inefficient because many of its steps are redundant. For example, step 5 compares A[2] with B[1], but the earlier steps already established that A[2] = B[2], and we know that B[1] != B[2], so A[2] != B[1] follows without any comparison. This step is therefore redundant.
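To make the cost of these redundant steps concrete, here is a minimal sketch that counts the character comparisons made by the naive algorithm. It uses 0-based indices (unlike the 1-based walk-through), and the class and method names are hypothetical.

```java
class NaiveComparisonCount {
    // Runs the naive search (0-based) and returns how many character comparisons it performed.
    static int countComparisons(String a, String b) {
        int i = 0;
        int j = 0;
        int comparisons = 0;
        while (i < a.length() && j < b.length()) {
            comparisons++;
            if (a.charAt(i) == b.charAt(j)) {
                i++;
                j++;
            } else {
                i = i - j + 1; // restart one position after the previous start
                j = 0;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // The example from the walk-through: one comparison per step 2 to 8.
        System.out.println(countComparisons("ababc", "abc"));       // 7
        // A text that keeps almost matching forces repeated rescans.
        System.out.println(countComparisons("aaaaaaaaab", "aaab")); // 28
    }
}
```

In the second call the text has only 10 characters, yet 28 comparisons are made, because each failed attempt re-reads characters that were already seen.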

The algorithm is as follows:

private static int indexOf(String a, String b) {
    // 1. Define two pointers, one into a and one into b
    int i = 0;
    int j = 0;
    // 2. Traverse both strings
    while (i < a.length() && j < b.length()) {
        if (a.charAt(i) == b.charAt(j)) {
            i++;
            j++;
        } else {
            i = i - (j - 1); // restart one position after the previous start
            j = 0;
        }
    }
    // 3. The loop ends either because all of b was matched (j == b.length()),
    //    in the middle of a or exactly at its end, or because a ran out first.
    if (j == b.length()) {
        return i - b.length();
    } else {
        return -1;
    }
}

III. KMP Algorithm

The main idea of the KMP algorithm is that i never needs to move backward; it only increases, and only j is rolled back.

Rules for rolling back j:

(1) Take the part of B that has already been matched and find the longest substring that is both a proper prefix and a suffix of it. For example:

The matched string is "ababa". Both "a" and "aba" are a prefix and a suffix of it, but "aba" is longer, so j' = "aba".length() = 3.
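The rule above can be checked directly with a brute-force sketch that tries every candidate length from longest to shortest. This is only an illustration of the definition, not the efficient precomputation used by KMP; the names `LongestBorder` and `longestBorder` are hypothetical.

```java
class LongestBorder {
    // Length of the longest proper prefix of s that is also a suffix of s (0 if there is none).
    static int longestBorder(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            String suffix = s.substring(s.length() - len);
            if (s.startsWith(suffix)) {
                return len;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(longestBorder("ababa")); // 3, for "aba"
        System.out.println(longestBorder("abc"));   // 0
    }
}
```

The search starts at `s.length() - 1` so that the whole string itself is never counted; otherwise every string would trivially be its own prefix and suffix.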

Expressed in mathematical language:

When A[i-j+1..i] = B[1..j] and A[i+1] != B[j+1], j must be reduced to the largest smaller value for which A[i-j+1..i] = B[1..j] holds again.

For this we precompute an array P[], where P[j] is the new value of j to roll back to when j characters have been matched but character j+1 does not match.

In the preceding example, P[5] = 3: the matched string "ababa" has length 5, so j = 3 after the rollback.

For example:

A[1..5] = B[1..5] but A[6] != B[6], with i = 6 (the position in A being compared) and j = 5, so j must be rolled back using the precomputed P[] array: j = P[j].

Principle of the rollback: the matched part is "ababa", whose longest prefix that is also a suffix is "aba", so we set j = P[5] = 3.

Now A[3..5] = B[1..3], but A[6] != B[4], with i = 6 and j = 3, so roll back again: j = P[3] = 1.

Now A[5] = B[1], but A[6] != B[2], with i = 6 and j = 1, so roll back again: j = P[1] = 0.

Because j = 0, no further rollback is possible. In this example i has reached the end of string A while j has not reached the end of string B, so B is not a substring of A.
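The chain of rollbacks in this example can be replayed with a small sketch. The P values for the matched part "ababa" are written out by hand here (1-based, index 0 unused) rather than computed, and the class and method names are hypothetical.

```java
class RollbackChain {
    // Follows j = p[j] until j reaches 0 and records every value visited on the way.
    static java.util.List<Integer> rollbacks(int[] p, int j) {
        java.util.List<Integer> visited = new java.util.ArrayList<>();
        while (j > 0) {
            j = p[j];
            visited.add(j);
        }
        return visited;
    }

    public static void main(String[] args) {
        // P[1..5] for "ababa": 0, 0, 1, 2, 3. Index 0 is unused padding.
        int[] p = {0, 0, 0, 1, 2, 3};
        System.out.println(rollbacks(p, 5)); // [3, 1, 0]
    }
}
```

Each rollback strictly decreases j, so the chain always ends at 0 after at most j steps.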

The KMP algorithm is as follows:

/**
 * Runs in O(m + n) time (by amortized analysis).
 *
 * @param a the text string
 * @param b the pattern string
 * @return the 0-based index of the first occurrence of b in a, or -1 if there is none
 */
public static int kmp_indexOf(String a, String b) {
    int n = a.length();
    int m = b.length();
    if (m == 0) {
        return 0; // an empty pattern matches at index 0, as with String.indexOf
    }
    // Convert to character arrays padded with one leading character so that
    // the data starts at index 1. For example, if a = "aba", cha = {' ', 'a', 'b', 'a'}.
    char[] cha = (" " + a).toCharArray();
    char[] chb = (" " + b).toCharArray();
    int j = 0; // number of characters of b matched so far
    int[] p = computePArray(chb); // precompute the P array from string b
    for (int i = 1; i <= n; i++) {
        // j grows by at most 1 per iteration of the for loop, so over the
        // whole loop it can be rolled back at most n times in total
        while (j > 0 && chb[j + 1] != cha[i]) {
            j = p[j];
        }
        if (chb[j + 1] == cha[i]) {
            j++;
        }
        if (j == m) { // j has reached the end of b, so all of b is matched
            return i - m;
        }
    }
    return -1;
}

private static int[] computePArray(char[] chb) {
    int[] p = new int[chb.length + 1];
    p[1] = 0;
    int j = 0;
    for (int i = 2; i < chb.length; i++) {
        while (j > 0 && chb[j + 1] != chb[i]) {
            j = p[j];
        }
        if (chb[j + 1] == chb[i]) {
            j++;
        }
        p[i] = j;
    }
    return p;
}
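For comparison, here is a sketch of the same idea written with 0-based indices and no padding character. It is a variant, not the article's code: the array `fail` plays the role of P shifted by one position (`fail[k]` corresponds to P[k + 1]), and all names are hypothetical. Returning 0 for an empty pattern is an assumption chosen to match `String.indexOf`.

```java
class KmpZeroBased {
    // fail[k] = length of the longest proper prefix of b[0..k] that is also a suffix of it.
    static int[] failure(String b) {
        int[] fail = new int[b.length()];
        int j = 0;
        for (int i = 1; i < b.length(); i++) {
            while (j > 0 && b.charAt(j) != b.charAt(i)) {
                j = fail[j - 1];
            }
            if (b.charAt(j) == b.charAt(i)) {
                j++;
            }
            fail[i] = j;
        }
        return fail;
    }

    // Returns the 0-based index of the first occurrence of b in a, or -1.
    static int indexOf(String a, String b) {
        if (b.isEmpty()) {
            return 0; // assumption: same convention as String.indexOf
        }
        int[] fail = failure(b);
        int j = 0; // number of pattern characters matched so far
        for (int i = 0; i < a.length(); i++) {
            while (j > 0 && b.charAt(j) != a.charAt(i)) {
                j = fail[j - 1]; // roll back j; i never moves backward
            }
            if (b.charAt(j) == a.charAt(i)) {
                j++;
            }
            if (j == b.length()) {
                return i - j + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("ababc", "abc"));       // 2
        System.out.println(indexOf("abababac", "ababac")); // 2
        System.out.println(indexOf("ababab", "ababac"));   // -1
    }
}
```

A simple way to gain confidence in either version is to compare its results with `String.indexOf` on many short strings over a two-letter alphabet, since such inputs exercise the rollback path heavily.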

Rules for precomputing the P array:

For example, B = "ababac":

1. Initialize P[] and set P[1] = 0, i = 2, j = 0;

2. Because B[2] != B[1], P[2] = 0; then i = 3, j = 0;

3. Because B[3] = B[1], j++, that is, j = 1, so P[3] = 1; then i = 4;

4. Because B[4] = B[2], j++, that is, j = 2, so P[4] = 2; then i = 5;

5. Because B[5] = B[3], j++, that is, j = 3, so P[5] = 3; then i = 6;

6. Because j > 0 and B[6] != B[4], j = P[j] = P[3] = 1;

7. Because j > 0 and B[6] != B[2], j = P[j] = P[1] = 0;

8. Because j = 0 and B[6] != B[1], P[6] = 0;
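The walk-through above can be reproduced with a short sketch that fills the P array for B = "ababac" using the same 1-based convention. The class name `PArrayDemo` is hypothetical, and index 0 of the result is unused padding.

```java
class PArrayDemo {
    // Computes the 1-based P array for b; p[0] is unused and stays 0.
    static int[] computeP(String b) {
        char[] chb = (" " + b).toCharArray(); // shift the data to start at index 1
        int[] p = new int[chb.length];
        int j = 0;
        for (int i = 2; i < chb.length; i++) {
            while (j > 0 && chb[j + 1] != chb[i]) {
                j = p[j];
            }
            if (chb[j + 1] == chb[i]) {
                j++;
            }
            p[i] = j;
        }
        return p;
    }

    public static void main(String[] args) {
        // Printed with the unused index 0 first, then P[1..6].
        System.out.println(java.util.Arrays.toString(computeP("ababac"))); // [0, 0, 0, 1, 2, 3, 0]
    }
}
```

Reading the output from index 1 gives P[1..6] = 0, 0, 1, 2, 3, 0, which matches steps 1 to 8.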

References:

http://www.matrix67.com/blog/archives/115/ (this article is very well written);

The implementation method in this article is also good:

public static void computePArray(String T, int p[]) {
    T = " " + T;
    int j = 0;
    p[1] = 0;
    for (int i = 2; i < p.length; i++) {
        while (j > 0 && T.charAt(j + 1) != T.charAt(i)) {
            j = p[j];
        }
        if (T.charAt(j + 1) == T.charAt(i)) {
            j++;
        }
        p[i] = j;
    }
}

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
