KMP Algorithm Summary

Source: Internet
Author: User

 

I. Introduction to KMP Algorithms

 

KMP is a string-matching algorithm designed by Knuth, Morris, and Pratt.

The KMP algorithm is mainly used for pattern matching, that is, string matching. For example, given A = "abc" and B = "b", the question is whether B is a substring of A. KMP is used for this because the normal (naive) algorithm is too inefficient: in the worst case it needs O(m * n) time,

while KMP runs in linear time, O(m + n), where n is the length of A and m is the length of B.

 

II. Normal pattern matching algorithm

 

Pattern matching means: given two strings A and B, check whether B is a substring of A. In Java, the indexOf() method of the String class implements this function.
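As a quick illustration of the built-in behaviour mentioned above, the sketch below calls String.indexOf on the example strings used in this section. The class name IndexOfDemo and the helper find are placeholders chosen for illustration; they are not part of the original article.

```java
class IndexOfDemo {
    // Thin wrapper around the JDK method, only so the call is easy to reuse.
    static int find(String a, String b) {
        return a.indexOf(b); // 0-based index of the first occurrence, or -1
    }

    public static void main(String[] args) {
        System.out.println(find("ababc", "abc")); // 2
        System.out.println(find("ababc", "ca"));  // -1
    }
}
```

Note that indexOf reports 0-based positions, while the step-by-step walkthroughs in this article count from 1.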

 

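Idea of the normal pattern matching algorithm:

For example, take two strings A = "ababc" and B = "abc".

Step 1: Initialize i = 1, j = 1 (indices are 1-based; A[0] and B[0] can be left empty or used to store the string length).

Step 2: Loop over A and B. If A[i] = B[j], then i++, j++. Otherwise move i back to i - j + 2, one position past where the current attempt started, and then reset j to 1.

Step 3: When the loop exits, determine whether B is a substring of A: it is if j has moved past the end of B.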

Steps:

1. i = 1, j = 1.

2. Because A[1] = B[1], i++, j++ (now i = 2, j = 2).

3. Because A[2] = B[2], i++, j++ (now i = 3, j = 3).

4. Because A[3] != B[3], i is moved back to i - j + 2 = 2 and j is reset to 1.

5. Because A[2] != B[1], i is moved to i - j + 2 = 3 and j stays at 1.

6. Because A[3] = B[1], i++, j++ (now i = 4, j = 2).

7. Because A[4] = B[2], i++, j++ (now i = 5, j = 3).

8. Because A[5] = B[3], i++, j++ (now i = 6, j = 4).

9. Because i and j run past the end of their strings at the same time, B is a substring of A.
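The walkthrough above can be replayed in code. The following is a minimal sketch, assuming 1-based indices as in the steps; the class NaiveTrace and its trace method are hypothetical names introduced here, and the method records each comparison instead of just returning a result.

```java
import java.util.ArrayList;
import java.util.List;

class NaiveTrace {
    // Runs the normal (naive) match with 1-based indices and records every comparison.
    static List<String> trace(String a, String b) {
        List<String> steps = new ArrayList<>();
        int i = 1;
        int j = 1;
        while (i <= a.length() && j <= b.length()) {
            // charAt is 0-based, so A[i] is a.charAt(i - 1).
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                steps.add("A[" + i + "] = B[" + j + "]");
                i++;
                j++;
            } else {
                steps.add("A[" + i + "] != B[" + j + "]");
                i = i - j + 2; // one past the start of the failed attempt
                j = 1;         // restart the pattern
            }
        }
        steps.add(j > b.length() ? "match at A[" + (i - b.length()) + "]" : "no match");
        return steps;
    }

    public static void main(String[] args) {
        trace("ababc", "abc").forEach(System.out::println);
    }
}
```

For A = "ababc" and B = "abc" the recorded list contains seven comparisons, including the two mismatches A[3] != B[3] and A[2] != B[1], followed by the final verdict.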

 

From the above steps, we can see that:

The normal pattern matching algorithm is inefficient, and many of its steps are redundant. For example, step 5 compares A[2] with B[1]. But step 3 already showed that A[2] = B[2], and we know that B[1] != B[2], so A[2] != B[1] is certain in advance and this comparison is redundant.

The algorithm is as follows:

 

private static int indexOf(String a, String b) {
    // 1. Define two pointers, one into a and one into b
    int i = 0;
    int j = 0;
    // 2. Traverse both strings
    while (i < a.length() && j < b.length()) {
        if (a.charAt(i) == b.charAt(j)) {
            i++;
            j++;
        } else {
            i = i - (j - 1); // one past the start of the failed attempt
            j = 0;
        }
    }
    // 3. The loop ends on a match exactly when j has reached the end of b,
    //    whether the match lies in the middle of a (i < a.length())
    //    or at its very end (i == a.length()).
    if (j == b.length()) {
        return i - b.length();
    } else {
        return -1;
    }
}

III. KMP algorithm

 

The main idea of the KMP algorithm is that i never needs to go back; it only increases. On a mismatch, only j is rolled back.

 

Rules for rolling back j:

(1) Take the part of B that has already matched and find the longest string that is both a prefix and a suffix of it (other than the matched string itself). For example:

The matched string is "ababa". Both "a" and "aba" are a prefix and a suffix at the same time, but "aba" is longer, so j' = "aba".length() = 3.
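To make the rule concrete, here is a brute-force sketch that finds the longest string that is both a proper prefix and a proper suffix. It is only meant to illustrate the definition, not the efficient pre-calculation; BorderDemo and longestBorder are names assumed for this example.

```java
class BorderDemo {
    // Length of the longest string that is both a proper prefix
    // and a proper suffix of s (0 if there is none).
    static int longestBorder(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            String suffix = s.substring(s.length() - len);
            if (s.startsWith(suffix)) {
                return len; // longest candidates are tried first
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(longestBorder("ababa")); // 3
    }
}
```

This check takes quadratic time per string in the worst case, which is why KMP pre-calculates the values instead of searching for them on every mismatch.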

Expressed in mathematical language:

When A[i-j+1..i] = B[1..j] and A[i+1] != B[j+1], j must be reduced to a smaller value j' such that A[i-j'+1..i] = B[1..j'] still holds.

For this we pre-calculate an array, written P[]. P[j] is the new value of j after the rollback when the first j characters of B have matched but character j + 1 does not. P depends only on B.

In the preceding example, P[5] = 3: the matched string "ababa" has length 5, and j = 3 after the rollback.

 

For example:

Suppose A[1..5] = B[1..5] but A[6] != B[6], with i = 6 and j = 5. We need to roll j back using the pre-calculated P[] array: j = P[j].

Principle of the rollback: the matched part is "ababa". Its longest string that is both a prefix and a suffix is "aba", so j becomes P[5] = 3.

Now A[3..5] = B[1..3], but A[6] != B[4], with i = 6 and j = 3. The matched part is "aba", whose longest prefix that is also a suffix is "a", so j rolls back again to P[3] = 1.

Now A[5] = B[1], but A[6] != B[2], with i = 6 and j = 1, so j rolls back once more to P[1] = 0.

Because j = 0, it cannot be rolled back any further. Here i has reached the end of string A while j has not reached the end of string B, so B is not a substring of A.
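The sequence of rollbacks in this example can be sketched as a short loop over the P array. The P values below are written out by hand for the matched part "ababa", following the prefix/suffix rule; RollbackChain and chain are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RollbackChain {
    // Follows j = P[j] until j reaches 0 and returns every value j takes on the way.
    static List<Integer> chain(int[] p, int j) {
        List<Integer> values = new ArrayList<>();
        values.add(j);
        while (j > 0) {
            j = p[j];
            values.add(j);
        }
        return values;
    }

    public static void main(String[] args) {
        // 1-based P values for "ababa"; index 0 is unused padding.
        int[] p = {0, 0, 0, 1, 2, 3};
        System.out.println(chain(p, 5)); // [5, 3, 1, 0]
    }
}
```

In a real search the loop stops as soon as the next pattern character matches; the chain above shows the case where every retry fails.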

 

 

 

The KMP algorithm is as follows:

/**
 * KMP search; runs in O(m + n) time.
 *
 * @param a the text string
 * @param b the pattern string
 * @return the 0-based index of the first occurrence of b in a, or -1 if there is none
 */
public static int kmp_indexOf(String a, String b) {
    int n = a.length();
    int m = b.length();
    if (m == 0) {
        return 0; // an empty pattern matches at the start; also avoids reading chb[1] below
    }
    // Convert to character arrays with one padding character in front, because the
    // algorithm counts from index 1. For example, if a = "aba", cha = {' ', 'a', 'b', 'a'}.
    char[] cha = (" " + a).toCharArray();
    char[] chb = (" " + b).toCharArray();
    int j = 0; // number of characters of b matched so far
    int[] p = computePArray(chb); // pre-calculate the P array from string b
    for (int i = 1; i <= n; i++) {
        while (j > 0 && chb[j + 1] != cha[i]) {
            // j grows by at most 1 per iteration of the for loop, so over the whole
            // run it can be rolled back at most n times; the while loop costs O(n) in total
            j = p[j];
        }
        if (chb[j + 1] == cha[i]) {
            j++;
        }
        if (j == m) { // j has reached the end of b, so the whole pattern matched
            return i - m;
        }
    }
    return -1;
}

private static int[] computePArray(char[] chb) {
    int[] p = new int[chb.length + 1];
    p[1] = 0;
    int j = 0;
    for (int i = 2; i < chb.length; i++) {
        while (j > 0 && chb[j + 1] != chb[i]) {
            j = p[j];
        }
        if (chb[j + 1] == chb[i]) {
            j++;
        }
        p[i] = j;
    }
    return p;
}
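One way to gain confidence in a KMP routine is to compare its results with String.indexOf. The sketch below is a compact variant written for this purpose, not the article's exact code: it keeps the same meaning of P (p[k] is the rollback value after k matched characters) but reads the strings with 0-based charAt instead of padded arrays. KmpCheck, computeP and kmpIndexOf are names assumed here.

```java
class KmpCheck {
    // p[k] = length of the longest proper prefix of b's first k characters
    // that is also a suffix of them; p[0] is unused.
    static int[] computeP(String b) {
        int[] p = new int[b.length() + 1];
        int j = 0;
        for (int i = 2; i <= b.length(); i++) {
            // b.charAt(j) is B[j+1] and b.charAt(i - 1) is B[i] in 1-based terms.
            while (j > 0 && b.charAt(j) != b.charAt(i - 1)) {
                j = p[j];
            }
            if (b.charAt(j) == b.charAt(i - 1)) {
                j++;
            }
            p[i] = j;
        }
        return p;
    }

    static int kmpIndexOf(String a, String b) {
        if (b.isEmpty()) {
            return 0; // same convention as String.indexOf("")
        }
        int[] p = computeP(b);
        int j = 0; // number of pattern characters matched so far
        for (int i = 0; i < a.length(); i++) {
            while (j > 0 && b.charAt(j) != a.charAt(i)) {
                j = p[j];
            }
            if (b.charAt(j) == a.charAt(i)) {
                j++;
            }
            if (j == b.length()) {
                return i - j + 1; // 0-based start of the match
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[][] cases = {{"ababc", "abc"}, {"ababab", "ababac"}, {"aaaab", "aab"}};
        for (String[] c : cases) {
            System.out.println(c[0] + " / " + c[1] + " -> " + kmpIndexOf(c[0], c[1])
                    + " (indexOf: " + c[0].indexOf(c[1]) + ")");
        }
    }
}
```

Avoiding the padded arrays saves two copies of the input, at the cost of index expressions that no longer read exactly like the 1-based description.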

 

Rules for pre-calculating the P array:

 

For example, B = "ababac":

1. Initialize P[] and set P[1] = 0, i = 2, j = 0.

2. Because B[2] != B[1], P[2] = 0; then i = 3, j = 0.

3. Because B[3] = B[1], j++, that is, j = 1; P[3] = 1; then i = 4.

4. Because B[4] = B[2], j++, that is, j = 2; P[4] = 2; then i = 5.

5. Because B[5] = B[3], j++, that is, j = 3; P[5] = 3; then i = 6.

6. Because j > 0 and B[6] != B[4], j = P[j] = P[3] = 1.

7. Because j > 0 and B[6] != B[2], j = P[j] = P[1] = 0.

8. Because j = 0 and B[6] != B[1], P[6] = 0.
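The eight steps above can be checked with a small program. This is a sketch that follows the same 1-based convention by padding the pattern with one leading character; the class name PArrayDemo is an assumption made for the example.

```java
import java.util.Arrays;

class PArrayDemo {
    // Returns the 1-based P array for pattern b; index 0 is unused padding.
    static int[] computeP(String b) {
        char[] chb = (" " + b).toCharArray(); // pattern starts at index 1
        int[] p = new int[chb.length];
        int j = 0;
        for (int i = 2; i < chb.length; i++) {
            while (j > 0 && chb[j + 1] != chb[i]) {
                j = p[j]; // roll back, as in steps 6 and 7
            }
            if (chb[j + 1] == chb[i]) {
                j++;
            }
            p[i] = j;
        }
        return p;
    }

    public static void main(String[] args) {
        int[] p = computeP("ababac");
        // Print P[1..6] only, skipping the padding slot.
        System.out.println(Arrays.toString(Arrays.copyOfRange(p, 1, p.length))); // [0, 0, 1, 2, 3, 0]
    }
}
```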

 

 

References:

http://www.matrix67.com/blog/archives/115/ (this article is very well written)

 

The implementation method in that article is also good:

public static void computePArray(String T, int[] p) {
    // p must have length T.length() + 1; the result is stored in p[1..T.length()]
    T = " " + T; // pad so that the pattern starts at index 1
    int j = 0;
    p[1] = 0;
    for (int i = 2; i < p.length; i++) {
        while (j > 0 && T.charAt(j + 1) != T.charAt(i)) {
            j = p[j];
        }
        if (T.charAt(j + 1) == T.charAt(i)) {
            j++;
        }
        p[i] = j;
    }
}

 
